Chapter 10 Testing
10.1 Introduction
Our testing framework includes several types of tests. Unit tests and functional tests are written for individual functions and modules, respectively: unit tests check each of the smallest components, and functional tests ensure that small components, when applied together, function within the requirements. Integration tests verify that modules work well together. Run-time tests are checks integrated into the code to catch errors related to user input. Regression testing and platform compatibility testing are executed before pre-releasing FIMS to ensure that previously tested pieces still perform as expected, on all platforms of interest, after the code has been changed. Beta testing is used to gather feedback from users (i.e., members of the FIMS Implementation Team and other users) during the pre-release stage. Finally, one-off testing is used for replicating and fixing user-reported bugs. Some of these one-off tests may eventually be integrated into the permanent testing framework. Likewise, if you find an interactive test useful while building something, convert it into a {testthat} or GoogleTest test.
FIMS uses GoogleTest to build a C++ unit testing framework and {testthat} to build an R testing framework. FIMS uses Google Benchmark to measure the real time and CPU time used for running the produced binaries. All required software for testing can be installed using the instructions in the Software to install section.
10.2 C++ unit testing and benchmarking
Inside your cloned version of FIMS, the file CMakeLists.txt in the top-level directory instructs CMake on how to create the build files, including setting up GoogleTest. The GoogleTest code is in tests/gtest. Within this subdirectory is another CMakeLists.txt that contains additional specifications on how to register the individual tests.
10.2.1 Build and run the tests
The following commands, run on the command line, build and run the tests (note that Windows users cannot use Git Bash and must use a native Windows shell). They execute the steps outlined below:
- Generate the build system using Ninja as the generator, which creates the build directory: cmake -S . -B build -G Ninja
- Use CMake to build in the build directory in parallel using 16 jobs: cmake --build build --parallel 16 (--parallel 16 can be deleted).
- Run the tests using ctest in parallel using 16 jobs: ctest --test-dir build --parallel 16 (--parallel 16 can be deleted).
The output from running the tests should look something like the following:
Internal ctest changing into directory: C:/github_repos/NOAA-FIMS_org/FIMS/build
Test project C:/github_repos/NOAA-FIMS_org/FIMS/build
Start 1: dlognorm.use_double_inputs
1/5 Test #1: dlognorm.use_double_inputs ....... Passed 0.04 sec
Start 2: dlognorm.use_int_inputs
2/5 Test #2: dlognorm.use_int_inputs .......... Passed 0.04 sec
Start 3: modelTest.eta
3/5 Test #3: modelTest.eta .................... Passed 0.04 sec
Start 4: modelTest.nll
4/5 Test #4: modelTest.nll .................... Passed 0.04 sec
Start 5: modelTest.evaluate
5/5 Test #5: modelTest.evaluate ............... Passed 0.04 sec
100% tests passed, 0 tests failed out of 5
10.2.2 Adding a C++ test
Create a file dlognorm.hpp within the src subfolder that contains a simple function:
#include <cmath>
// Log-density of the lognormal distribution evaluated at x
template<class Type>
Type dlognorm(Type x, Type meanlog, Type sdlog){
  // Standardized residual on the log scale
  Type resid = (log(x)-meanlog)/sdlog;
  Type logres = -log(sqrt(2*M_PI)) - log(sdlog) - Type(0.5)*resid*resid - log(x);
  return logres;
}
Then, create dlognorm-unit.cpp in tests/gtest that has a test suite for the dlognorm function:
#include "gtest/gtest.h"
#include "../../src/dlognorm.hpp"
// # R code that generates true values for the test
// dlnorm(1.0, 0.0, 1.0, TRUE) = -0.9189385
// dlnorm(5.0, 10.0, 2.5, TRUE) = -9.07679
namespace {
  // TestSuiteName: dlognormTest; TestName: DoubleInput and IntInput
  // Test dlognorm with double input values
  TEST(dlognormTest, DoubleInput) {
    EXPECT_NEAR( dlognorm(1.0, 0.0, 1.0) , -0.9189385 , 0.0001 );
    EXPECT_NEAR( dlognorm(5.0, 10.0, 2.5) , -9.07679 , 0.0001 );
  }
  // Test dlognorm with integer input values
  TEST(dlognormTest, IntInput) {
    EXPECT_NEAR( dlognorm(1, 0, 1) , -0.9189385 , 0.0001 );
  }
}
EXPECT_NEAR(val1, val2, absolute_error) verifies that the difference between val1 and val2 does not exceed the absolute error bound absolute_error. EXPECT_NE(val1, val2) verifies that val1 is not equal to val2. See the GoogleTest assertions reference for more EXPECT_ macros.
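As a rough illustration of what an absolute error bound means, the following sketch re-implements the comparison using only the standard library. The helper name is_near is hypothetical and is not part of GoogleTest; it only mirrors the behavior described above, where the difference between two values must not exceed the bound.

```cpp
#include <cmath>

// Hypothetical helper (not a GoogleTest function): returns true when the
// absolute difference between val1 and val2 does not exceed abs_error.
bool is_near(double val1, double val2, double abs_error) {
  return std::fabs(val1 - val2) <= abs_error;
}
```

With the tolerance of 0.0001 used in the test suite, is_near(-0.91894, -0.9189385, 0.0001) is true, whereas is_near(-0.9, -0.9189385, 0.0001) is false.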
Finally, the test must be added to tests/gtest/CMakeLists.txt before running the tests. Add the following contents to the end of tests/gtest/CMakeLists.txt to enable testing in CMake, declare the C++ test binary you want to build (dlognorm_test), and link it to GoogleTest (gtest_main):
add_executable(dlognorm_test
dlognorm-unit.cpp
)
target_include_directories(dlognorm_test
PUBLIC
${CMAKE_SOURCE_DIR}/../
)
target_link_libraries(dlognorm_test
gtest_main
)
include(GoogleTest)
gtest_discover_tests(dlognorm_test)
Now you can build and run your test using the instructions in Build and run the tests. The output when running ctest will look something like the following. Note that one test, dlognormTest.IntInput, fails:
Internal ctest changing into directory: C:/Users/Kathryn.Doering/Documents/testing/FIMS/build
Test project C:/Users/Kathryn.Doering/Documents/testing/FIMS/build
Start 1: dlognorm.use_double_inputs
1/7 Test #1: dlognorm.use_double_inputs ....... Passed 0.04 sec
Start 2: dlognorm.use_int_inputs
2/7 Test #2: dlognorm.use_int_inputs .......... Passed 0.04 sec
Start 3: modelTest.eta
3/7 Test #3: modelTest.eta .................... Passed 0.04 sec
Start 4: modelTest.nll
4/7 Test #4: modelTest.nll .................... Passed 0.04 sec
Start 5: modelTest.evaluate
5/7 Test #5: modelTest.evaluate ............... Passed 0.04 sec
Start 6: dlognormTest.DoubleInput
6/7 Test #6: dlognormTest.DoubleInput ......... Passed 0.04 sec
Start 7: dlognormTest.IntInput
7/7 Test #7: dlognormTest.IntInput ............***Failed 0.04 sec
86% tests passed, 1 tests failed out of 7
Total Test time (real) = 0.28 sec
The following tests FAILED:
7 - dlognormTest.IntInput (Failed)
Errors while running CTest
Output from these tests are in: C:/Users/Kathryn.Doering/Documents/testing/FIMS/build/Testing/Temporary/LastTest.log
Use "--rerun-failed --output-on-failure" to re-run the failed cases verbosely.
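The IntInput failure can be reproduced without GoogleTest. The sketch below copies the dlognorm template from the example, with one assumed change: M_PI is replaced by a local constant (kPi, a name introduced here) so that the sketch compiles without platform-specific macros. The wrapper functions double_result and int_result are also hypothetical. When all arguments are integers, Type is deduced as int, so intermediate and returned values are converted to integers.

```cpp
#include <cmath>

// Local constant used instead of M_PI (an assumption made for portability).
const double kPi = 3.14159265358979323846;

// Copy of the dlognorm template from the example above.
template <class Type>
Type dlognorm(Type x, Type meanlog, Type sdlog) {
  Type resid = (std::log(x) - meanlog) / sdlog;
  Type logres = -std::log(std::sqrt(2 * kPi)) - std::log(sdlog) -
                Type(0.5) * resid * resid - std::log(x);
  return logres;
}

// Type is double: the log-density keeps its fractional part.
double double_result() { return dlognorm(1.0, 0.0, 1.0); }

// Type is int: Type(0.5) becomes 0 and the result is truncated toward zero.
int int_result() { return dlognorm(1, 0, 1); }
```

Here double_result() is approximately -0.9189385, while int_result() is 0, which is outside the 0.0001 tolerance used by EXPECT_NEAR. One possible fix, depending on the intent of the test, is to call the function with floating-point arguments.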
10.2.3 Debugging a C++ test
There are two ways to debug a C++ test: interactively using gdb or via print statements.
To debug C++ code (e.g., a segmentation fault or memory corruption) using gdb:
cmake -S . -B build -G Ninja -DCMAKE_BUILD_TYPE=Debug
cmake --build build --parallel 16
ctest --test-dir build --parallel 16
gdb ./build/tests/gtest/population_dynamics_population.exe
c // to continue without paging
run // to see which line of code is broken
print this->log_naa // for example, print this->log_naa to see the value of log_naa;
print i // for example, print i from the broken for loop
bt // backtrace
q // to quit
To debug C++ code with print statements, update the C++ code in the desired .hpp file by adding std::ofstream out("file_name.txt") and using out << variable; to print out the values of the variable. After you run the cmake and ctest calls following the instructions in Build and run the tests, the output of the print statements will be in FIMS/build/tests/gtest, in the file named in the std::ofstream call (e.g., FIMS/build/tests/gtest/debug.txt if the file is named debug.txt).
More complex examples can include text identifying the quantities being printed.
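One way to make print-statement output easier to read is to write a label next to each quantity. The following is a minimal sketch using only the standard library; the helper name write_labeled and its arguments are hypothetical and are not part of FIMS.

```cpp
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: appends a labeled vector of values to a debug file so
// that each printed quantity can be identified later.
void write_labeled(const std::string& path, const std::string& label,
                   const std::vector<double>& values) {
  std::ofstream out(path, std::ios::app);  // append rather than overwrite
  out << label << ":";
  for (double v : values) {
    out << " " << v;
  }
  out << "\n";
}
```

For example, write_labeled("debug.txt", "log_naa", {0.5, 1.25}) appends the line "log_naa: 0.5 1.25" to debug.txt. Opening the file in append mode keeps output from earlier calls, so the file should be deleted between runs if a clean log is wanted.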
10.2.4 Benchmark example
Google Benchmark measures the real time and CPU time used for running the produced binary. We will continue using the dlognorm.hpp example. Create dlognorm_benchmark.cpp in tests/gtest with the following code to run the dlognorm function and use BENCHMARK to see how long it takes.
#include "benchmark/benchmark.h"
#include "../../src/dlognorm.hpp"
void BM_dlgnorm(benchmark::State& state)
{
for (auto _ : state)
dlognorm(5.0, 10.0, 2.5);
}
BENCHMARK(BM_dlgnorm);
Next, the following needs to be added to the end of tests/gtest/CMakeLists.txt:
FetchContent_Declare(
googlebenchmark
URL https://github.com/google/benchmark/archive/refs/tags/v1.6.0.zip
)
FetchContent_MakeAvailable(googlebenchmark)
add_executable(dlognorm_benchmark
dlognorm_benchmark.cpp
)
target_include_directories(dlognorm_benchmark
PUBLIC
${CMAKE_SOURCE_DIR}/../
)
target_link_libraries(dlognorm_benchmark
benchmark_main
)
To run the benchmark, run cmake, sending output to the build subfolder, and then run the created executable. The output from dlognorm_benchmark.exe might look like this:
Run on (8 X 2112 MHz CPU s)
CPU Caches:
L1 Data 32 KiB (x4)
L1 Instruction 32 KiB (x4)
L2 Unified 256 KiB (x4)
L3 Unified 8192 KiB (x1)
***WARNING*** Library was built as DEBUG. Timings may be affected.
-----------------------------------------------------
Benchmark Time CPU Iterations
-----------------------------------------------------
BM_dlgnorm 153 ns 153 ns 4480000
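To make the distinction between the Time (real time) and CPU columns concrete, the sketch below times a loop by hand using only the standard library. It is not how Google Benchmark is implemented, and the names TimingResult and time_log_loop are hypothetical. Real time is read from a wall clock, while CPU time is the processor time consumed by the program.

```cpp
#include <chrono>
#include <cmath>
#include <ctime>

// Result of a hand-rolled timing loop; the struct and function names are
// hypothetical and are not part of Google Benchmark.
struct TimingResult {
  double real_ns_per_iter;  // wall-clock time per iteration
  double cpu_ns_per_iter;   // processor time per iteration
};

TimingResult time_log_loop(long iterations) {
  volatile double sink = 0.0;  // keeps the compiler from removing the loop
  const auto wall_start = std::chrono::steady_clock::now();
  const std::clock_t cpu_start = std::clock();
  for (long i = 0; i < iterations; ++i) {
    sink = sink + std::log(5.0 + static_cast<double>(i % 7));
  }
  const std::clock_t cpu_end = std::clock();
  const auto wall_end = std::chrono::steady_clock::now();

  const double real_ns =
      std::chrono::duration<double, std::nano>(wall_end - wall_start).count();
  const double cpu_ns =
      1e9 * static_cast<double>(cpu_end - cpu_start) / CLOCKS_PER_SEC;
  return {real_ns / iterations, cpu_ns / iterations};
}
```

Calling time_log_loop(1000000) returns the average real and CPU time per iteration in nanoseconds. The two values are usually close for a single-threaded loop like this one and can diverge when the program waits on other work.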
10.2.5 Clean up after running C++ tests
Running the examples above generates files (i.e., the source code, libraries, and executables) that are saved in the build subfolder. The examples demonstrate an “out-of-source” build, which puts generated files in a completely separate directory so that the source tree is unchanged after running tests. Using separate source and build trees reduces the need to delete files that differ between builds. If you would still like to delete the CMake-generated files, just delete the build folder, and then build and run the tests by repeating the commands in Build and run the tests. The build folder is listed in the FIMS repository’s .gitignore file, so its files should not be pushed to the FIMS repository.
For simple C++ functions like the examples above, we do not need to clean up the tests. Clean up is only necessary in a few situations.
- If memory for an object was allocated during testing and not deallocated, then the object needs to be deleted (e.g., delete object).
- If you used a test fixture from GoogleTest to use the same data configuration for multiple tests, TearDown() can be used to clean up the test, and then the test fixture will be deleted. See the GoogleTest user’s guide for more details.
- If you do not want to keep any of the files produced by the example and want to completely clear any uncommitted changes and files from the git repo, run git restore ., which discards any uncommitted changes in files tracked by git, or git clean -fd, to remove all untracked files in the repository.
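The first situation in the list, memory allocated during a test, can be sketched as follows. The Tracked type and both function names are hypothetical and exist only to count live objects; the second function shows std::unique_ptr from the standard library as one way to avoid a manual delete, which is an alternative offered here rather than a FIMS convention.

```cpp
#include <memory>

// Hypothetical test object that counts how many instances are alive.
struct Tracked {
  static int live;
  Tracked() { ++live; }
  ~Tracked() { --live; }
};
int Tracked::live = 0;

// Manual cleanup: memory allocated with new must be released with delete.
int manual_cleanup() {
  Tracked* object = new Tracked();
  delete object;
  return Tracked::live;
}

// Automatic cleanup: std::unique_ptr deletes the object when it goes out of
// scope, so no explicit delete is needed.
int automatic_cleanup() {
  {
    std::unique_ptr<Tracked> object(new Tracked());
  }
  return Tracked::live;
}
```

Both functions return 0 live objects. If the delete in manual_cleanup were omitted, the count would stay at 1 and the memory would leak for the rest of the test run.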
10.3 Templates for GoogleTest testing
This section includes templates for creating unit tests and benchmarks. This is the code that would go into the .cpp files in tests/gtest.
10.3.1 C++ test templates
10.3.1.3 tests/gtest/CMakeLists.txt template
These lines are added each time a new test, e.g., TestSuiteName1, is added:
# Add test suite 1
add_executable(TestSuiteName1
test1.cpp
)
target_link_libraries(TestSuiteName1
gtest_main
)
gtest_discover_tests(TestSuiteName1)
These lines are added each time a new benchmark, e.g., benchmark1, is added:
10.4 R testing
R tests are written using {testthat}, which can be installed as an R package. More details on {testthat} can be found in the testing chapter of R packages.
10.4.1 Testing FIMS locally
To test FIMS R functions interactively and locally, use devtools::install() rather than devtools::load_all() because load_all() turns on the debugger, which bloats the .o file and may lead to a compilation error, among other problems (see Installing FIMS for more information).
10.4.2 Testing using gdbsource
You can interactively debug C++ code using TMB::gdbsource() in RStudio.
10.4.3 Naming conventions and file organization
- Group functions and their helpers together, i.e., the “main function plus helpers” approach.
- {testthat} tests of Rcpp code should be called test-rcpp-[description].R.
- Integration tests that do not have a corresponding .R file should use the following convention: test-integration-[description].R.
10.4.4 R template
Naming conventions for {testthat} files follow the tidyverse test convention, as well as using test-rcpp-[description].R for tests of Rcpp code and test-integration-[description].R for integration tests without a corresponding R function.
The format for an individual {testthat} test is:
10.4.5 Random numbers
Simulation results might depend on the order of calls, leading to failed tests just because different random numbers are used or the order of the simulation changes through model development (see FIMS-planning discussion 25 for details). Below are some potential solutions.
- Add a TRUE/FALSE parameter in each FIMS simulation module for setting up a testing seed. When testing the module, set the parameter to TRUE to fix the seed number in R and conduct tests. If adding a TRUE/FALSE parameter does not work as expected, then carefully check simulated data from each component and make sure it is not a model coding error.
- Use set.seed() from R to set the seed and investigate using {rstream} to generate multiple streams of random numbers to associate distinct streams of random numbers with different sources of randomness. {rstream} was specifically designed to address the issue of needing very long streams of pseudo-random numbers for parallel computations. See the rstream paper and RngStreams for more details.
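The order-of-calls problem and the idea of separate streams can be illustrated with the C++ standard library. This is a sketch of the concept only: it uses std::mt19937 rather than R's set.seed() or {rstream}, the seeds are arbitrary, and all function names are hypothetical. Seeding two engines differently is a simplification and does not provide the guarantees that dedicated multiple-stream generators are designed for.

```cpp
#include <cstdint>
#include <random>
#include <vector>

// Draw n raw values from an engine.
std::vector<std::uint32_t> draw(std::mt19937& engine, int n) {
  std::vector<std::uint32_t> out;
  for (int i = 0; i < n; ++i) {
    out.push_back(static_cast<std::uint32_t>(engine()));
  }
  return out;
}

// One shared stream: an extra draw made earlier shifts every later value.
std::vector<std::uint32_t> shared_stream(bool extra_draw_first) {
  std::mt19937 engine(123);  // fixed seed, chosen arbitrarily
  if (extra_draw_first) {
    engine();  // e.g., a new source of randomness added during development
  }
  return draw(engine, 3);
}

// Separate streams: each source of randomness has its own seeded engine, so
// draws from one source do not affect the other.
std::vector<std::uint32_t> separate_streams(bool extra_draw_first) {
  std::mt19937 new_source(456);
  std::mt19937 existing_source(123);
  if (extra_draw_first) {
    new_source();
  }
  return draw(existing_source, 3);
}
```

With a fixed seed, shared_stream(false) returns the same values on every call, but shared_stream(true) returns different values because the extra draw shifts the sequence. In contrast, separate_streams(false) and separate_streams(true) return identical values, which is the property that makes tests robust to changes in the order of simulation steps.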