Chapter 10 Testing
10.1 Introduction
Our testing framework includes several types of tests. Unit tests and functional tests are created for individual functions and modules, respectively: unit tests exercise each of the smallest components, while functional tests ensure that assemblies of small components function within the requirements. Integration tests verify that modules work well together. Run-time tests are checks integrated into the code to catch errors related to user input. Regression testing and platform compatibility testing are executed before pre-releasing FIMS to ensure that previously tested pieces still perform after the code has been changed, on all platforms of interest. Beta testing is used to gather feedback from users (i.e., members of the FIMS Implementation Team and other users) during the pre-release stage. Finally, one-off testing is used for replicating and fixing user-reported bugs. Some of these one-off tests may eventually be integrated into the permanent testing framework. Similarly, if while coding you find a useful interactive test that helped you build something, it should be converted into a {testthat} or GoogleTest test.
FIMS uses GoogleTest to build a C++ unit testing framework and {testthat} to build an R testing framework. FIMS uses Google Benchmark to measure the real time and CPU time used for running the produced binaries. All required software for testing can be installed using the instructions in the Software to install section.
10.2 C++ unit testing and benchmarking
Inside of your cloned version of FIMS, the file CMakeLists.txt in the top-level directory instructs CMake on how to create the build files, including setting up GoogleTest. The GoogleTest code is in tests/gtest. Within this subdirectory is another CMakeLists.txt that contains additional specifications on how to register the individual tests.
10.2.1 Build and run the tests
The following steps are needed to build the tests from the command line (note that Windows users cannot use Git Bash and must use a native Windows shell); the corresponding commands are shown after the list:
- Generate the build system using Ninja as the generator, which creates the build directory.
- Use CMake to build in the build directory in parallel using 16 jobs (--parallel 16 can be deleted).
- Run the tests using ctest in parallel using 16 jobs (--parallel 16 can be deleted).
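A minimal sketch of those commands, run from the top-level FIMS directory (they mirror the commands shown in the debugging section below, minus the Debug build type):
cmake -S . -B build -G Ninja
cmake --build build --parallel 16
ctest --test-dir build --parallel 16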
The output from running the tests should look something like the following:
Internal ctest changing into directory: C:/github_repos/NOAA-FIMS_org/FIMS/build
Test project C:/github_repos/NOAA-FIMS_org/FIMS/build
Start 1: dlognorm.use_double_inputs
1/5 Test #1: dlognorm.use_double_inputs ....... Passed 0.04 sec
Start 2: dlognorm.use_int_inputs
2/5 Test #2: dlognorm.use_int_inputs .......... Passed 0.04 sec
Start 3: modelTest.eta
3/5 Test #3: modelTest.eta .................... Passed 0.04 sec
Start 4: modelTest.nll
4/5 Test #4: modelTest.nll .................... Passed 0.04 sec
Start 5: modelTest.evaluate
5/5 Test #5: modelTest.evaluate ............... Passed 0.04 sec
100% tests passed, 0 tests failed out of 5
10.2.2 Adding a C++ test
Create a file dlognorm.hpp within the src subfolder that contains a simple function:
#include <cmath>

template<class Type>
Type dlognorm(Type x, Type meanlog, Type sdlog) {
  Type resid = (log(x) - meanlog) / sdlog;
  Type logres = -log(sqrt(2 * M_PI)) - log(sdlog) - Type(0.5) * resid * resid - log(x);
  return logres;
}
Then, create dlognorm-unit.cpp in tests/gtest that has a test suite for the dlognorm function:
#include "gtest/gtest.h"
#include "../../src/dlognorm.hpp"
// # R code that generates true values for the test
// dlnorm(1.0, 0.0, 1.0, TRUE) = -0.9189385
// dlnorm(5.0, 10.0, 2.5, TRUE) = -9.07679
namespace {
// TestSuiteName: dlognormTest; TestName: DoubleInput and IntInput
// Test dlognorm with double input values
TEST(dlognormTest, DoubleInput) {
EXPECT_NEAR( dlognorm(1.0, 0.0, 1.0) , -0.9189385 , 0.0001 );
EXPECT_NEAR( dlognorm(5.0, 10.0, 2.5) , -9.07679 , 0.0001 );
}
// Test dlognorm with integer input values
TEST(dlognormTest, IntInput) {
EXPECT_NEAR( dlognorm(1, 0, 1) , -0.9189385 , 0.0001 );
}
}
EXPECT_NEAR(val1, val2, absolute_error) verifies that the difference between val1 and val2 does not exceed the absolute error bound absolute_error. EXPECT_NE(val1, val2) verifies that val1 is not equal to val2. See the GoogleTest assertions reference for more EXPECT_ macros.
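For instance, a small sketch of EXPECT_NE using the dlognorm function above (this test is illustrative, not part of the FIMS test suite):
// Densities at clearly different points should not be equal
TEST(dlognormTest, OutputsDiffer) {
  EXPECT_NE(dlognorm(1.0, 0.0, 1.0), dlognorm(5.0, 10.0, 2.5));
}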
Finally, the test must be added to tests/gtest/CMakeLists.txt before running the tests. Add the following contents to the end of tests/gtest/CMakeLists.txt to declare the C++ test binary you want to build (dlognorm_test), link it to GoogleTest (gtest_main), and register the individual tests with CTest:
add_executable(dlognorm_test
  dlognorm-unit.cpp
)

target_include_directories(dlognorm_test
  PUBLIC
    ${CMAKE_SOURCE_DIR}/../
)

target_link_libraries(dlognorm_test
  gtest_main
)

include(GoogleTest)
gtest_discover_tests(dlognorm_test)
Now you can build and run your test using the instructions in Build and run the tests. The output from running ctest will look something like the following; note that there is a failing test:
Internal ctest changing into directory: C:/Users/Kathryn.Doering/Documents/testing/FIMS/build
Test project C:/Users/Kathryn.Doering/Documents/testing/FIMS/build
Start 1: dlognorm.use_double_inputs
1/7 Test #1: dlognorm.use_double_inputs ....... Passed 0.04 sec
Start 2: dlognorm.use_int_inputs
2/7 Test #2: dlognorm.use_int_inputs .......... Passed 0.04 sec
Start 3: modelTest.eta
3/7 Test #3: modelTest.eta .................... Passed 0.04 sec
Start 4: modelTest.nll
4/7 Test #4: modelTest.nll .................... Passed 0.04 sec
Start 5: modelTest.evaluate
5/7 Test #5: modelTest.evaluate ............... Passed 0.04 sec
Start 6: dlognormTest.DoubleInput
6/7 Test #6: dlognormTest.DoubleInput ......... Passed 0.04 sec
Start 7: dlognormTest.IntInput
7/7 Test #7: dlognormTest.IntInput ............***Failed 0.04 sec
86% tests passed, 1 tests failed out of 7
Total Test time (real) = 0.28 sec
The following tests FAILED:
7 - dlognormTest.IntInput (Failed)
Errors while running CTest
Output from these tests are in: C:/Users/Kathryn.Doering/Documents/testing/FIMS/build/Testing/Temporary/LastTest.log
Use "--rerun-failed --output-on-failure" to re-run the failed cases verbosely.
10.2.3 Debugging a C++ test
There are two ways to debug a C++ test: interactively using gdb, or via print statements.
To debug C++ code (e.g., a segmentation fault or memory corruption) using gdb:
cmake -S . -B build -G Ninja -DCMAKE_BUILD_TYPE=Debug
cmake --build build --parallel 16
ctest --test-dir build --parallel 16
gdb ./build/tests/gtest/population_dynamics_population.exe
Then, within gdb:
c                    # continue without paging
run                  # see which line of code is broken
print this->log_naa  # for example, print this->log_naa to see the value of log_naa
print i              # for example, print i from the broken for loop
bt                   # backtrace
q                    # quit
To debug C++ code with print statements, update the C++ code in the desired .hpp file by adding std::ofstream out("debug.txt") and using out << variable; to print out values of the variable. The output of the print statements will be in FIMS/build/tests/gtest/debug.txt after you run the cmake and ctest calls from the instructions in Build and run the tests.
More complex print statements can include text identifying the quantities being written out, as in the sketch below.
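A minimal sketch of that approach (the function and variable names are hypothetical, not from the FIMS source):
#include <fstream>

// Write labeled values to debug.txt, which lands in
// FIMS/build/tests/gtest/ when run from a test
void debug_print(double log_naa, int i) {
  std::ofstream out("debug.txt");
  out << "log_naa: " << log_naa << "\n";
  out << "loop index i: " << i << "\n";
}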
10.2.4 Benchmark example
Google Benchmark measures the real time and CPU time used for running the produced binary. We will continue using the dlognorm.hpp example. Create dlognorm_benchmark.cpp in tests/gtest with the following code, which runs the dlognorm function and uses BENCHMARK to see how long it takes:
#include "benchmark/benchmark.h"
#include "../../src/dlognorm.hpp"
void BM_dlgnorm(benchmark::State& state)
{
for (auto _ : state)
dlognorm(5.0, 10.0, 2.5);
}
BENCHMARK(BM_dlgnorm);
Next, the following needs to be added to the end of tests/gtest/CMakeLists.txt:
FetchContent_Declare(
  googlebenchmark
  URL https://github.com/google/benchmark/archive/refs/tags/v1.6.0.zip
)
FetchContent_MakeAvailable(googlebenchmark)

add_executable(dlognorm_benchmark
  dlognorm_benchmark.cpp
)

target_include_directories(dlognorm_benchmark
  PUBLIC
    ${CMAKE_SOURCE_DIR}/../
)

target_link_libraries(dlognorm_benchmark
  benchmark_main
)
To run the benchmark, run cmake sending output to the build subfolder and then run the created executable, as sketched below.
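A minimal sketch of those commands from the top-level FIMS directory (the executable path follows the build layout shown in the debugging section above; the .exe suffix applies on Windows):
cmake -S . -B build -G Ninja
cmake --build build --parallel 16
./build/tests/gtest/dlognorm_benchmark.exe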
The output from dlognorm_benchmark.exe might look like this:
Run on (8 X 2112 MHz CPU s)
CPU Caches:
L1 Data 32 KiB (x4)
L1 Instruction 32 KiB (x4)
L2 Unified 256 KiB (x4)
L3 Unified 8192 KiB (x1)
***WARNING*** Library was built as DEBUG. Timings may be affected.
-----------------------------------------------------
Benchmark Time CPU Iterations
-----------------------------------------------------
BM_dlgnorm 153 ns 153 ns 4480000
10.2.5 Clean up after running C++ tests
After running the examples above, the build generates files (i.e., the source code, libraries, and executables) and saves them in the build subfolder. The examples above demonstrate an "out-of-source" build, which puts generated files in a completely separate directory so that the source tree is unchanged after running tests. Using separate source and build trees reduces the need to delete files that differ between builds. If you would still like to delete the CMake-generated files, simply delete the build folder, then build and run the tests again by repeating the commands in Build and run the tests. The files in the build folder are covered by the FIMS repository's .gitignore file, so they should not be pushed to the FIMS repository.
For simple C++ functions like the examples above, we do not need to clean up the tests. Clean up is only necessary in a few situations:
- If memory for an object was allocated during testing and not deallocated, the object needs to be deleted (e.g., delete object).
- If you used a test fixture from GoogleTest to share the same data configuration across multiple tests, TearDown() can be used to clean up after each test before the fixture is deleted; see the GoogleTest user's guide for more details and the sketch after this list.
- If you do not want to keep any of the files produced by the examples and want to completely clear uncommitted changes and files from the git repo, run git restore . to discard uncommitted changes in files tracked by git, or git clean -fd to remove all untracked files in the repository.
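A minimal sketch of a GoogleTest fixture with TearDown(), assuming memory is allocated in SetUp() (the fixture and member names are hypothetical, reusing the dlognorm example):
#include "gtest/gtest.h"
#include "../../src/dlognorm.hpp"

class DlognormFixture : public ::testing::Test {
 protected:
  void SetUp() override { inputs_ = new double[3]{1.0, 0.0, 1.0}; }
  void TearDown() override { delete[] inputs_; }  // deallocate after each test
  double* inputs_ = nullptr;
};

TEST_F(DlognormFixture, UsesSharedInputs) {
  EXPECT_NEAR(dlognorm(inputs_[0], inputs_[1], inputs_[2]), -0.9189385, 0.0001);
}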
10.3 Templates for GoogleTest testing
This section includes templates for creating unit tests and benchmarks. This is the code that would go into the .cpp files in tests/gtest.
10.3.1 C++ test templates
10.3.1.3 tests/gtest/CMakeLists.txt template
These lines are added each time a new test, e.g., TestSuiteName1, is added:
# Add test suite 1
add_executable(TestSuiteName1
  test1.cpp
)
target_link_libraries(TestSuiteName1
  gtest_main
)
gtest_discover_tests(TestSuiteName1)
These lines are added each time a new benchmark, e.g., benchmark1, is added:
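A sketch modeled on the dlognorm_benchmark example above (benchmark1.cpp is a placeholder file name):
# Add benchmark 1
add_executable(benchmark1
  benchmark1.cpp
)
target_link_libraries(benchmark1
  benchmark_main
)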
10.4 R testing
R tests are written using {testthat}, which can be installed as an R package. More details on {testthat} can be found in the testing chapter of R packages.
10.4.1 Testing FIMS locally
To test FIMS R functions interactively and locally, use devtools::install() rather than devtools::load_all(), because using load_all() will turn on the debugger, bloating the .o file, and may lead to a compilation error (see Fixing Fatal Error for more information).
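For example, a minimal sketch of the recommended workflow, run from the cloned FIMS directory:
# Install the package rather than calling devtools::load_all()
devtools::install()
library(FIMS)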
10.4.2 Testing using gdbsource
You can interactively debug C++ code using TMB::gdbsource() in RStudio.
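For example, a sketch of the call (the script path is a placeholder):
# Run an R script under gdb; interactive = TRUE is convenient in RStudio
TMB::gdbsource("tests/testthat/test-rcpp-example.R", interactive = TRUE)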
10.4.3 Naming conventions and file organization
- Group functions and their helpers together, i.e., the "main function plus helpers" approach.
- {testthat} tests of Rcpp code should be called test-rcpp-[description].R.
- Integration tests that do not have a corresponding .R file should use the following convention: test-integration-[description].R.
10.4.4 R template
Naming conventions for {testthat} files follow the tidyverse test convention, as well as using test-rcpp-[description].R for tests of Rcpp code and test-integration-[description].R for integration tests without a corresponding R function.
The format for an individual {testthat} test is:
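A minimal sketch of that format (the description, setup, and expectation are placeholders):
test_that("description of what the test checks", {
  # set up inputs
  result <- 1 + 1
  # one or more expectations
  expect_equal(result, 2)
})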
10.4.5 Random numbers
Simulation results might be dependent on the order of calls, leading to failed tests just because different random numbers are used or the order of the simulation changes through model development (see FIMS-planning discussion 25 for details). Below are some potential solutions.
- Add a TRUE/FALSE parameter in each FIMS simulation module for setting up the testing seed. When testing the module, set the parameter to TRUE to fix the seed number in R and conduct tests. If adding a TRUE/FALSE parameter does not work as expected, carefully check the simulated data from each component and make sure the failure is not caused by a model coding error.
- Use set.seed() from R to set the seed, and investigate using {rstream} to generate multiple streams of random numbers so that distinct streams can be associated with different sources of randomness; see the sketch after this list. {rstream} was specifically designed to address the need for very long streams of pseudo-random numbers in parallel computations. See the rstream paper and RngStreams for more details.
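A minimal sketch of the set.seed() approach inside a {testthat} test (rnorm() stands in for a FIMS simulation call):
test_that("simulation is reproducible with a fixed seed", {
  set.seed(123)
  sim_a <- rnorm(5)
  set.seed(123)
  sim_b <- rnorm(5)
  expect_identical(sim_a, sim_b)
})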