Chapter 10 Testing

This section describes testing for FIMS. FIMS uses GoogleTest for C++ unit testing and {testthat} for R unit testing.

10.1 Introduction

The FIMS testing framework will include different types of testing to make sure that changes to the FIMS code work as expected. Unit and functional tests will be developed during the initial development stage, when writing individual functions or modules. After multiple modules are complete, integration testing will be developed to verify that different modules work well together. Checks will be added to the software to catch user input errors during run-time testing. Regression testing and platform compatibility testing will be executed before pre-releasing FIMS. Beta testing will be used to gather feedback from users (i.e., members of the FIMS implementation team and other users) during the pre-release stage. After releasing the first version of FIMS, the development team will return to the beginning of the testing cycle and write unit tests when a new feature needs to be implemented. One-off testing will be used for testing new features and fixing user-reported bugs when maintaining FIMS. More details on each type of test can be found in the Glossary section.

FIMS will use GoogleTest to build a C++ unit testing framework and R testthat to build an R testing framework. FIMS will use Google Benchmark to measure the real time and CPU time used for running the produced binaries.

10.2 C++ unit testing and benchmarking

10.2.1 Requirements

To use GoogleTest, you will need:

  • A compatible operating system (e.g., Windows, macOS, or Linux).

  • A C++ compiler that supports at least the C++11 standard (e.g., gcc 5.0+, clang 5.0+, or MSVC 2015+). For macOS users, Xcode 9.3+ provides clang 5.0. For R users, Rtools4 includes gcc.

  • A build system for building the testing project. CMake and a compatible build tool such as Ninja are approved by NMFS HQ.

10.2.2 Setup for Windows users

  1. Download CMake 3.22.1 (cmake-3.22.1-windows-x86_64.zip) and extract the folder to Documents\Apps or another preferred folder.

  2. Download Ninja v1.10.2 (ninja-win.zip) and put the executable in Documents\Apps or another preferred folder.

  3. Open your Command Prompt and type cmake. If you see details of usage, cmake is already in your PATH. If not, follow the instructions below to add cmake to your PATH.

  4. In the same Command Prompt, type ninja. If you see a message that starts with ninja:, even if it is an error about not finding build.ninja, ninja is already in your PATH. If ninja is not found, follow the instructions below to add ninja to your PATH.

10.2.3 Adding cmake and ninja to your PATH on Windows

  1. In the Windows search bar next to the start menu, search for Edit environment variables for your account and open the Environment Variables window.

  2. Click Edit... under the User variables for firstname.lastname section.

  3. Click New and add the path to cmake, if needed (e.g., cmake-3.22.1-windows-x86_64\bin or C:\Program Files\CMake\bin are common locations), then click OK. Click New again and add the path to the Ninja executable, if needed (e.g., Documents\Apps\ninja-win or C:\Program Files\ninja-win), then click OK.

  4. You may need to restart your computer to update the environment variables. You can check that the path is working by running where cmake or where ninja in a command terminal.

  5. Note that at certain Fisheries Science Centers, NOAA employees do not have the administrative privileges needed to edit the local PATH environment variable. In this situation, it is necessary to create a ticket with IT to add cmake and ninja to your PATH on Windows.

10.2.4 Setup for Linux and Mac users

  1. See CMake installation instructions for installing CMake on other platforms. Add cmake to your PATH. You can check that the path is working by running which cmake in a command window.

  2. Download Ninja v1.10.2 for your platform (e.g., ninja-mac.zip or ninja-linux.zip) and put the binary in your preferred location. Add Ninja to your PATH. You can check that the path is working by running which ninja in a command window.

  3. Open a command window and type cmake. If you see usage, cmake is found. If not, cmake may still need to be added to your PATH.

  4. Open a command window and type ninja. If you see a message starting with ninja:, ninja is found. Otherwise, try changing the file permissions or adding ninja to your PATH (see below).

10.2.5 How to edit your PATH and change file permissions for Linux and Mac

To check if the binary is in your path, assuming the binary is named ninja: open a Terminal window and type which ninja and hit enter. If you get nothing returned, then ninja is not in your path. The easiest way to fix this is to move the ninja binary to a folder that’s already in your path. To find existing path folders type echo $PATH in the terminal and hit enter. Now move the ninja binary to one of these folders. For example, in a Terminal window type:

sudo cp ~/Downloads/ninja /usr/bin/

to copy ninja from the Downloads folder to /usr/bin. You will need to use sudo and enter your password to have permission to copy a file into a folder like /usr/bin/.

Also note that you may need to add executable permissions to the ninja binary after downloading it. You can do that by switching to the folder where you placed the binary (cd /usr/bin/ if you followed the instructions above), and running the command:

sudo chmod +x ninja

Check that ninja is now executable and in your path:

which ninja

If you followed the instructions above, you will see the following line returned:

/usr/bin/ninja

10.2.6 Set up FIMS testing project

Clone the FIMS repository on the command line using:

git clone https://github.com/NOAA-FIMS/FIMS.git
cd FIMS

There is a file called CMakeLists.txt in the top level of the directory. This file instructs CMake on how to create the build files, including setting up GoogleTest.

The GoogleTest testing code is in the tests/gtest subdirectory. Within this subdirectory is another file called CMakeLists.txt. This file contains additional specifications for CMake, in particular instructions on how to register the individual tests.

10.2.7 Build and run the tests

Three commands on the command line are needed to build and run the tests. First, generate the build system:

cmake -S . -B build -G Ninja

This generates the build system using Ninja as the generator. Note there is now a subfolder called build.

Next, in the same command window, use cmake to build in the build subfolder:

cmake --build build

Finally, run the C++ tests:

ctest --test-dir build

The output from running the tests should look something like:

Internal ctest changing into directory: C:/github_repos/NOAA-FIMS_org/FIMS/build
Test project C:/github_repos/NOAA-FIMS_org/FIMS/build
    Start 1: dlognorm.use_double_inputs
1/5 Test #1: dlognorm.use_double_inputs .......   Passed    0.04 sec
    Start 2: dlognorm.use_int_inputs
2/5 Test #2: dlognorm.use_int_inputs ..........   Passed    0.04 sec
    Start 3: modelTest.eta
3/5 Test #3: modelTest.eta ....................   Passed    0.04 sec
    Start 4: modelTest.nll
4/5 Test #4: modelTest.nll ....................   Passed    0.04 sec
    Start 5: modelTest.evaluate
5/5 Test #5: modelTest.evaluate ...............   Passed    0.04 sec

100% tests passed, 0 tests failed out of 5

10.2.8 Adding a C++ test

Create a file dlognorm.hpp within the src subfolder that contains a simple function:

#include <cmath>

// Returns the natural log of the lognormal density evaluated at x, given the
// mean (meanlog) and standard deviation (sdlog) on the log scale. Equivalent
// to dlnorm(x, meanlog, sdlog, log = TRUE) in R.
template<class Type>
Type dlognorm(Type x, Type meanlog, Type sdlog){
  Type resid = (log(x) - meanlog) / sdlog;
  Type logres = -log(sqrt(2 * M_PI)) - log(sdlog) - Type(0.5) * resid * resid - log(x);
  return logres;
}

Then, create a test file dlognorm-unit.cpp in the tests/gtest subfolder that has a test suite for the dlognorm function:

#include "gtest/gtest.h"
#include "../../src/dlognorm.hpp"

// # R code that generates true values for the test
// dlnorm(1.0, 0.0, 1.0, TRUE) = -0.9189385
// dlnorm(5.0, 10.0, 2.5, TRUE) = -9.07679

namespace {

  // TestSuiteName: dlognormTest; TestName: DoubleInput and IntInput
  // Test dlognorm with double input values

  TEST(dlognormTest, DoubleInput) {

    EXPECT_NEAR( dlognorm(1.0, 0.0, 1.0) , -0.9189385 , 0.0001 );
    EXPECT_NEAR( dlognorm(5.0, 10.0, 2.5) , -9.07679 , 0.0001 );

  }

  // Test dlognorm with integer input values

  TEST(dlognormTest, IntInput) {

    EXPECT_NEAR( dlognorm(1, 0, 1) , -0.9189385 , 0.0001 );

  }

}

EXPECT_NEAR(val1, val2, absolute_error) verifies that the difference between val1 and val2 does not exceed the absolute error bound absolute_error. EXPECT_NE(val1, val2) verifies that val1 is not equal to val2. Please see GoogleTest assertions reference for more EXPECT_ macros.
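As a minimal sketch (not part of the FIMS test suite), the example below combines the two macros, assuming the dlognorm.hpp header from the previous step; the test name AssertionExamples is hypothetical, and if the test were appended to dlognorm-unit.cpp the #include lines and namespace would already be present.

#include "gtest/gtest.h"
#include "../../src/dlognorm.hpp"

namespace {

  // Illustrative use of EXPECT_NEAR and EXPECT_NE
  TEST(dlognormTest, AssertionExamples) {
    double val1 = dlognorm(1.0, 0.0, 1.0);
    double val2 = dlognorm(5.0, 10.0, 2.5);
    EXPECT_NEAR(val1, -0.9189385, 0.0001);  // |val1 - (-0.9189385)| <= 0.0001
    EXPECT_NE(val1, val2);                  // val1 and val2 are not equal
  }

}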

10.2.9 Add tests to tests/gtest/CMakeLists.txt and run a binary

To build the code, add the following contents to the end of the tests/gtest/CMakeLists.txt file:


add_executable(dlognorm_test
  dlognorm-unit.cpp
)

target_include_directories(dlognorm_test
  PUBLIC
    ${CMAKE_SOURCE_DIR}/../
)

target_link_libraries(dlognorm_test
  gtest_main
)

include(GoogleTest)
gtest_discover_tests(dlognorm_test)

The above configuration declares the C++ test binary you want to build (dlognorm_test), links it to GoogleTest (gtest_main), and registers the individual tests with CTest via include(GoogleTest) and gtest_discover_tests(). Now you can build and run your test. Open a command window in the FIMS repo (if not already open) and type:

cmake -S . -B build -G Ninja

This generates the build system using Ninja as the generator.

Next, in the same command window, use cmake to build:

cmake --build build

Finally, run the tests in the same command window:

ctest --test-dir build

The output when running ctest might look like this. Note there is a failing test:

Internal ctest changing into directory: C:/Users/Kathryn.Doering/Documents/testing/FIMS/build
Test project C:/Users/Kathryn.Doering/Documents/testing/FIMS/build
    Start 1: dlognorm.use_double_inputs
1/7 Test #1: dlognorm.use_double_inputs .......   Passed    0.04 sec
    Start 2: dlognorm.use_int_inputs
2/7 Test #2: dlognorm.use_int_inputs ..........   Passed    0.04 sec
    Start 3: modelTest.eta
3/7 Test #3: modelTest.eta ....................   Passed    0.04 sec
    Start 4: modelTest.nll
4/7 Test #4: modelTest.nll ....................   Passed    0.04 sec
    Start 5: modelTest.evaluate
5/7 Test #5: modelTest.evaluate ...............   Passed    0.04 sec
    Start 6: dlognormTest.DoubleInput
6/7 Test #6: dlognormTest.DoubleInput .........   Passed    0.04 sec
    Start 7: dlognormTest.IntInput
7/7 Test #7: dlognormTest.IntInput ............***Failed    0.04 sec

86% tests passed, 1 tests failed out of 7

Total Test time (real) =   0.28 sec

The following tests FAILED:
          7 - dlognormTest.IntInput (Failed)
Errors while running CTest
Output from these tests are in: C:/Users/Kathryn.Doering/Documents/testing/FIMS/build/Testing/Temporary/LastTest.log
Use "--rerun-failed --output-on-failure" to re-run the failed cases verbosely.

10.2.10 Debugging a C++ test

There are two ways to debug a C++ test: interactively using gdb, or via print statements. To use gdb, make sure it is installed and on your PATH.

Debug C++ code (e.g., a segmentation fault or memory corruption) using gdb:

cmake -S . -B build -G Ninja -DCMAKE_BUILD_TYPE=Debug
cmake --build build --parallel 16
ctest --test-dir build --parallel 16
gdb ./build/tests/gtest/population_dynamics_population.exe
c                    # continue without paging
run                  # run the program to see which line of code fails
print this->log_naa  # for example, print this->log_naa to see the value of log_naa
print i              # for example, print i from the failing for loop
bt                   # print a backtrace
q                    # quit gdb

Debug C++ code without using gdb by adding print statements. In a .hpp file, create an output stream with std::ofstream out("file_name.txt") (requires #include <fstream>) and then use out << variable; to write out the value of a variable. For example:

nfleets = fleets.size();
std::ofstream out("debug.txt");
out << nfleets;

A more complex example with text identifying the printed quantities:

out <<" fleet_index: "<<fleet_index<<" index_yaf: "<<index_yaf<<" index_yf: "<<index_yf<<"\n";
out <<" population.Fmort[index_yf]: "<<population.Fmort[index_yf]<<"\n";

Then rebuild and rerun the tests on the command line (e.g., in Git Bash):

cmake -S . -B build -G Ninja 
cmake --build build --parallel 16
ctest --test-dir build --parallel 16

The output of the print statements will be written to FIMS/build/tests/gtest/debug.txt.

10.2.11 Benchmark example

Google Benchmark measures the real time and CPU time used for running the produced binary. We will continue using the dlognorm.hpp example. Create a benchmark file dlognorm_benchmark.cpp and put it in the tests/gtest subfolder:

#include "benchmark/benchmark.h"
#include "../../src/dlognorm.hpp"

void BM_dlgnorm(benchmark::State& state)
{
  for (auto _ : state)
    dlognorm(5.0, 10.0, 2.5);
}
BENCHMARK(BM_dlgnorm);

This file runs the dlognorm function and uses BENCHMARK to see how long it takes.

A more comprehensive feature overview of benchmarking is available in the Google Benchmark GitHub repository.
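For timing a function over several input values, Google Benchmark also supports passing arguments to the benchmark routine. The sketch below uses a hypothetical BM_dlgnorm_args (not part of the FIMS benchmarks) that times dlognorm at a few x values supplied via state.range(0); if appended to dlognorm_benchmark.cpp, the two #include lines are already present and can be omitted.

#include "benchmark/benchmark.h"
#include "../../src/dlognorm.hpp"

// Time dlognorm for several values of x; each ->Arg() registers one run
void BM_dlgnorm_args(benchmark::State& state)
{
  for (auto _ : state)
    dlognorm(static_cast<double>(state.range(0)), 10.0, 2.5);
}
BENCHMARK(BM_dlgnorm_args)->Arg(1)->Arg(5)->Arg(25);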

10.2.12 Add benchmarks to tests/gtest/CMakeLists.txt and run the benchmark

To build the code, add the following contents to the end of your tests/gtest/CMakeLists.txt file:


FetchContent_Declare(
  googlebenchmark
  URL https://github.com/google/benchmark/archive/refs/tags/v1.6.0.zip
)
FetchContent_MakeAvailable(googlebenchmark)

add_executable(dlognorm_benchmark
  dlognorm_benchmark.cpp
)

target_include_directories(dlognorm_benchmark
  PUBLIC
    ${CMAKE_SOURCE_DIR}/../
)

target_link_libraries(dlognorm_benchmark
  benchmark_main
)

To run the benchmark, open the command line in the FIMS repo (if not already open) and use cmake to build, sending output to the build subfolder:

cmake --build build

Then run the dlognorm_benchmark executable created:

build/tests/gtest/dlognorm_benchmark.exe

The output from dlognorm_benchmark.exe might look like this:

Run on (8 X 2112 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x4)
  L1 Instruction 32 KiB (x4)
  L2 Unified 256 KiB (x4)
  L3 Unified 8192 KiB (x1)
***WARNING*** Library was built as DEBUG. Timings may be affected.
-----------------------------------------------------
Benchmark           Time             CPU   Iterations
-----------------------------------------------------
BM_dlgnorm        153 ns          153 ns      4480000

10.2.12.1 Remove files produced by this example

If you don’t want to keep any of the files produced by this example and want to completely clear any uncommitted changes and files from the git repo, use

git restore .

to get rid of uncommitted changes in git-tracked files. To get rid of all untracked files in the repo, use:

git clean -fd

10.2.13 Clean up after running C++ tests

10.2.13.1 Clean up CMake-generated files and re-run tests

After running the examples above, the build generates files (e.g., downloaded source code, libraries, and executables) and saves them in the build subfolder. The examples above demonstrate an “out-of-source” build, which puts generated files in a completely separate directory so that the source tree is unchanged after running tests. Using separate source and build trees reduces the need to delete files that differ between builds. If you would still like to delete the CMake-generated files, just delete the build folder, and then build and run the tests again by repeating the commands below. The build folder is included in the FIMS repository’s .gitignore file, so its files should not be pushed to the FIMS repository.
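These are the same three commands used in section 10.2.7 to generate, build, and run the tests:

cmake -S . -B build -G Ninja
cmake --build build
ctest --test-dir build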

10.2.13.2 Clean up individual tests

For simple C++ functions like the examples above, we do not need to clean up the tests. Cleanup is only necessary in a few situations, listed below (a sketch of the second case follows the list).

  • If memory for an object was allocated during testing and not deallocated, the object needs to be deleted (e.g., delete object).
  • If you used a test fixture from GoogleTest to share the same data configuration across multiple tests, TearDown() can be used to clean up after each test before the fixture is destroyed. Please see the GoogleTest user’s guide for more details.
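As a minimal sketch of the second case, the hypothetical fixture below (the names PopulationTest, naa, and nages are illustrative only, not actual FIMS code) allocates memory in SetUp() and releases it in TearDown():

#include "gtest/gtest.h"

namespace {

  class PopulationTest : public testing::Test {
   protected:
    void SetUp() override {
      naa = new double[nages];  // allocate data shared by the tests below
    }

    void TearDown() override {
      delete[] naa;  // clean up after each test; the fixture is then destroyed
    }

    static const int nages = 10;
    double* naa = nullptr;
  };

  // TEST_F (instead of TEST) gives each test a freshly constructed fixture
  TEST_F(PopulationTest, AllocatesStorage) {
    EXPECT_NE(naa, nullptr);
  }

}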

10.3 Templates for GoogleTest testing

This section includes templates for creating unit tests and benchmarks. This is the code that would go into the .cpp files in tests/gtest.

10.3.1 Unit test template

#include "gtest/gtest.h"
#include "../../src/code.hpp"

// # R code that generates true values for the test

namespace {

  // Description of Test 1
  TEST(TestSuiteName, Test1Name) {

    ... test body ...

  }

  // Description of Test 2
  TEST(TestSuiteName, Test2Name) {

    ... test body ...

  }

}

10.3.2 Benchmark template

#include "benchmark/benchmark.h"
#include "../../src/code.hpp"

void BM_FunctionName(benchmark::State& state)
{
  for (auto _ : state)
    // This code gets timed
    Function();
}

// Register the function as a benchmark
BENCHMARK(BM_FunctionName);

10.3.3 tests/gtest/CMakeLists.txt template

These lines are added each time a new test suite (all tests in a file) is added:

# Add test suite 1
add_executable(TestSuiteName1
  test1.cpp
)

target_link_libraries(TestSuiteName1
  gtest_main
)

gtest_discover_tests(TestSuiteName1)

These lines are added each time a new benchmark file is added:

# Add benchmark 1
add_executable(benchmark1
  benchmark1.cpp
)

target_link_libraries(benchmark1
  benchmark_main
)

10.4 R testing

FIMS uses {testthat} for writing R tests. You can install the package following the instructions on the testthat website. If you are not familiar with testthat, the testing chapter in R Packages gives a good overview of the testing workflow, along with an explanation of test structure and concrete examples.

10.4.1 Testing FIMS locally

To test FIMS R functions interactively and locally, use devtools::install() rather than devtools::load_all(). This is because load_all() compiles with debugging enabled, bloating the .o file, which may lead to a compilation error (e.g., Fatal error: can't write 326 bytes to section .text of FIMS.o: 'file too big' as: FIMS.o: too many sections (35851)). Note that useful interactive tests should be converted into {testthat} or GoogleTest tests.

10.4.2 Testing using gdbsource

You can interactively debug C++ code using TMB::gdbsource() in RStudio. Just add these two lines to the top of the test-fims-estimation.R file:

require(testthat)
devtools::load_all("C:\\Users\\chris\\noaa-git\\FIMS")

10.4.3 R testthat naming conventions and file organization

  • We try to group functions and their helpers together (the “main function plus helpers” approach).
  • Always name the test file the same as the R file, but with test- prepended (e.g., test-myfunction.R contains testthat tests for the R code in R/myfunction.R). This is the convention in the tidyverse style guide.
  • testthat tests of the Rcpp interface should be called test-rcpp-[description].R.
  • Integration tests that do not have a corresponding .R file should use the convention test-integration-[description].R.

10.4.4 R testthat template

The format for an individual testthat test is:

test_that("TestName", {

  ...test body...

})

Multiple testthat tests can be put in the same file if they are related to the same .R file (see naming conventions above).

10.5 Test case documentation template and examples

A testing plan must be developed while designing (i.e., before coding) new FIMS features or Rcpp modules. Please update the test cases in the FIMS/tests/milestoneX_test_cases.md file (e.g., FIMS/tests/milestone1_test_cases.md). The testing plan is documented using the test case documentation template below.

10.5.1 Test case documentation template

Individual functional or integration test cases will be designed following the template below.

  • Test ID. Create a meaningful name for the test case.

  • Features to be tested. Provide a brief statement of test objectives and a description of the features to be tested. (Identify the test items following the FIMS software design specification document, identify any features that will not be tested, and give the rationale for exclusion.)

  • Approach. Specify the approach that will ensure that the features are adequately tested and specify which type of test is used in this case.

  • Evaluation criteria. Provide a list of expected results and acceptance criteria.

    • Pass/fail criteria. Specify the criteria used to determine whether each feature has passed or failed testing.

    • In addition to setting pass/fail criteria with specific tolerance values, a document that simply displays the outputs of some tests may be useful if the tests require additional computations, simulations, or comparisons.

  • Test deliverables. Identify all information that is to be delivered by the test activity.

    • Test logs and automated status reports

10.5.2 Test case documentation examples

10.5.2.1 General test case documentation

The test case documentation below is a general case to apply to many functions/modules. For individual functions/modules, please make detailed test cases for specific options, noting “same as the general test case” where appropriate.

Test ID General test case

Features to be tested

  • The function/module returns correct output values given different input values

  • The function/module returns error messages when users give wrong types of inputs

  • The function/module returns an error if the input value is outside the bounds of the input parameter

Approach
  • Prepare expected true values using R

  • Run tests in R using testthat and compare output values with expected values

  • Push tests to the working repository and run tests using GitHub Actions

  • Run tests in different OS environments (windows-latest, macos-latest, and ubuntu-latest) using GitHub Actions

  • Submit pull request for code review

Evaluation Criteria
  • The tests pass if the output values equal the expected true values

  • The tests pass if the function/module returns error messages when users give wrong types of inputs

  • The tests pass if the function/module returns error messages when the user provides an input value that is outside the bounds of the input parameter

Test deliverables
  • Test logs on GitHub Actions. Document results of logs in the feature pull request.

10.5.2.2 Functional test example: TMB probability mass function of the multinomial distribution

Test ID Probability mass function of the multinomial distribution
Features to be tested
  • Same as the general test case
Approach

Functional test

  • Prepare expected true values using R function dmultinom from package ‘stats’
Evaluation Criteria
  • Same as the general test case
Test deliverables
  • Same as the general test case

10.5.2.3 Integration test example: Li et al. 2021 age-structured stock assessment model comparison

Test ID Age-structured stock assessment comparison (Li et al. 2021)
Features to be tested
  • Null case (update standard deviation of the log of recruitment from 0.2 to 0.5 based on Siegfried et al. 2016 snapper-grouper complex)

  • Recruitment variability

  • Stochastic Fishing mortality (F)

  • F patterns (e.g., roller coaster: up then down, and down then up; constant F_low, F_MSY, and F_high)

  • Selectivity patterns

  • Recruitment bias adjustment

  • Initial condition

  • Unit of catch (numbers or weight)

  • Model misspecification (e.g., growth, natural mortality, steepness, catchability, etc.)

Approach

Integration test

Evaluation Criteria
  • Summarize the median absolute relative error (MARE) between true values from the operating model and estimates from the FIMS estimation model

  • If all MAREs from the null case are less than 10% and all MAREs from the other cases are less than 15%, the tests pass. If any MARE is greater than 15%, a closer examination is needed.

Test deliverables
  • In addition to the test logs on GitHub Actions, a document that includes comparison figures from various cases (e.g., Fig 5 and 6 from Li et al. 2021) will be automatically generated

  • A table that shows median absolute relative errors in unfished recruitment, catchability, spawning stock biomass, recruitment, fishing mortality, and reference points (e.g., Table 6 from Li et al. 2021) will be automatically generated

10.5.2.4 Simulation testing: challenges and solutions

One challenge in comparing simulation results is that changes to the order of calls to simulate will change the simulated values. Tests may fail simply because different random numbers are used or because the order of the simulation changes during model development. Several solutions could be used to address this issue. Please see the discussions on the FIMS-planning issue page for details.

  • Once we start developing simulation modules, we can use these two approaches to compare simulated data from FIMS against a test:
    • Add a TRUE/FALSE parameter in each FIMS simulation module for setting up a testing seed. When testing the module, set the parameter to TRUE to fix the seed number in R and conduct the tests.
    • If adding a TRUE/FALSE parameter does not work as expected, then carefully check the simulated data from each component and make sure the discrepancy is not a model coding error.
  • FIMS will use set.seed() from R to set the seed. The {rstream} package will be investigated if one requirement of the FIMS simulation module is to generate multiple streams of random numbers so that distinct streams can be associated with different sources of randomness. {rstream} was specifically designed to address the need for very long streams of pseudo-random numbers in parallel computations. Please see the rstream paper and RngStreams for more details.