Software testing for Fisheries Integrated Modeling System (FIMS): strategies, challenges, and opportunities

class: center, middle, inverse, title-slide

.title[
# Software testing for Fisheries <br> Integrated Modeling System <br> (FIMS): strategies, challenges, <br> and opportunities
]
.author[
### Bai Li <br><br> Contractor with ECS Federal LLC in support of <br> NOAA Fisheries Office of Science and Technology <br> Email: <a href="mailto:bai.li@noaa.gov" class="email">bai.li@noaa.gov</a>
]
.institute[
### RSE Testing Working Group Meeting
]
.date[
### 2025/05/21
]

---

layout: true

.footnote[U.S. Department of Commerce | National Oceanic and Atmospheric Administration | National Marine Fisheries Service]

---
# Disclaimer

.pull-left[
Previous workflow
<img src="slides_20250521_files/figure-html/unnamed-chunk-1-1.png" width="80%" />
]

.pull-right[
Current workflow
<img src="slides_20250521_files/figure-html/unnamed-chunk-2-1.png" width="80%" />
]

- I’ve been learning about software testing since 2020.

--
- I’d like to share what has and hasn’t worked well for testing in the FIMS project.

--
- The examples I’ll present focus on R and C++ code developed by the FIMS implementation team (.hyperlink-style[[https://noaa-fims.github.io/FIMS/authors.html](https://noaa-fims.github.io/FIMS/authors.html)]).

--
- Comments and suggestions are very welcome!

.footnote[
Figure source: .hyperlink-style[[Getting started with unit testing in R](https://www.pipinghotdata.com/posts/2021-11-23-getting-started-with-unit-testing-in-r/)]
]

---
# Outline
- Introduction to FIMS

--
- Overview of FIMS testing
  - Types of tests implemented
  - Testing frameworks and tools used
  - Team roles and responsibilities

--
- Test design and evaluation

--
- Test coverage and reporting

--
- Automation testing

--
- Current challenges
  - How to define tolerance criteria for statistical models?
  - How to better manage test location and execution time?
- Opportunities for improvement
  - Standardize test templates
  - Improve execution helpers
  - Establish a quarterly review process

---
# Introduction to FIMS

.left-column[
**Fisheries Integrated Modeling System**
<img src="static/FIMS_hexlogo.png" width="80%" />
]

.right-column[
**What is FIMS?**

A flexible suite of software tools to support sustainable fishery management

- Fisheries stock assessment*<sup>1,2</sup>* at core. 
- Connects to ecosystem, climate, and economic models/data.
- Flexible for innovative future modeling work.
- Collaborative community effort.
- Addresses numerous priorities. 
]

.footnote[
.hyperlink-style[[FIMS landing page](https://noaa-fims.github.io/FIMS/)] | .hyperlink-style[[FIMS GitHub repo](https://github.com/noaa-fims/fims/)]<br>
[1] .hyperlink-style[[Fishery stock assessment:](https://www.fisheries.noaa.gov/insight/stock-assessment-model-descriptions)] The scientific process of collecting, analyzing, and reporting on the condition <br> of a fish stock and estimating its sustainable yield.<br>
[2] .hyperlink-style[[Stock assessment model:](https://www.fisheries.noaa.gov/insight/stock-assessment-model-descriptions)] The mathematical and statistical techniques stock assessments use to analyze <br> and understand the impact of fisheries and environmental factors on fish stocks.
]

---
# Introduction to FIMS
<img src="static/fims_path_simple.png" width="80%" />
- FIMS modules are written in C++ and linked to R*<sup>1</sup>* using Rcpp*<sup>2</sup>*
- Template Model Builder*<sup>3</sup>* serves as the engine for statistical inference of FIMS

.footnote[

[1] .hyperlink-style[[R:](https://www.r-project.org/)] A programming language and free software environment for statistical computing and graphics.<br>
[2] .hyperlink-style[[Rcpp:](https://cran.r-project.org/web/packages/Rcpp/index.html)] An R package that provides R functions as well as C++ classes which offer a seamless<br> integration of R and C++.<br>
[3] .hyperlink-style[[Template Model Builder:](https://kaskr.github.io/adcomp/Introduction.html)] An R package for fitting statistical latent variable models to data.
]

---
# Overview of FIMS testing

.footnote[
Figure source: .hyperlink-style[[GeeksforGeeks](https://www.geeksforgeeks.org/types-software-testing/)]
]

---
# Types of tests implemented

<div class="DiagrammeR html-widget html-fill-item" id="htmlwidget-a23e5f354c202d4d007f" style="width:100%;height:504px;"></div>
<script type="application/json" data-for="htmlwidget-a23e5f354c202d4d007f">{"x":{"diagram":"\n   graph TB\n   a[Types of FIMS Testing]-->A[Manual Testing]\n   a-->B[Automation Testing]\n   A-->C[Functional Testing]\n   A-->D[Non-Functional Testing]\n   C-->E[Unit Testing]\n   C-->F[Integration Testing]\n   C-->G[System Testing]\n   D-->H[Performance Testing]\n   D-->I[Usability Testing]\n   D-->J[Compatibility Testing]\n   style a fill:#C6E6F0\n   style A fill:#90DFE3\n   style B fill:#90DFE3\n   style C fill:#B1DC6B\n   style D fill:#B1DC6B\n   style E fill:#A8B8FF\n   style F fill:#A8B8FF\n   style G fill:#A8B8FF\n   style H fill:#A8B8FF\n   style I fill:#A8B8FF\n   style J fill:#A8B8FF\n"},"evals":[],"jsHooks":[]}</script>

---
# Types of tests implemented
<table class="table table-striped table-hover" style="font-size: 12px; width: auto !important; margin-left: auto; margin-right: auto;">
 <thead>
<tr>
<th style="border-bottom:hidden;padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="2"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">Tests</div></th>
<th style="border-bottom:hidden;padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="3"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">Testing Frameworks and Tools</div></th>
<th style="border-bottom:hidden;padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="4"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">Roles and Responsibilities</div></th>
</tr>
  <tr>
   <th style="text-align:left;"> Test Type </th>
   <th style="text-align:left;"> Description </th>
   <th style="text-align:left;"> GoogleTest (C++) </th>
   <th style="text-align:left;"> testthat (R) </th>
   <th style="text-align:left;"> Case Studies </th>
   <th style="text-align:left;"> Developers </th>
   <th style="text-align:left;"> Testers </th>
   <th style="text-align:left;"> Users </th>
   <th style="text-align:left;"> GitHub Actions </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> Unit Testing </td>
   <td style="text-align:left;"> Test individual modules </td>
   <td style="text-align:left;"> ✔️ </td>
   <td style="text-align:left;"> ✔️ </td>
   <td style="text-align:left;"> ❌ </td>
   <td style="text-align:left;"> ✔️ </td>
   <td style="text-align:left;"> ❌ </td>
   <td style="text-align:left;"> ❌ </td>
   <td style="text-align:left;"> ✔️ </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Integration Testing </td>
   <td style="text-align:left;"> Test interactions between modules </td>
   <td style="text-align:left;"> ✔️ </td>
   <td style="text-align:left;"> ✔️ </td>
   <td style="text-align:left;"> ❌ </td>
   <td style="text-align:left;"> ✔️ </td>
   <td style="text-align:left;"> ✔️ </td>
   <td style="text-align:left;"> ❌ </td>
   <td style="text-align:left;"> ✔️ </td>
  </tr>
  <tr>
   <td style="text-align:left;"> System Testing </td>
   <td style="text-align:left;"> Evaluate the overall functionality of a complete FIMS model </td>
   <td style="text-align:left;"> ❌ </td>
   <td style="text-align:left;"> ✔️ </td>
   <td style="text-align:left;"> ✔️ </td>
   <td style="text-align:left;"> ❌ </td>
   <td style="text-align:left;"> ✔️ </td>
   <td style="text-align:left;"> ✔️ </td>
   <td style="text-align:left;"> ✔️ </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Performance Testing </td>
   <td style="text-align:left;"> Ensure the system performs properly under its expected workload </td>
   <td style="text-align:left;"> ❌ </td>
   <td style="text-align:left;"> ✔️ </td>
   <td style="text-align:left;"> ✔️ </td>
   <td style="text-align:left;"> ❌ </td>
   <td style="text-align:left;"> ✔️ </td>
   <td style="text-align:left;"> ✔️ </td>
   <td style="text-align:left;"> ✔️ </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Usability Testing </td>
   <td style="text-align:left;"> Assess user interface effectiveness </td>
   <td style="text-align:left;"> ❌ </td>
   <td style="text-align:left;"> ❌ </td>
   <td style="text-align:left;"> ✔️ </td>
   <td style="text-align:left;"> ❌ </td>
   <td style="text-align:left;"> ✔️ </td>
   <td style="text-align:left;"> ✔️ </td>
   <td style="text-align:left;"> ✔️ </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Compatibility Testing </td>
   <td style="text-align:left;"> Check FIMS compatibility across different operating systems </td>
   <td style="text-align:left;"> ✔️ </td>
   <td style="text-align:left;"> ✔️ </td>
   <td style="text-align:left;"> ❌ </td>
   <td style="text-align:left;"> ❌ </td>
   <td style="text-align:left;"> ❌ </td>
   <td style="text-align:left;"> ❌ </td>
   <td style="text-align:left;"> ✔️ </td>
  </tr>
</tbody>
</table>

---
# Unit testing

- Test individual modules 
  - Identify bugs early in the development process
  - Promote modular and maintainable code

.pull-left[
![](slides_20250521_files/figure-html/unnamed-chunk-8-1.png)
]

.footnote[
Figure source: .hyperlink-style[[FIMS selectivity Rcpp tests](https://github.com/NOAA-FIMS/FIMS/blob/main/tests/testthat/test-rcpp-selectivity.R)]
]

???
- Testing frameworks and tools
  - Use **GoogleTest** to test C++ code
  - Use **testthat** to test R code
- Roles and responsibilities
  - Module developers write tests and execute them locally
  - GitHub Actions (GHA) automatically runs all tests before each Pull Request (PR) as a safeguard 
  
---
# Integration testing

- Test how different modules interact with each other
  - Ensure that different modules work together as intended
  - Identify issues that may arise when different modules are combined

![](slides_20250521_files/figure-html/unnamed-chunk-9-1.png)

.footnote[
Figure source: .hyperlink-style[[FIMS fleet Rcpp tests](https://github.com/NOAA-FIMS/FIMS/blob/main/tests/testthat/test-rcpp-fleet-interface.R)]
]

???
- Testing frameworks and tools
  - Use **GoogleTest** to test C++ code
  - Use **testthat** to test R code
- Roles and responsibilities
  - Developers or testers write tests and execute them locally
  - GHA automatically runs all tests before each PR as a safeguard

---
# System testing

- Evaluate the overall functionality of a complete FIMS model
  - Check the entire functionality of the system
  - Check if the product meets the technical and business requirements of clients

<table class="table table-striped table-hover" style="font-size: 12px; width: auto !important; margin-left: auto; margin-right: auto;">
 <thead>
  <tr>
   <th style="text-align:left;"> Test Type </th>
   <th style="text-align:left;"> Description </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> Deterministic Test </td>
   <td style="text-align:left;"> Fix parameters at "true" values from the operating model. </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Estimation Test 1 </td>
   <td style="text-align:left;"> Estimate using age composition input only. </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Estimation Test 2 </td>
   <td style="text-align:left;"> Estimate using length composition input only. </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Estimation Test 3 </td>
   <td style="text-align:left;"> Estimate using both age and length composition input. </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Estimation Test 4 </td>
   <td style="text-align:left;"> Estimate with NAs in the input data. </td>
  </tr>
</tbody>
</table>

.footnote[
Figure source: .hyperlink-style[[FIMS tests](https://github.com/NOAA-FIMS/FIMS/blob/main/tests/testthat/test-integration-caa-mle-wrappers.R)]
]

???
- Testing frameworks and tools
  - Use case studies to test FIMS in a separate repo
- Roles and responsibilities
  - Tester/users write tests and execute them locally
  - GHA automatically runs all test cases

---
# Performance testing
.pull-left[
- Ensure the system performs properly under its expected workload
  - Ensure the speed, load capability, and accuracy of the system
  - Identify bottlenecks in the systems 
]

.pull-right[
![](slides_20250521_files/figure-html/unnamed-chunk-11-1.png)
]

.footnote[
Figure source: .hyperlink-style[[FIMS parallel tests](https://github.com/NOAA-FIMS/FIMS/blob/main/tests/testthat/test-parallel-caa-mle-wrappers.R)]
]

???
- Testing frameworks and tools
  - Use **testthat** to check if FIMS can be run in parallel
  - Use case studies to test FIMS in a separate repo
- Roles and responsibilities
  - Tester/users write tests and execute them locally
  - GHA automatically runs all test cases 
  
---
# Usability testing

- Done from an end user’s perspective to determine if the system is easily usable
  - Evaluate the effectiveness of the user interface design 
  - Identify user pain points
  
<iframe src="https://noaa-fims.github.io/case-studies/content/NEFSC-yellowtail.html" width="100%" height="400px" data-external="1"></iframe>

---
# Compatibility testing

- Check FIMS compatibility (running capability) across different operating systems
  - Provide service across multiple operating systems
  - Identify bugs during the development process

![](https://raw.githubusercontent.com/EgoistDeveloper/operating-system-logos/master/src/128x128/WIN.png)![](https://raw.githubusercontent.com/EgoistDeveloper/operating-system-logos/master/src/128x128/MAC.png)![](https://raw.githubusercontent.com/EgoistDeveloper/operating-system-logos/master/src/128x128/LIN.png)

.footnote[
Figure source: .hyperlink-style[[ngeenx/operating-system-logos](https://github.com/ngeenx/operating-system-logos)]
]

???
- Testing frameworks and tools
  - Run C++ and R tests on Windows, MacOS, and Ubuntu
- Roles and responsibilities
  - GHA automatically runs all tests

---
# Test design and evaluation

- Use snapshot tests (also known as .hyperlink-style[[golden tests](https://ro-che.info/articles/2017-12-04-golden-tests)]) to capture FIMS output in human-readable files and track changes over time
- Use test fixtures to provide input data and expected results for multiple tests, reducing redundancy and improving maintainability
- Use built-in functions (e.g., expect_equal()) to compare computed values against known reference values (e.g., unit tests or deterministic integration tests)
- Relax tolerance criteria for model estimates: ensure that no more than 5% of estimates deviate from expected values by more than 2 standard errors (e.g., estimation tests)

---
# Test coverage and reporting
- Use Codecov to track and report code coverage
- Rely on coverage metrics from the R side, as R code also triggers the underlying C++ code
- Coverage has remained between 65% and 95% since 2022
- Coverage changes are automatically reported in each pull request

.pull-left[
![](slides_20250521_files/figure-html/unnamed-chunk-14-1.png)
]

.pull-right[
![](slides_20250521_files/figure-html/unnamed-chunk-15-1.png)
]

.footnote[
Figure source: .hyperlink-style[[FIMS codecov](https://app.codecov.io/gh/NOAA-FIMS/FIMS?search=&trend=all%20time)] |
.hyperlink-style[[Pull request codecov](https://github.com/NOAA-FIMS/FIMS/pull/837)]
]

---
# Automation testing
- Use appropriate automation tools (e.g., GitHub Actions) to run tests for FIMS
- C++ and R tests are automatically executed on every commit push.
- Code coverage results are reported for every pull request

![](slides_20250521_files/figure-html/unnamed-chunk-16-1.png)

.footnote[
Figure source: .hyperlink-style[[FIMS GitHub Actions](https://github.com/NOAA-FIMS/FIMS/actions/workflows/run-googletest.yml)]
]

---
# Current challenges

![](https://static7.depositphotos.com/1026550/780/i/600/depositphotos_7809833-stock-photo-challenges-ahead-road-sign.jpg)

---
# How to better manage test location and execution time?
.pull-left[
- Current R test suites can take up to 10 minutes to run
- Which tests should run locally vs. remotely to balance speed and reliability?
- Should we separate core vs. extended tests into different repositories?
]

.pull-right[
<img src="https://nexwebsites.com/images/service/software-testing-services.svg" width="80%" />
]

Mentimeter link: .hyperlink-style[[https://www.menti.com/al8ugsvp2kk3](https://www.menti.com/al8ugsvp2kk3)]

???
- Create a developer function to run only unit tests locally
- How frequently should external tests be executed and maintained?
- Where and how should we define dynamic test input data that isn’t based on fixed values?

---
# How to define tolerance criteria for statistical models?

.pull-left[
- 95% of the estimates within two standard error of the expected values
- What is an acceptable range for model estimates under different modeling conditions?
- If a single model run fails the tolerance check, should we explore alternative input data or flag it as an issue?
]

.pull-right[
<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/8/8c/Standard_deviation_diagram.svg/330px-Standard_deviation_diagram.svg.png" width="80%" />
]

Mentimeter link: .hyperlink-style[[https://www.menti.com/al8ugsvp2kk3](https://www.menti.com/al8ugsvp2kk3)]

.footnote[
Figure source: .hyperlink-style[[Wikipedia](https://en.wikipedia.org/wiki/Standard_error)]
]

???
- How can we manage tolerance testing across both maximum likelihood and Bayesian estimation frameworks within the same suite?
---
# Opportunities for improvement
<iframe src="https://static7.depositphotos.com/1000376/768/i/600/depositphotos_7688025-stock-photo-opportunity-road-sign-on-background.jpg" width="100%" height="400px" data-external="1"></iframe>

---
# Standardize test templates
.pull-left[
- Provide a reusable test template to help developers write consistent R tests
- Use FIMS:::use_testthat_template() to generate a new test file
- The template includes three types of tests: input/output checks, edge cases, and error handling
- Currently implemented for R, a similar template for C++ is planned
]

.pull-right[
![](slides_20250521_files/figure-html/unnamed-chunk-21-1.png)
]

.footnote[
Figure source: .hyperlink-style[[FIMS testhat template](https://github.com/NOAA-FIMS/FIMS/blob/main/inst/templates/testthat_template.R)]
]

???
- Input and output correctness: ensure that the function behaves as expected with correct inputs and returns the expected outputs
- Edge-case handling: validate the function's performance with unusual scenarios
- Built-in errors and warnings: confirm that appropriate error and warning messages are triggered under exceptional conditions

---
# Improve execution helpers

- Create dedicated R functions to run specific types of tests (e.g., C++ tests, R unit tests, R integration tests)
- Use one wrapper function check_fims() to streamline all test execution before pushing to the remote repo

<table class="table table-striped table-hover" style="font-size: 12px; width: auto !important; margin-left: auto; margin-right: auto;">
 <thead>
  <tr>
   <th style="text-align:left;"> Function </th>
   <th style="text-align:left;"> Description </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> setup_gtest() </td>
   <td style="text-align:left;"> Sets up the environment and test files required for running C++ (GoogleTest) tests. </td>
  </tr>
  <tr>
   <td style="text-align:left;"> run_gtest() </td>
   <td style="text-align:left;"> Executes the C++ tests using the previously configured test environment. </td>
  </tr>
  <tr>
   <td style="text-align:left;"> setup_and_run_gtest() </td>
   <td style="text-align:left;"> Convenience wrapper to both set up and run C++ tests in a single call. </td>
  </tr>
  <tr>
   <td style="text-align:left;"> run_r_unit_tests() </td>
   <td style="text-align:left;"> Runs all R unit tests using the testthat framework, typically for individual modules. </td>
  </tr>
  <tr>
   <td style="text-align:left;"> run_r_integration_tests() </td>
   <td style="text-align:left;"> Runs R integration tests to validate interaction between multiple modules. </td>
  </tr>
  <tr>
   <td style="text-align:left;"> check_fims() </td>
   <td style="text-align:left;"> Runs a full suite of all tests (C++ and R) before pushing code to the remote repository. </td>
  </tr>
</tbody>
</table>

---
# Establish a quarterly review process
.pull-left[
- Ensure testing efforts are aligned with project goals
- Regularly review test coverage reports and identify gaps
- Use code club meetings to refactor, reorganize, and expand test suite
- Future improvement: use GitHub Actions to create an issue on a regular basis*<sup>1</sup>*
]

.pull-right[
- Quarterly review (04/01/2025*<sup>2</sup>*)
![](slides_20250521_files/figure-html/unnamed-chunk-23-1.png)
]

.footnote[
[1] .hyperlink-style[[Scheduling issue creation](https://docs.github.com/en/actions/use-cases-and-examples/project-management/scheduling-issue-creation)]<br>
[2] .hyperlink-style[[FIMS issue #761](https://github.com/NOAA-FIMS/FIMS/issues/761)]
]

---
# Summary
.pull-left[
- Write tests that include normal use, edge cases, and expected error cases
- Use code quality tools to verify coverage, formatting, and complexity
- Do not write tests just to boost code coverage
- Build a safety net and ensure good growth dynamics of a project
- Developers and testers need to enjoy their testing achievements
]

.pull-right[
![Growth dynamics between projects with good and bad tests](https://drek4537l1klr.cloudfront.net/khorikov/Figures/01fig02_alt.jpg)
]

.footnote[
Figure source: .hyperlink-style[[(Khorikov, 2020)](https://livebook.manning.com/book/unit-testing/chapter-1/45)]
]