class: center, middle, inverse, title-slide

.title[
# Software testing for Fisheries Integrated Modeling System (FIMS): strategies, challenges, and opportunities
]
.author[
### Bai Li
Contractor with ECS Federal LLC in support of NOAA Fisheries Office of Science and Technology

Email: bai.li@noaa.gov
]
.institute[
### RSE Testing Working Group Meeting
]
.date[
### 2025/05/21
]

---
layout: true

.footnote[U.S. Department of Commerce | National Oceanic and Atmospheric Administration | National Marine Fisheries Service]

---
# Disclaimer

.pull-left[
Previous workflow

<img src="slides_20250521_files/figure-html/unnamed-chunk-1-1.png" width="80%" />
]

.pull-right[
Current workflow

<img src="slides_20250521_files/figure-html/unnamed-chunk-2-1.png" width="80%" />
]

- I’ve been learning about software testing since 2020.

--

- I’d like to share what has and hasn’t worked well for testing in the FIMS project.

--

- The examples I’ll present focus on R and C++ code developed by the FIMS implementation team (.hyperlink-style[[https://noaa-fims.github.io/FIMS/authors.html](https://noaa-fims.github.io/FIMS/authors.html)]).

--

- Comments and suggestions are very welcome!

.footnote[
Figure source: .hyperlink-style[[Getting started with unit testing in R](https://www.pipinghotdata.com/posts/2021-11-23-getting-started-with-unit-testing-in-r/)]
]

---
# Outline

- Introduction to FIMS

--

- Overview of FIMS testing
  - Types of tests implemented
  - Testing frameworks and tools used
  - Team roles and responsibilities

--

- Test design and evaluation

--

- Test coverage and reporting

--

- Automation testing

--

- Current challenges
  - How to define tolerance criteria for statistical models?
  - How to better manage test location and execution time?
- Opportunities for improvement
  - Standardize test templates
  - Improve execution helpers
  - Establish a quarterly review process

---
# Introduction to FIMS

.left-column[
**Fisheries Integrated Modeling System**

<img src="static/FIMS_hexlogo.png" width="80%" />
]

.right-column[
**What is FIMS?**

A flexible suite of software tools to support sustainable fishery management

- Fisheries stock assessment*<sup>1,2</sup>* at its core.
- Connects to ecosystem, climate, and economic models/data.
- Flexible for innovative future modeling work.
- Collaborative community effort.
- Addresses numerous priorities.
]

.footnote[
.hyperlink-style[[FIMS landing page](https://noaa-fims.github.io/FIMS/)] | .hyperlink-style[[FIMS GitHub repo](https://github.com/noaa-fims/fims/)]<br>
[1] .hyperlink-style[[Fishery stock assessment:](https://www.fisheries.noaa.gov/insight/stock-assessment-model-descriptions)] The scientific process of collecting, analyzing, and reporting on the condition <br> of a fish stock and estimating its sustainable yield.<br>
[2] .hyperlink-style[[Stock assessment model:](https://www.fisheries.noaa.gov/insight/stock-assessment-model-descriptions)] The mathematical and statistical techniques stock assessments use to analyze <br> and understand the impact of fisheries and environmental factors on fish stocks.
]

---
# Introduction to FIMS

<img src="static/fims_path_simple.png" width="80%" />

- FIMS modules are written in C++ and linked to R*<sup>1</sup>* using Rcpp*<sup>2</sup>*
- Template Model Builder*<sup>3</sup>* serves as the statistical inference engine for FIMS

.footnote[
[1] .hyperlink-style[[R:](https://www.r-project.org/)] A programming language and free software environment for statistical computing and graphics.<br>
[2] .hyperlink-style[[Rcpp:](https://cran.r-project.org/web/packages/Rcpp/index.html)] An R package that provides R functions as well as C++ classes which offer a seamless<br> integration of R and C++.<br>
[3] .hyperlink-style[[Template Model Builder:](https://kaskr.github.io/adcomp/Introduction.html)] An R package for fitting statistical latent variable models to data.
]

---
# Overview of FIMS testing

<img src="https://media.geeksforgeeks.org/wp-content/uploads/20240730150406/Software-Testing-768-copy.webp" width="80%" />

.footnote[
Figure source: .hyperlink-style[[GeeksforGeeks](https://www.geeksforgeeks.org/types-software-testing/)]
]
---
# Types of tests implemented

<table class="table table-striped table-hover" style="font-size: 12px; width: auto !important; margin-left: auto; margin-right: auto;">
<thead>
<tr> <th style="border-bottom:hidden;padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="2"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">Tests</div></th> <th style="border-bottom:hidden;padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="3"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">Testing Frameworks and Tools</div></th> <th style="border-bottom:hidden;padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="4"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">Roles and Responsibilities</div></th> </tr>
<tr> <th style="text-align:left;"> Test Type </th> <th style="text-align:left;"> Description </th> <th style="text-align:left;"> GoogleTest (C++) </th> <th style="text-align:left;"> testthat (R) </th> <th style="text-align:left;"> Case Studies </th> <th style="text-align:left;"> Developers </th> <th style="text-align:left;"> Testers </th> <th style="text-align:left;"> Users </th> <th style="text-align:left;"> GitHub Actions </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:left;"> Unit Testing </td> <td style="text-align:left;"> Test individual modules </td> <td style="text-align:left;"> ✔️ </td> <td style="text-align:left;"> ✔️ </td> <td style="text-align:left;"> ❌ </td> <td style="text-align:left;"> ✔️ </td> <td style="text-align:left;"> ❌ </td> <td style="text-align:left;"> ❌ </td> <td style="text-align:left;"> ✔️ </td> </tr>
<tr> <td style="text-align:left;"> Integration Testing </td> <td style="text-align:left;"> Test interactions between modules </td> <td style="text-align:left;"> ✔️ </td> <td style="text-align:left;"> ✔️ </td> <td style="text-align:left;"> ❌ </td> <td style="text-align:left;"> ✔️ </td> <td
style="text-align:left;"> ✔️ </td> <td style="text-align:left;"> ❌ </td> <td style="text-align:left;"> ✔️ </td> </tr>
<tr> <td style="text-align:left;"> System Testing </td> <td style="text-align:left;"> Evaluate the overall functionality of a complete FIMS model </td> <td style="text-align:left;"> ❌ </td> <td style="text-align:left;"> ✔️ </td> <td style="text-align:left;"> ✔️ </td> <td style="text-align:left;"> ❌ </td> <td style="text-align:left;"> ✔️ </td> <td style="text-align:left;"> ✔️ </td> <td style="text-align:left;"> ✔️ </td> </tr>
<tr> <td style="text-align:left;"> Performance Testing </td> <td style="text-align:left;"> Ensure the system performs properly under its expected workload </td> <td style="text-align:left;"> ❌ </td> <td style="text-align:left;"> ✔️ </td> <td style="text-align:left;"> ✔️ </td> <td style="text-align:left;"> ❌ </td> <td style="text-align:left;"> ✔️ </td> <td style="text-align:left;"> ✔️ </td> <td style="text-align:left;"> ✔️ </td> </tr>
<tr> <td style="text-align:left;"> Usability Testing </td> <td style="text-align:left;"> Assess user interface effectiveness </td> <td style="text-align:left;"> ❌ </td> <td style="text-align:left;"> ❌ </td> <td style="text-align:left;"> ✔️ </td> <td style="text-align:left;"> ❌ </td> <td style="text-align:left;"> ✔️ </td> <td style="text-align:left;"> ✔️ </td> <td style="text-align:left;"> ✔️ </td> </tr>
<tr> <td style="text-align:left;"> Compatibility Testing </td> <td style="text-align:left;"> Check FIMS compatibility across different operating systems </td> <td style="text-align:left;"> ✔️ </td> <td style="text-align:left;"> ✔️ </td> <td style="text-align:left;"> ❌ </td> <td style="text-align:left;"> ❌ </td> <td style="text-align:left;"> ❌ </td> <td style="text-align:left;"> ❌ </td> <td style="text-align:left;"> ✔️ </td> </tr>
</tbody>
</table>

---
# Unit testing

- Test individual modules
- Identify bugs early in the development process
- Promote modular and maintainable code

.pull-left[
<!-- -->
]

.footnote[
Figure source: .hyperlink-style[[FIMS selectivity Rcpp tests](https://github.com/NOAA-FIMS/FIMS/blob/main/tests/testthat/test-rcpp-selectivity.R)]
]

???

- Testing frameworks and tools
  - Use **GoogleTest** to test C++ code
  - Use **testthat** to test R code
- Roles and responsibilities
  - Module developers write tests and execute them locally
  - GitHub Actions (GHA) automatically runs all tests before each Pull Request (PR) as a safeguard

---
# Integration testing

- Test how different modules interact with each other
- Ensure that different modules work together as intended
- Identify issues that may arise when different modules are combined

<!-- -->

.footnote[
Figure source: .hyperlink-style[[FIMS fleet Rcpp tests](https://github.com/NOAA-FIMS/FIMS/blob/main/tests/testthat/test-rcpp-fleet-interface.R)]
]

???

- Testing frameworks and tools
  - Use **GoogleTest** to test C++ code
  - Use **testthat** to test R code
- Roles and responsibilities
  - Developers or testers write tests and execute them locally
  - GHA automatically runs all tests before each PR as a safeguard

---
# System testing

- Evaluate the overall functionality of a complete FIMS model
- Check the entire functionality of the system
- Check if the product meets the technical and business requirements of clients

<table class="table table-striped table-hover" style="font-size: 12px; width: auto !important; margin-left: auto; margin-right: auto;">
<thead>
<tr> <th style="text-align:left;"> Test Type </th> <th style="text-align:left;"> Description </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:left;"> Deterministic Test </td> <td style="text-align:left;"> Fix parameters at "true" values from the operating model. </td> </tr>
<tr> <td style="text-align:left;"> Estimation Test 1 </td> <td style="text-align:left;"> Estimate using age composition input only.
</td> </tr>
<tr> <td style="text-align:left;"> Estimation Test 2 </td> <td style="text-align:left;"> Estimate using length composition input only. </td> </tr>
<tr> <td style="text-align:left;"> Estimation Test 3 </td> <td style="text-align:left;"> Estimate using both age and length composition input. </td> </tr>
<tr> <td style="text-align:left;"> Estimation Test 4 </td> <td style="text-align:left;"> Estimate with NAs in the input data. </td> </tr>
</tbody>
</table>

.footnote[
Figure source: .hyperlink-style[[FIMS tests](https://github.com/NOAA-FIMS/FIMS/blob/main/tests/testthat/test-integration-caa-mle-wrappers.R)]
]

???

- Testing frameworks and tools
  - Use case studies to test FIMS in a separate repo
- Roles and responsibilities
  - Testers/users write tests and execute them locally
  - GHA automatically runs all test cases

---
# Performance testing

.pull-left[
- Ensure the system performs properly under its expected workload
- Ensure the speed, load capability, and accuracy of the system
- Identify bottlenecks in the system
]

.pull-right[
<!-- -->
]

.footnote[
Figure source: .hyperlink-style[[FIMS parallel tests](https://github.com/NOAA-FIMS/FIMS/blob/main/tests/testthat/test-parallel-caa-mle-wrappers.R)]
]

???

- Testing frameworks and tools
  - Use **testthat** to check if FIMS can be run in parallel
  - Use case studies to test FIMS in a separate repo
- Roles and responsibilities
  - Testers/users write tests and execute them locally
  - GHA automatically runs all test cases

---
# Usability testing

- Assess, from an end user’s perspective, whether the system is easy to use
- Evaluate the effectiveness of the user interface design
- Identify user pain points

<iframe src="https://noaa-fims.github.io/case-studies/content/NEFSC-yellowtail.html" width="100%" height="400px" data-external="1"></iframe>

???
- Testing frameworks and tools
  - Use case studies to test FIMS in a separate repo
- Roles and responsibilities
  - Testers/users write tests and execute them locally
  - GHA automatically runs all test cases

---
# Compatibility testing

- Check FIMS compatibility (the ability to run) across different operating systems
- Provide service across multiple operating systems
- Identify bugs during the development process

<!-- --><!-- --><!-- -->

.footnote[
Figure source: .hyperlink-style[[ngeenx/operating-system-logos](https://github.com/ngeenx/operating-system-logos)]
]

???

- Testing frameworks and tools
  - Run C++ and R tests on Windows, macOS, and Ubuntu
- Roles and responsibilities
  - GHA automatically runs all tests

---
# Test design and evaluation

- Use snapshot tests (also known as .hyperlink-style[[golden tests](https://ro-che.info/articles/2017-12-04-golden-tests)]) to capture FIMS output in human-readable files and track changes over time
- Use test fixtures to provide input data and expected results for multiple tests, reducing redundancy and improving maintainability
- Use built-in testthat functions (e.g., expect_equal()) to compare computed values against known reference values (e.g., in unit tests or deterministic integration tests)
- Relax tolerance criteria for model estimates: ensure that no more than 5% of estimates deviate from expected values by more than 2 standard errors (e.g., in estimation tests)

---
# Test coverage and reporting

- Use Codecov to track and report code coverage
- Rely on coverage metrics from the R side, as R code also triggers the underlying C++ code
- Coverage has remained between 65% and 95% since 2022
- Coverage changes are automatically reported in each pull request

.pull-left[
<!-- -->
]

.pull-right[
<!-- -->
]

.footnote[
Figure source: .hyperlink-style[[FIMS codecov](https://app.codecov.io/gh/NOAA-FIMS/FIMS?search=&trend=all%20time)] | .hyperlink-style[[Pull request codecov](https://github.com/NOAA-FIMS/FIMS/pull/837)]
]

---
# Automation testing
- Use appropriate automation tools (e.g., GitHub Actions) to run tests for FIMS
- C++ and R tests are automatically executed on every pushed commit
- Code coverage results are reported for every pull request

<!-- -->

.footnote[
Figure source: .hyperlink-style[[FIMS GitHub Actions](https://github.com/NOAA-FIMS/FIMS/actions/workflows/run-googletest.yml)]
]

---
# Current challenges

<!-- -->

---
# How to better manage test location and execution time?

.pull-left[
- Current R test suites can take up to 10 minutes to run
- Which tests should run locally vs. remotely to balance speed and reliability?
- Should we separate core vs. extended tests into different repositories?
]

.pull-right[
<img src="https://nexwebsites.com/images/service/software-testing-services.svg" width="80%" />
]

Mentimeter link: .hyperlink-style[[https://www.menti.com/al8ugsvp2kk3](https://www.menti.com/al8ugsvp2kk3)]

???

- Create a developer function to run only unit tests locally
- How frequently should external tests be executed and maintained?
- Where and how should we define dynamic test input data that isn’t based on fixed values?

---
# How to define tolerance criteria for statistical models?

.pull-left[
- Current criterion: at least 95% of the estimates fall within two standard errors of the expected values
- What is an acceptable range for model estimates under different modeling conditions?
- If a single model run fails the tolerance check, should we explore alternative input data or flag it as an issue?
]

.pull-right[
<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/8/8c/Standard_deviation_diagram.svg/330px-Standard_deviation_diagram.svg.png" width="80%" />
]

Mentimeter link: .hyperlink-style[[https://www.menti.com/al8ugsvp2kk3](https://www.menti.com/al8ugsvp2kk3)]

.footnote[
Figure source: .hyperlink-style[[Wikipedia](https://en.wikipedia.org/wiki/Standard_error)]
]

???

- How can we manage tolerance testing across both maximum likelihood and Bayesian estimation frameworks within the same suite?
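
The tolerance criterion on this slide (no more than 5% of estimates deviating from their expected values by more than two standard errors) can be sketched as a small check. The C++ below is only an illustration under stated assumptions: `fraction_outside()` and `passes_tolerance()` are hypothetical names, not FIMS functions, and the FIMS estimation tests themselves are written in R with testthat.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of FIMS): fraction of estimates whose
// absolute deviation from the expected value exceeds n_se standard errors.
double fraction_outside(const std::vector<double>& estimates,
                        const std::vector<double>& expected,
                        const std::vector<double>& std_errors,
                        double n_se = 2.0) {
  // Preconditions: the three vectors are non-empty and the same length.
  assert(!estimates.empty());
  assert(estimates.size() == expected.size());
  assert(estimates.size() == std_errors.size());

  std::size_t outside = 0;
  for (std::size_t i = 0; i < estimates.size(); ++i) {
    if (std::fabs(estimates[i] - expected[i]) > n_se * std_errors[i]) {
      ++outside;
    }
  }
  return static_cast<double>(outside) /
         static_cast<double>(estimates.size());
}

// Hypothetical helper (not part of FIMS): passes when no more than
// max_fraction (5% by default) of the estimates fall outside the band.
bool passes_tolerance(const std::vector<double>& estimates,
                      const std::vector<double>& expected,
                      const std::vector<double>& std_errors,
                      double n_se = 2.0,
                      double max_fraction = 0.05) {
  return fraction_outside(estimates, expected, std_errors, n_se) <=
         max_fraction;
}
```

With 20 estimates, one value outside the band gives a fraction of 0.05 and still passes, while a second one raises it to 0.10 and fails. Returning the fraction instead of only a pass/fail flag makes it possible to record how close a run came to the threshold, which may help when deciding whether a single failing run deserves an issue.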
---
# Opportunities for improvement

<iframe src="https://static7.depositphotos.com/1000376/768/i/600/depositphotos_7688025-stock-photo-opportunity-road-sign-on-background.jpg" width="100%" height="400px" data-external="1"></iframe>

---
# Standardize test templates

.pull-left[
- Provide a reusable test template to help developers write consistent R tests
- Use FIMS:::use_testthat_template() to generate a new test file
- The template includes three types of tests: input/output checks, edge cases, and error handling
- Currently implemented for R; a similar template for C++ is planned
]

.pull-right[
<!-- -->
]

.footnote[
Figure source: .hyperlink-style[[FIMS testthat template](https://github.com/NOAA-FIMS/FIMS/blob/main/inst/templates/testthat_template.R)]
]

???

- Input and output correctness: ensure that the function behaves as expected with correct inputs and returns the expected outputs
- Edge-case handling: validate the function's behavior in unusual scenarios
- Built-in errors and warnings: confirm that appropriate error and warning messages are triggered under exceptional conditions

---
# Improve execution helpers

- Create dedicated R functions to run specific types of tests (e.g., C++ tests, R unit tests, R integration tests)
- Use a single wrapper function, check_fims(), to run all tests before pushing to the remote repo

<table class="table table-striped table-hover" style="font-size: 12px; width: auto !important; margin-left: auto; margin-right: auto;">
<thead>
<tr> <th style="text-align:left;"> Function </th> <th style="text-align:left;"> Description </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:left;"> setup_gtest() </td> <td style="text-align:left;"> Sets up the environment and test files required for running C++ (GoogleTest) tests. </td> </tr>
<tr> <td style="text-align:left;"> run_gtest() </td> <td style="text-align:left;"> Executes the C++ tests using the previously configured test environment.
</td> </tr>
<tr> <td style="text-align:left;"> setup_and_run_gtest() </td> <td style="text-align:left;"> Convenience wrapper to both set up and run C++ tests in a single call. </td> </tr>
<tr> <td style="text-align:left;"> run_r_unit_tests() </td> <td style="text-align:left;"> Runs all R unit tests using the testthat framework, typically for individual modules. </td> </tr>
<tr> <td style="text-align:left;"> run_r_integration_tests() </td> <td style="text-align:left;"> Runs R integration tests to validate interaction between multiple modules. </td> </tr>
<tr> <td style="text-align:left;"> check_fims() </td> <td style="text-align:left;"> Runs a full suite of all tests (C++ and R) before pushing code to the remote repository. </td> </tr>
</tbody>
</table>

---
# Establish a quarterly review process

.pull-left[
- Ensure testing efforts are aligned with project goals
- Regularly review test coverage reports and identify gaps
- Use code club meetings to refactor, reorganize, and expand the test suite
- Future improvement: use GitHub Actions to create a review issue on a regular schedule*<sup>1</sup>*
]

.pull-right[
- Quarterly review (04/01/2025*<sup>2</sup>*)

<!-- -->
]

.footnote[
[1] .hyperlink-style[[Scheduling issue creation](https://docs.github.com/en/actions/use-cases-and-examples/project-management/scheduling-issue-creation)]<br>
[2] .hyperlink-style[[FIMS issue #761](https://github.com/NOAA-FIMS/FIMS/issues/761)]
]

---
# Summary

.pull-left[
- Write tests that include normal use, edge cases, and expected error cases
- Use code quality tools to verify coverage, formatting, and complexity
- Do not write tests just to boost code coverage
- Build a safety net that supports the sustainable growth of the project
- Developers and testers should enjoy their testing achievements
]

.pull-right[
]

.footnote[
Figure source: .hyperlink-style[[(Khorikov, 2020)](https://livebook.manning.com/book/unit-testing/chapter-1/45)]
]
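
???

The first summary point (tests for normal use, edge cases, and expected error cases) can be illustrated with a minimal sketch. `logistic()` below is a hypothetical stand-in written for this example, not the FIMS selectivity implementation, and plain `assert` is used in place of GoogleTest so that the block stays self-contained.

```cpp
#include <cassert>
#include <cmath>
#include <stdexcept>

// Hypothetical logistic curve used only for illustration (not FIMS code).
double logistic(double x, double inflection_point, double slope) {
  if (std::isnan(x) || std::isnan(inflection_point) || std::isnan(slope)) {
    throw std::invalid_argument("logistic: inputs must not be NaN");
  }
  return 1.0 / (1.0 + std::exp(-slope * (x - inflection_point)));
}

// Returns true when logistic() rejects the inputs with invalid_argument.
bool logistic_rejects(double x, double inflection_point, double slope) {
  try {
    logistic(x, inflection_point, slope);
  } catch (const std::invalid_argument&) {
    return true;
  }
  return false;
}

// One check per category named in the summary.
void run_logistic_checks() {
  // Normal use: the curve equals 0.5 at the inflection point and matches
  // a known reference value one slope-scaled unit above it.
  assert(logistic(10.0, 10.0, 0.2) == 0.5);
  assert(std::fabs(logistic(15.0, 10.0, 0.2) - 0.7310585786) < 1e-9);

  // Edge cases: extreme inputs saturate at 0 and 1 without overflow errors.
  assert(logistic(1e6, 10.0, 0.2) == 1.0);
  assert(logistic(-1e6, 10.0, 0.2) == 0.0);

  // Expected error case: NaN input is rejected, valid input is not.
  assert(logistic_rejects(std::nan(""), 10.0, 0.2));
  assert(!logistic_rejects(10.0, 10.0, 0.2));
}
```

In GoogleTest, each group would typically become its own `TEST`, with `EXPECT_NEAR` for the numeric checks and `EXPECT_THROW` for the error case.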