[GSoC Project Proposal]: Expand the Google Test suite for the Fisheries Integrated Modeling System #81
Comments
Hi @kellijohnson-NOAA and @Bai-Li-NOAA, my name is Daksh Garg, and I am excited about contributing to the FIMS project for GSoC 2025. I have experience in C++, R, and GitHub, and I am particularly interested in enhancing the test coverage of FIMS. Looking forward to collaborating with you. Thanks, Daksh
Greetings @kellijohnson-NOAA and @Bai-Li-NOAA, I am Akshat Majila, a computer science student with a keen interest in driving real-world change through technology. I started by conducting a foundational analysis of critical components of the codebase, focusing on the current code-coverage statistics (screenshots omitted). With that baseline in place, I have identified specific untested functions and branches and am currently developing targeted tests, including:

- Unit tests for methods such as CalculateRecruitment() and CalculateUnfishedBiomass(), ensuring outputs align with expected precision (e.g., absolute error < 1 metric ton).
- Parameterized tests for methods like CalculateSpawningBiomass() with diverse inputs (e.g., varying years, recruitment levels) to address edge cases such as zero biomass or maximum mortality.

Also, beyond test coverage, are there upcoming milestones or features in FIMS (e.g., new data inputs, R package integration) that I should consider integrating into my work to maximize impact?

Looking forward to your thoughts and guidance!

Best regards,
Akshat
@thedgarg31 thank you for your interest in helping with the testing of FIMS. There are many areas where the testing could be enhanced in FIMS. I am curious whether your interests lie on the R side of testing with testthat or in testing the C++ with Google Test? With regard to the former, many of the functions are tested but edge cases are missing, such as ensuring that appropriate error/warning messages are generated. For the C++ testing, I believe most of the simple modules are tested using known data/answers, but larger integration tests are missing. The code coverage statistics can be investigated to determine which portions of the code could use more testing. Please let us know if you have any additional questions.
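As an illustration of the known-data testing style described above, here is a minimal Google Test sketch. The LogisticSelectivity function is a simplified stand-in, not the actual FIMS class or signature; the point is only the pattern of comparing a module's output against hand-computed answers.

```cpp
#include <cmath>
#include "gtest/gtest.h"

// Simplified stand-in for a FIMS-style module (hypothetical, for illustration):
// a logistic selectivity curve with an inflection point and a slope.
double LogisticSelectivity(double age, double inflection, double slope) {
  return 1.0 / (1.0 + std::exp(-slope * (age - inflection)));
}

TEST(LogisticSelectivityTest, MatchesHandComputedAnswers) {
  // Known answer: the curve equals exactly 0.5 at the inflection point.
  EXPECT_NEAR(LogisticSelectivity(10.0, 10.0, 0.5), 0.5, 1e-12);
  // Known answer computed by hand: 1 / (1 + exp(-0.5 * (12 - 10))).
  EXPECT_NEAR(LogisticSelectivity(12.0, 10.0, 0.5),
              1.0 / (1.0 + std::exp(-1.0)), 1e-12);
}
```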
@majilacodes thank you for your interest in helping with the testing of FIMS. Your analysis is quite good for not having any instructions in the code base thus far. I am interested in hearing more about how you created the statistics; did they come from our code-coverage action, or would you recommend some other way to analyze the code base for missing tests? Regarding future milestones, we have a branch that implements random effects in the code base that we believe will be more difficult to test, because estimates of variance parameters have a lower tolerance of equality than fixed-effect parameters. Additionally, much of the infrastructure must be compiled and run through TMB to test, so we are unsure how to do unit tests in this instance. There are tests for the R side of things in the testthat folder, though coverage is not 100%. Are you familiar with coding in R as well as C++? We look forward to hearing from you.
Thank you for your encouraging response @kellijohnson-NOAA! Regarding my analysis methodology, I employed a multi-faceted approach:

1. Starting from the output of the repository's code-coverage action as a baseline.
2. Re-running the test suite locally with gcov/lcov to generate line- and branch-level coverage reports.
3. Inspecting the under-covered modules by hand to pinpoint specific untested functions and branches.
This combined approach provided deeper insights than dashboard summaries alone. I'd be happy to formalize this process into a developer guide that future contributors could use.

For random effects, since there will be natural variation in variance estimates, one approach could be to use confidence intervals (for instance, 95% confidence) to compare the model's variance estimates against simulated data generated with a known variance (say 0.5); a rough sketch follows this comment. For running unit tests, we could isolate the C++ logic (e.g., by mocking TMB data structures such as fims::Vector) so functions can be exercised without compiling through TMB. This is an initial impression, though; I'll research this further and keep you updated.

Additionally, for the R-side coverage improvements, I noticed Rcpp module loading issues that complicate coverage analysis (screenshot omitted); I have some ideas for working around them with test-specific mocks.

And yes, I'm quite experienced with both R and C++ - I've worked with both languages in statistical computing and simulation projects, in addition to studying them as part of my university curriculum :)

To facilitate more detailed and focused discussions, could we switch to a 1:1 platform like email, Discord, or Slack? I'd love to share a draft of my GSoC proposal, mockups for test designs, or further details on my plans. My email is [email protected] - please let me know if that works, or if there's another platform you'd prefer!

Thank you again for your guidance and encouragement. Looking forward to contributing to this mission of sustainable fishery management!

Best regards,
Akshat
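As a concrete version of the confidence-interval idea sketched in the comment above, the following Google Test simulates data with a known variance of 0.5 and checks that the sample-variance estimate falls inside an approximate 95% interval. Everything here is an illustrative assumption: a real test would take the variance estimate from a fitted FIMS/TMB model rather than computing it directly from the simulated draws.

```cpp
#include <cmath>
#include <random>
#include <vector>
#include "gtest/gtest.h"

// Illustrative helper: unbiased sample variance of simulated observations.
double SampleVariance(const std::vector<double>& x) {
  double mean = 0.0;
  for (double v : x) mean += v;
  mean /= static_cast<double>(x.size());
  double ss = 0.0;
  for (double v : x) ss += (v - mean) * (v - mean);
  return ss / static_cast<double>(x.size() - 1);
}

TEST(RandomEffectsVarianceTest, EstimateFallsInApprox95Interval) {
  const double true_variance = 0.5;  // known variance used to simulate data
  const int n = 2000;

  std::mt19937 rng(42);  // fixed seed keeps the test deterministic
  std::normal_distribution<double> dist(0.0, std::sqrt(true_variance));
  std::vector<double> y(n);
  for (double& v : y) v = dist(rng);

  // Normal approximation: Var(S^2) ~= 2 * sigma^4 / (n - 1), so an
  // approximate 95% band around the true variance is +/- 1.96 sd.
  const double half_width = 1.96 * true_variance * std::sqrt(2.0 / (n - 1));

  EXPECT_NEAR(SampleVariance(y), true_variance, half_width);
}
```

Because the seed is fixed, the test is deterministic; the wide tolerance reflects the lower tolerance of equality that variance parameters allow.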
Hi @kellijohnson-NOAA and @Bai-Li-NOAA, thank you for your detailed response and guidance. I truly appreciate the opportunity to collaborate on enhancing the testing suite for FIMS and contribute to its reliability and accuracy.

To answer your question, my primary focus lies in expanding the C++ testing with Google Test, particularly by addressing the current gaps in larger integration tests, which are essential for validating the framework's robustness. That said, I am also open to contributing to the R-side testing with testthat, specifically by covering missing edge cases and ensuring proper validation of error/warning messages.

Progress so far:

1. Code coverage analysis: Based on my initial analysis, I identified under-tested modules related to fleet dynamics and biomass calculation, where complex decision paths and edge cases are not adequately covered.
2. Test exploration and mocking:
   a. I experimented with isolating C++ functions for independent testing by mocking TMB data structures (e.g., fims::Vector) in tests/gtest/. This allows for focused unit testing without requiring TMB compilation, improving test efficiency.
   b. I created a local testing environment to repeatedly run and validate tests while making incremental improvements.
3. Technical insights and challenges identified:
   a. Error and warning message validation: On the R side, certain edge cases are not thoroughly tested, especially for unexpected inputs. Validating that the system generates appropriate error and warning messages will enhance robustness.
   b. Random effects testing: Given your mention of the upcoming random effects branch, I understand that variance parameters have a lower tolerance of equality, making them harder to test. I am considering using confidence intervals (e.g., 95%) to compare model variance estimates against simulated data for validation.

Future intentions:

1. C++ test expansion:
   a. Prioritize large-scale integration tests covering complex modules, including fleet, population dynamics, and recruitment models.
   b. Implement parameterized tests to validate edge cases (e.g., zero biomass, extreme fishing mortality rates); a sketch follows this comment.
   c. Use mocked TMB data structures to independently test C++ functions with controlled inputs, ensuring accuracy and faster test execution.
2. R-side test coverage improvement:
   a. Expand testthat coverage by adding tests for missing edge cases and validating error/warning messages.
   b. Address Rcpp module loading challenges by creating test-specific module mocks for smoother coverage analysis.
3. Documentation and best practices:
   a. Create a developer guide documenting the coverage analysis process (using gcov, lcov, and local testing workflows) to help future contributors efficiently analyze and expand the test suite.
   b. Write detailed comments and documentation within the test code to improve readability and maintainability.
4. Random effects testing:
   a. Once the random effects branch is integrated, explore strategies for effectively testing variance parameters, potentially using simulated data with known variance and confidence-interval validation.

Proposal draft and collaboration: Would you be open to reviewing my draft proposal? If so, I can share it with you over email for your feedback and suggestions.

Next steps:

1. If you have any test structuring guidelines or best practices you would like me to follow, please let me know.
2. I am also open to any feedback on my current approach or suggestions for improvement.
3. I am excited to continue contributing to FIMS and look forward to collaborating further with you on this important project.

Best regards,
Daksh
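A minimal sketch of the parameterized edge-case testing mentioned above, using Google Test's TEST_P and INSTANTIATE_TEST_SUITE_P. The BaranovCatch function is an illustrative stand-in (a Baranov-style catch equation), not the actual FIMS API; the parameter tuples exercise the zero-biomass and extreme-mortality edge cases.

```cpp
#include <cmath>
#include <tuple>
#include "gtest/gtest.h"

// Illustrative stand-in for a FIMS-style calculation (hypothetical):
// Baranov catch equation, catch = F / Z * (1 - exp(-Z)) * biomass, Z = F + M.
double BaranovCatch(double fishing_mortality, double natural_mortality,
                    double biomass) {
  const double z = fishing_mortality + natural_mortality;
  if (z == 0.0) return 0.0;  // guard the division for the no-mortality case
  return fishing_mortality / z * (1.0 - std::exp(-z)) * biomass;
}

// Parameters: fishing mortality F, natural mortality M, biomass, expected catch.
class BaranovEdgeCaseTest
    : public testing::TestWithParam<
          std::tuple<double, double, double, double>> {};

TEST_P(BaranovEdgeCaseTest, HandlesEdgeCases) {
  const auto [f, m, biomass, expected] = GetParam();
  EXPECT_NEAR(BaranovCatch(f, m, biomass), expected, 1e-8);
}

INSTANTIATE_TEST_SUITE_P(
    EdgeCases, BaranovEdgeCaseTest,
    testing::Values(
        std::make_tuple(0.2, 0.2, 0.0, 0.0),    // zero biomass -> zero catch
        std::make_tuple(0.0, 0.2, 100.0, 0.0),  // no fishing -> zero catch
        std::make_tuple(0.0, 0.0, 100.0, 0.0),  // no mortality at all
        // extreme fishing mortality: nearly the whole exploitable fraction
        std::make_tuple(50.0, 0.2, 100.0,
                        50.0 / 50.2 * (1.0 - std::exp(-50.2)) * 100.0)));
```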
Project Description
The Fisheries Integrated Modeling System (FIMS) is a framework for creating statistical models, written in C++ and R, to assess the status of marine resources. Google Test and the testthat package in R are both used to ensure that methods are mathematically accurate and statistically sound, and that code development does not degrade the accuracy of the framework. The current code coverage of the package is 68 percent, and the goal is to add tests that bring coverage to 80 percent. This project will add tests for uncovered code and suggest places where the tests, especially the Google tests, can be enhanced. It provides an opportunity to enhance the reliability, accessibility, and capability of FIMS and to improve understanding of marine resource dynamics and management.
Expected Outcomes
The main outcome of this project will be a more robust test suite within the FIMS package, measurable through the code coverage statistic (FIMS currently has 68% code coverage). Secondarily, the code plan may need to be updated to include any new testing enhancements. For managers to adopt FIMS, we must prove that we can match previous models and provide sound answers. Given that we do not know whether previous models, written in other computer languages, are actually correct, we are emphasizing a suite of self-tests and cross-tests. Ramping up the available tests will lend more validity to FIMS when it is formally reviewed for use in a management context.
Skills Required
C++, R (suggested), GitHub (suggested)
Additional Background/Issues
Mentor(s)
Kelli Johnson (@kellijohnson-NOAA), Bai Li (@Bai-Li-NOAA)
Mentor Contact Email(s)
[email protected], [email protected]
Expected Project Size
175 hours
Project Difficulty
Intermediate