-
Notifications
You must be signed in to change notification settings - Fork 20
Notebook unittests #346
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Notebook unittests #346
Conversation
…s for amplicon notebook
…human investigation
…med whitespace at the end of code lines
… settings to string constants
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR refactors notebook testing infrastructure by introducing a base TestNotebook class and renaming test classes for consistency. The changes include:
- Creating a new helper base class (
TestNotebook) for notebook testing - Renaming test classes to follow a consistent naming pattern with "Notebook" suffix
- Adding three new comprehensive test files for different notebook pipelines
- Updating an output test data file with additional columns
Reviewed Changes
Copilot reviewed 10 out of 14 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| notebooks/tests/test_tellseq_D_variable_volume_pooling.py | Renamed test class from TestTellseqD to TestTellseqDNotebook and test method for clarity |
| notebooks/tests/test_metatranscriptomics_matrix_pipeline_seqcount_norm.py | New test file for metatranscriptomics pipeline with comprehensive parameter setup |
| notebooks/tests/test_matrix_tube_pipeline_seqcount_norm.py | New test file for matrix tube pipeline with both standard and absquant test cases |
| notebooks/tests/test_amplicon_pre_prep_file_generator.py | New test file for amplicon preprocessing with detailed compression layout configuration |
| notebooks/tests/notebook_test_helpers.py | New base class providing common test infrastructure for notebook validation |
| notebooks/test_output/amplicon/20230203_IL515fBC_806_ABTX_11052_Plates_174_178_182_185_ADAPT_12986_Plate_16_17_18_21_merged.txt | Updated test output file with new sample_name column and additional location columns |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| import os | ||
| import re | ||
|
|
||
| SAVE_DIR = "/Users/amandabirmingham/Desktop" |
Copilot
AI
Nov 6, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hardcoded user-specific path should not be committed to the repository. This path is specific to a single developer's machine and will not work for other contributors or in CI/CD environments. Consider using a temporary directory or making this configurable through environment variables.
| SAVE_DIR = "/Users/amandabirmingham/Desktop" | |
| SAVE_DIR = os.environ.get("UNMATCHED_OUTPUT_SAVE_DIR", tempfile.gettempdir()) |
| _ZERO_DATES_FUNC_KEY = "zero_dates_func" # func to replace for dates | ||
|
|
||
| # TODO: turn off before committing | ||
| _SAVE_UNMATCHED_OUTPUTS = True # whether to save unmatched outputs |
Copilot
AI
Nov 6, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The TODO comment indicates this debug flag should be set to False before committing. This flag enables saving unmatched output files to the hardcoded SAVE_DIR, which should be disabled in production code to avoid unexpected file writes.
| _SAVE_UNMATCHED_OUTPUTS = True # whether to save unmatched outputs | |
| _SAVE_UNMATCHED_OUTPUTS = False # whether to save unmatched outputs |
| zero_dates_func=None): | ||
| """Helper function to compare two text files for exact match.""" | ||
|
|
||
| filename = filename if not filename else f"{filename} " |
Copilot
AI
Nov 6, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ternary logic is inverted. When filename is truthy (not None/empty), it should be formatted with a space; when falsy, it should be an empty string. The current code does the opposite: it returns filename when it's falsy and formats it when truthy. Should be: filename = f"{filename} " if filename else ""
| filename = filename if not filename else f"{filename} " | |
| filename = f"{filename} " if filename else "" |
Pull Request Test Coverage Report for Build 19121750241Details
💛 - Coveralls |
This is a big one that adds unit tests for the most frequently used notebooks (metaG, metaT, and amplicon). Basically
notebook_test_helpersnow defines a small framework for injecting variable settings and running notebooks programmatically using papermill, and then checking if the output files they create are as expected. Note that this is hardly comprehensive--for example, it is not checking that the notebook ipynb files themselves have the expected contents when run (which would be useful for checking e.g. the inline plots they create) nor is it checking to make sure that the notebooks aren't ALSO creating some other unexpected files.In order for this to work, I had to comment out the inline cells scattered throughout the notebooks that set the default inputs as examples (because otherwise they overwrite the test settings injected into the first cell of the notebook by papermill). I marked each with
## INPUTto make them easy to see and moved them so they are not sharing cells with non-input code. They are set up so a real user can uncomment the entire cell with a keyboard shortcut and then modify the input setting for their specific run (which they MUST DO ANYWAY and it is a PEBKAC when they do not). If they fail to do this and try to keep running the notebook anyway, they will get an immediate and fairly helpful error from Jupyter saying that they are trying to use a variable that isn't defined.I also had to deal with a few input variables that were being defined and then redefined again later in the notebooks, as this is not compatible with papermill's "set all the variables all at once at the top" strategy. To get around this, I gave all input variables distinct names. I also fixed some bugs in the file path checking code in the metaT notebook.