Skip to content

Conversation

@AmandaBirmingham
Copy link
Collaborator

@AmandaBirmingham AmandaBirmingham commented Nov 6, 2025

This is a big one that adds unit tests for the most frequently used notebooks (metaG, metaT, and amplicon). Basically notebook_test_helpers now defines a small framework for injecting variable settings and running notebooks programmatically using papermill, and then checking if the output files they create are as expected. Note that this is hardly comprehensive--for example, it is not checking that the notebook ipynb files themselves have the expected contents when run (which would be useful for checking e.g. the inline plots they create) nor is it checking to make sure that the notebooks aren't ALSO creating some other unexpected files.

In order for this to work, I had to comment out the inline cells scattered throughout the notebooks that set the default inputs as examples (because otherwise they overwrite the test settings injected into the first cell of the notebook by papermill). I marked each with ## INPUT to make them easy to see and moved them so they are not sharing cells with non-input code. They are set up so a real user can uncomment the entire cell with a keyboard shortcut and then modify the input setting for their specific run (which they MUST DO ANYWAY and it is a PEBKAC when they do not). If they fail to do this and try to keep running the notebook anyway, they will get an immediate and fairly helpful error from Jupyter saying that they are trying to use a variable that isn't defined.

I also had to deal with a few input variables that were being defined and then redefined again later in the notebooks, as this is not compatible with papermill's "set all the variables all at once at the top" strategy. To get around this, I gave all input variables distinct names. I also fixed some bugs in the file path checking code in the metaT notebook.

@AmandaBirmingham AmandaBirmingham changed the title Pacbio unittests Notebook unittests Nov 6, 2025
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR refactors notebook testing infrastructure by introducing a base TestNotebook class and renaming test classes for consistency. The changes include:

  • Creating a new helper base class (TestNotebook) for notebook testing
  • Renaming test classes to follow a consistent naming pattern with "Notebook" suffix
  • Adding three new comprehensive test files for different notebook pipelines
  • Updating an output test data file with additional columns

Reviewed Changes

Copilot reviewed 10 out of 14 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
notebooks/tests/test_tellseq_D_variable_volume_pooling.py Renamed test class from TestTellseqD to TestTellseqDNotebook and test method for clarity
notebooks/tests/test_metatranscriptomics_matrix_pipeline_seqcount_norm.py New test file for metatranscriptomics pipeline with comprehensive parameter setup
notebooks/tests/test_matrix_tube_pipeline_seqcount_norm.py New test file for matrix tube pipeline with both standard and absquant test cases
notebooks/tests/test_amplicon_pre_prep_file_generator.py New test file for amplicon preprocessing with detailed compression layout configuration
notebooks/tests/notebook_test_helpers.py New base class providing common test infrastructure for notebook validation
notebooks/test_output/amplicon/20230203_IL515fBC_806_ABTX_11052_Plates_174_178_182_185_ADAPT_12986_Plate_16_17_18_21_merged.txt Updated test output file with new sample_name column and additional location columns

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

import os
import re

SAVE_DIR = "/Users/amandabirmingham/Desktop"
Copy link

Copilot AI Nov 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hardcoded user-specific path should not be committed to the repository. This path is specific to a single developer's machine and will not work for other contributors or in CI/CD environments. Consider using a temporary directory or making this configurable through environment variables.

Suggested change
SAVE_DIR = "/Users/amandabirmingham/Desktop"
SAVE_DIR = os.environ.get("UNMATCHED_OUTPUT_SAVE_DIR", tempfile.gettempdir())

Copilot uses AI. Check for mistakes.
_ZERO_DATES_FUNC_KEY = "zero_dates_func" # func to replace for dates

# TODO: turn off before committing
_SAVE_UNMATCHED_OUTPUTS = True # whether to save unmatched outputs
Copy link

Copilot AI Nov 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The TODO comment indicates this debug flag should be set to False before committing. This flag enables saving unmatched output files to the hardcoded SAVE_DIR, which should be disabled in production code to avoid unexpected file writes.

Suggested change
_SAVE_UNMATCHED_OUTPUTS = True # whether to save unmatched outputs
_SAVE_UNMATCHED_OUTPUTS = False # whether to save unmatched outputs

Copilot uses AI. Check for mistakes.
zero_dates_func=None):
"""Helper function to compare two text files for exact match."""

filename = filename if not filename else f"{filename} "
Copy link

Copilot AI Nov 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ternary logic is inverted. When filename is truthy (not None/empty), it should be formatted with a space; when falsy, it should be an empty string. The current code does the opposite: it returns filename when it's falsy and formats it when truthy. Should be: filename = f"{filename} " if filename else ""

Suggested change
filename = filename if not filename else f"{filename} "
filename = f"{filename} " if filename else ""

Copilot uses AI. Check for mistakes.
@coveralls
Copy link

coveralls commented Nov 6, 2025

Pull Request Test Coverage Report for Build 19121750241

Details

  • 91 of 101 (90.1%) changed or added relevant lines in 5 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.1%) to 92.687%

Changes Missing Coverage Covered Lines Changed/Added Lines %
notebooks/tests/notebook_test_helpers.py 54 64 84.38%
Totals Coverage Status
Change from base Build 19117964099: 0.1%
Covered Lines: 6477
Relevant Lines: 6988

💛 - Coveralls

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants