-
Notifications
You must be signed in to change notification settings - Fork 8
Add notebook validation and execution workflows #41
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
- Introduced GitHub Actions workflows for executing and validating Jupyter notebooks. - Added a new Makefile target for testing notebook parameters. - Implemented pytest-based tests to ensure notebooks contain required parameters cells for papermill execution. - Updated `pyproject.toml` for pytest configuration and added necessary development dependencies in `requirements-dev.txt`. - Created shared test utilities in `conftest.py` and documented tests in `tests/README.md`.
WalkthroughAdds CI workflows to validate and execute Jupyter notebooks, new pytest tests and fixtures to ensure notebooks contain a code cell tagged "parameters", updates Makefile and dev requirements, and adds test documentation. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Dev as Developer (PR / Push)
participant GH as GitHub Actions
participant Runner as Runner (ubuntu-latest)
participant Repo as Repository
participant Py as Python 3.12
participant Pap as Papermill
Dev->>GH: push PR / push to main (paths: notebooks/**.ipynb or workflows)
GH->>Runner: start validate-notebooks job
Runner->>Repo: checkout code
Runner->>Py: setup Python 3.12 + pip cache
Runner->>Py: pip install pytest, nbformat, nbstripout, ipykernel, jupyter-client
Runner->>Runner: make format-notebooks-check
Runner->>Runner: make test-notebook-parameters (pytest runs NotebookParametersValidator)
alt execute-notebooks triggered (matrix per notebook)
GH->>Runner: start execute-notebooks job (matrix item)
Runner->>Py: pip install papermill, ipykernel
Runner->>Pap: papermill executes notebook (inject base64 payload)
Pap-->>Runner: execution logs / status
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes
Poem
Pre-merge checks and finishing touches✅ Passed checks (2 passed)
✨ Finishing touches🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (5)
requirements-dev.txt (1)
3-5: Consider pinning dependency versions for reproducibility.The new test dependencies lack version pins, which may cause non-reproducible builds or unexpected failures when upstream releases introduce breaking changes.
Apply this diff to pin versions:
-pytest -nbformat -papermill +pytest>=8.0.0,<9.0.0 +nbformat>=5.9.0,<6.0.0 +papermill>=2.4.0,<3.0.0Alternatively, you can verify and pin to specific current versions:
#!/bin/bash # Check latest stable versions of the dependencies pip index versions pytest | head -5 pip index versions nbformat | head -5 pip index versions papermill | head -5.github/workflows/execute-notebooks.yml (1)
44-55: Clean up temporary output file after execution.The temporary output file
$NOTEBOOK.tmp.ipynbcreated by papermill is never cleaned up or used for verification.Apply this diff to add cleanup:
papermill $NOTEBOOK $NOTEBOOK.tmp.ipynb -b $(echo -n "files: [\"$FILE\"]" | base64 -w 0) + + # Clean up temporary output + rm -f $NOTEBOOK.tmp.ipynb echo "✓ Notebook $NOTEBOOK executed successfully"Alternatively, consider using
/tmpfor temporary files or verifying the output notebook's execution results before cleanup.tests/conftest.py (1)
11-19: Remove redundant existence check.Line 17 filters out non-existent files, but
glob.glob()only returns paths that exist, making this check unnecessary.Apply this diff to simplify:
def get_notebook_files(): """Discover all notebook files in the notebooks directory.""" notebook_pattern = "notebooks/**/*.ipynb" notebook_files = glob.glob(notebook_pattern, recursive=True) - # Convert to Path objects and filter out any non-existent files - notebook_paths = [Path(f) for f in notebook_files if Path(f).exists()] + # Convert to Path objects + notebook_paths = [Path(f) for f in notebook_files] return notebook_pathstests/test_notebook_parameters.py (2)
16-53: Consider simplifying to a standalone function.The
NotebookParametersValidatorclass has only one public method. Per the Single Responsibility Principle and Pylint's guidance, consider refactoring to a simple function unless you anticipate adding more validation methods.Apply this diff:
-class NotebookParametersValidator: - """Validator for notebook parameters cells.""" - - def validate_parameters_cell(self, notebook_path: Path) -> bool: - """ - Validate that a notebook has at least one code cell tagged with 'parameters'. - - Args: - notebook_path: Path to the notebook file - - Returns: - True if notebook has parameters cell, False otherwise - - Raises: - Exception: If notebook cannot be read or validated - """ +def validate_notebook_has_parameters_cell(notebook_path: Path) -> bool: + """ + Validate that a notebook has at least one code cell tagged with 'parameters'. + + Args: + notebook_path: Path to the notebook file + + Returns: + True if notebook has parameters cell, False otherwise + + Raises: + Exception: If notebook cannot be read or validated + """ - try: - # Read notebook with no conversion to preserve original structure - notebook = nbformat.read(notebook_path, nbformat.NO_CONVERT) - - # Validate the notebook format - nbformat.validate(notebook) - - # Check for parameters cell - has_parameters_cell = False - - for cell in notebook.cells: - if cell.cell_type == 'code': - # Check for code cells tagged with 'parameters' - if ('tags' in cell.metadata and - 'parameters' in cell.metadata.tags): - has_parameters_cell = True - break - - return has_parameters_cell - - except Exception as e: - raise Exception(f"Failed to validate notebook {notebook_path}: {str(e)}") from e + try: + # Read notebook with no conversion to preserve original structure + notebook = nbformat.read(notebook_path, nbformat.NO_CONVERT) + + # Validate the notebook format + nbformat.validate(notebook) + + # Check for parameters cell + has_parameters_cell = False + + for cell in notebook.cells: + if cell.cell_type == 'code': + # Check for code cells tagged with 'parameters' + if ('tags' in cell.metadata and + 'parameters' in cell.metadata.tags): + has_parameters_cell = True + break + + return has_parameters_cell + + except Exception as e: + raise Exception(f"Failed to validate notebook {notebook_path}: {str(e)}") from eThen update the test calls accordingly.
73-107: Use temporary directory for test artifacts.The test creates temporary notebook files in the current directory (line 86, 101), which risks conflicts and may leave debris if the test fails unexpectedly. Additionally, file operations lack explicit encoding specifications.
Apply this diff to use pytest's
tmp_pathfixture:-def test_validator_itself(): +def test_validator_itself(tmp_path): """Test the validator logic with a mock notebook structure.""" # This tests the validator class itself to ensure it works correctly validator = NotebookParametersValidator() # Create a simple test notebook structure test_notebook = nbformat.v4.new_notebook() # Add a regular code cell code_cell = nbformat.v4.new_code_cell("x = 1") test_notebook.cells.append(code_cell) # Should fail - no parameters cell yet - test_path = Path("test_notebook_no_params.ipynb") - with open(test_path, 'w') as f: + test_path = tmp_path / "test_notebook_no_params.ipynb" + with open(test_path, 'w', encoding='utf-8') as f: nbformat.write(test_notebook, f) - try: - assert not validator.validate_parameters_cell(test_path) - finally: - test_path.unlink() # Clean up + assert not validator.validate_parameters_cell(test_path) # Add a parameters cell params_cell = nbformat.v4.new_code_cell("# Parameters cell\nfiles = []") params_cell.metadata["tags"] = ["parameters"] test_notebook.cells.append(params_cell) # Should pass - has parameters cell - with open(test_path, 'w') as f: + test_path = tmp_path / "test_notebook_with_params.ipynb" + with open(test_path, 'w', encoding='utf-8') as f: nbformat.write(test_notebook, f) - try: - assert validator.validate_parameters_cell(test_path) - finally: - test_path.unlink() # Clean up + assert validator.validate_parameters_cell(test_path)Note:
tmp_pathis automatically cleaned up by pytest after the test completes.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
.github/workflows/execute-notebooks.yml(1 hunks).github/workflows/validate-notebooks.yml(1 hunks)Makefile(2 hunks)pyproject.toml(1 hunks)requirements-dev.txt(1 hunks)tests/README.md(1 hunks)tests/conftest.py(1 hunks)tests/test_notebook_parameters.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tests/test_notebook_parameters.py (1)
tests/conftest.py (1)
get_notebook_files(11-19)
🪛 checkmake (0.2.2)
Makefile
[warning] 1-1: Missing required phony target "all"
(minphony)
[warning] 1-1: Missing required phony target "clean"
(minphony)
[warning] 1-1: Missing required phony target "test"
(minphony)
🪛 Pylint (4.0.2)
tests/test_notebook_parameters.py
[refactor] 16-16: Too few public methods (1/2)
(R0903)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: execute_tests (notebooks/use-cases/document-conversion-standard.ipynb, https://raw.githubusercont...
🔇 Additional comments (4)
tests/README.md (1)
1-58: LGTM!The documentation is comprehensive and accurately describes the testing infrastructure, including setup instructions, test execution commands, and troubleshooting guidance.
pyproject.toml (1)
29-32: LGTM!The pytest configuration is appropriate with sensible defaults for test discovery and output formatting.
Makefile (1)
28-31: LGTM!The Makefile target correctly executes the notebook parameter tests with appropriate messaging.
tests/test_notebook_parameters.py (1)
56-70: The concern about silent test skipping does not apply to this codebase.The
get_notebook_files()function successfully discovers all 5 notebooks in the repository. The glob pattern correctly identifies notebooks atnotebooks/tutorials/andnotebooks/use-cases/, and the parametrized test will execute with 5 test instances—not be silently skipped.Likely an incorrect or invalid review comment.
…dencies - Modified the installation step in the GitHub Actions workflow to directly install pytest, nbformat, and nbstripout instead of using the editable install with test dependencies.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
.github/workflows/validate-notebooks.yml (2)
21-24: Remove unused matrix variable.The
notebooks_to_validatematrix contains a single value and is never referenced in the workflow steps. This unnecessary complexity should be removed.Apply this diff to remove the unused matrix:
strategy: - matrix: - # Set the notebooks to validate, wildcards are allowed - notebooks_to_validate: ["notebooks/**/*.ipynb"] steps:
34-37: Userequirements-dev.txtfor consistency and maintainability.The workflow hardcodes package names instead of using the
requirements-dev.txtfile mentioned in the PR summary. This approach was recommended in the previous review (Option 1) and ensures all environments (local dev, CI, documentation) depend on the same versions.Additionally, verify that
papermillis included inrequirements-dev.txt, as themake test-notebook-parameterstarget likely requires it for parameter validation.Apply this diff:
- name: Install Testing Tools run: | - pip install pytest nbformat nbstripout + pip install -r requirements-dev.txt ipython kernel install --name "python3" --user
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
.github/workflows/validate-notebooks.yml(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: execute_tests (notebooks/use-cases/document-conversion-standard.ipynb, https://raw.githubusercont...
pyproject.tomlfor pytest configuration and added necessary development dependencies inrequirements-dev.txt.conftest.pyand documented tests intests/README.md.Description
How Has This Been Tested?
Merge criteria:
Summary by CodeRabbit
Chores
Tests
Documentation