Add notebook validation and execution workflows #41
Merged
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,55 @@ | ||
| name: Execute Notebooks | ||
|
|
||
| on: | ||
| pull_request: | ||
| types: [opened, synchronize, reopened] | ||
| paths: | ||
| - "notebooks/**/*.ipynb" | ||
| - ".github/workflows/execute-notebooks.yml" | ||
| push: | ||
| branches: [ main ] | ||
| paths: | ||
| - "notebooks/**/*.ipynb" | ||
| - ".github/workflows/execute-notebooks.yml" | ||
| # Allow manual triggering | ||
| workflow_dispatch: | ||
|
|
||
| permissions: | ||
| contents: read | ||
|
|
||
| jobs: | ||
| execute_tests: | ||
| runs-on: ubuntu-latest | ||
| strategy: | ||
| matrix: | ||
| # Set the notebooks to execute | ||
| notebook_to_execute: ["notebooks/use-cases/document-conversion-standard.ipynb"] | ||
|
|
||
| # Set the files to use in each notebook execution | ||
| file_to_use: ["https://raw.githubusercontent.com/py-pdf/sample-files/refs/heads/main/001-trivial/minimal-document.pdf"] | ||
| steps: | ||
| - uses: actions/checkout@v4 | ||
|
|
||
| - name: Set up Python | ||
| uses: actions/setup-python@v5 | ||
| with: | ||
| python-version: "3.12" | ||
| cache: pip | ||
|
|
||
| - name: Install Testing Tools | ||
| run: | | ||
| pip install papermill ipykernel | ||
| ipython kernel install --name "python3" --user | ||
|
|
||
| - name: Execute Notebooks | ||
| run: | | ||
| set -ux | ||
|
|
||
| NOTEBOOK="${{ matrix.notebook_to_execute }}" | ||
| FILE="${{ matrix.file_to_use }}" | ||
|
|
||
| echo "Executing notebook '$NOTEBOOK' with file '$FILE'..." | ||
|
|
||
| papermill "$NOTEBOOK" "$NOTEBOOK.tmp.ipynb" -b "$(echo -n "files: [\"$FILE\"]" | base64 -w 0)" | ||
|
|
||
| echo "✓ Notebook $NOTEBOOK executed successfully" | ||
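
A note on the `-b` flag used in the execute step: papermill accepts its parameters as a base64-encoded YAML string, which is what the `echo -n ... | base64 -w 0` pipeline builds. The following is a minimal Python sketch of the same encoding, useful for reproducing the step locally. The function names `encode_parameters` and `decode_parameters` are illustrative and not part of this PR, and the URL in the usage is a placeholder.

```python
import base64


def encode_parameters(file_url: str) -> str:
    """Build the YAML snippet the workflow passes and base64-encode it."""
    yaml_text = f'files: ["{file_url}"]'
    # b64encode emits a single line, like `base64 -w 0` in the workflow.
    return base64.b64encode(yaml_text.encode("utf-8")).decode("ascii")


def decode_parameters(encoded: str) -> str:
    """Recover the YAML text, e.g. to inspect what a CI run received."""
    return base64.b64decode(encoded).decode("utf-8")


if __name__ == "__main__":
    encoded = encode_parameters("https://example.com/sample.pdf")
    print(encoded)
    print(decode_parameters(encoded))
```

Decoding the value is a quick way to confirm that shell quoting did not mangle the YAML before it reached papermill.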
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,43 @@ | ||
| name: Validate Notebooks | ||
|
|
||
| on: | ||
| pull_request: | ||
| types: [opened, synchronize, reopened] | ||
| paths: | ||
| - "notebooks/**/*.ipynb" | ||
| - ".github/workflows/validate-notebooks.yml" | ||
| push: | ||
| branches: [ main ] | ||
| paths: | ||
| - "notebooks/**/*.ipynb" | ||
| - ".github/workflows/validate-notebooks.yml" | ||
|
|
||
| permissions: | ||
| contents: read | ||
|
|
||
| jobs: | ||
| validate_tests: | ||
| runs-on: ubuntu-latest | ||
| strategy: | ||
| matrix: | ||
| # Set the notebooks to validate, wildcards are allowed | ||
| notebooks_to_validate: ["notebooks/**/*.ipynb"] | ||
| steps: | ||
| - uses: actions/checkout@v4 | ||
|
|
||
| - name: Set up Python | ||
| uses: actions/setup-python@v5 | ||
| with: | ||
| python-version: "3.12" | ||
| cache: pip | ||
|
|
||
| - name: Install Testing Tools | ||
| run: | | ||
| pip install -e .[test] | ||
| ipython kernel install --name "python3" --user | ||
|
||
|
|
||
| - name: Run Formatting Tests | ||
| run: make format-notebooks-check | ||
|
|
||
| - name: Validate Notebook Parameters | ||
| run: make test-notebook-parameters | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,2 +1,5 @@ | ||
| ruff | ||
| nbstripout | ||
| pytest | ||
| nbformat | ||
| papermill |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,58 @@ | ||
| # Tests | ||
|
|
||
| This directory contains automated tests for the ODH Data Processing project. | ||
|
|
||
| ## Overview | ||
|
|
||
| Tests validate project components including notebooks, Python modules, and configurations to ensure they work correctly in development and CI/CD environments. | ||
|
|
||
| ## Current Tests | ||
|
|
||
| - **`test_notebook_parameters.py`** - Validates that each notebook has the parameters cell required for papermill execution | ||
| - **`conftest.py`** - Shared test configuration and utilities | ||
|
|
||
| ## Running Tests | ||
|
|
||
| ```bash | ||
| # Run all tests | ||
| pytest tests/ -v | ||
|
|
||
| # Run specific test file | ||
| pytest tests/test_notebook_parameters.py -v | ||
|
|
||
| # Run via Makefile (where available) | ||
| make test-notebook-parameters | ||
| ``` | ||
|
|
||
| Tests also run automatically in CI/CD via GitHub Actions workflows. | ||
|
|
||
| ## Setup | ||
|
|
||
| Install dependencies: | ||
| ```bash | ||
| pip install -r requirements-dev.txt | ||
| ``` | ||
|
|
||
| ## Adding New Tests | ||
|
|
||
| 1. Create new test files following the `test_*.py` naming convention | ||
| 2. Add shared utilities to `conftest.py` if needed | ||
| 3. Update this README to document new test categories | ||
| 4. Add Makefile targets for convenient test execution | ||
|
|
||
| ## Configuration | ||
|
|
||
| Test configuration is in `pyproject.toml`: | ||
| ```toml | ||
| [tool.pytest.ini_options] | ||
| testpaths = ["tests"] | ||
| python_files = ["test_*.py"] | ||
| addopts = ["--tb=short", "-v"] | ||
| ``` | ||
|
|
||
| ## Troubleshooting | ||
|
|
||
| Common issues: | ||
| - **Test discovery**: Run from project root where `pyproject.toml` exists | ||
| - **Import errors**: Install dependencies with `pip install -r requirements-dev.txt` | ||
| - **Test failures**: Check error messages for specific validation requirements |
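
To make the "Adding New Tests" steps concrete, here is a rough sketch of what an additional test file might look like. Everything in it is hypothetical: the file name `tests/test_notebook_outputs.py`, the helper names, and the check itself (that code cells carry no stored outputs) are assumptions for illustration, not something this PR adds. It reads notebooks as plain JSON with the standard library so that it runs without extra dependencies.

```python
# Hypothetical file: tests/test_notebook_outputs.py
import json
from pathlib import Path


def cells_with_outputs(notebook: dict) -> list[int]:
    """Return indices of code cells that still carry stored outputs."""
    return [
        index
        for index, cell in enumerate(notebook.get("cells", []))
        if cell.get("cell_type") == "code" and cell.get("outputs")
    ]


def load_notebook(path: Path) -> dict:
    """Read an .ipynb file as plain JSON (no nbformat validation)."""
    return json.loads(path.read_text(encoding="utf-8"))


def test_cells_with_outputs_detects_stored_output():
    # In-memory notebook: one markdown cell, one clean code cell,
    # and one code cell that still has a stream output attached.
    notebook = {
        "cells": [
            {"cell_type": "markdown", "source": "# Title"},
            {"cell_type": "code", "source": "x = 1", "outputs": []},
            {
                "cell_type": "code",
                "source": "print(x)",
                "outputs": [{"output_type": "stream", "text": "1\n"}],
            },
        ]
    }
    assert cells_with_outputs(notebook) == [2]
```

Because the file name matches `test_*.py` and the function name starts with `test_`, pytest would pick it up under the configuration shown above without further changes.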
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,25 @@ | ||
| """ | ||
| Shared pytest configuration and fixtures for notebook testing. | ||
| """ | ||
|
|
||
| import glob | ||
| from pathlib import Path | ||
|
|
||
| import pytest | ||
|
|
||
|
|
||
| def get_notebook_files(): | ||
| """Discover all notebook files in the notebooks directory.""" | ||
| notebook_pattern = "notebooks/**/*.ipynb" | ||
| notebook_files = glob.glob(notebook_pattern, recursive=True) | ||
|
|
||
| # Convert to Path objects and filter out any non-existent files | ||
| notebook_paths = [Path(f) for f in notebook_files if Path(f).exists()] | ||
|
|
||
| return notebook_paths | ||
|
|
||
|
|
||
| @pytest.fixture | ||
| def notebook_files(): | ||
| """Fixture that provides all notebook files for testing.""" | ||
| return get_notebook_files() |
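
The glob pattern in `get_notebook_files` is relative to the current working directory, which is why the README asks for tests to be run from the project root. A possible alternative, sketched below, takes the root as an argument instead. The function name `discover_notebooks` and the checkpoint filter are assumptions added for illustration and are not part of this PR.

```python
from pathlib import Path


def discover_notebooks(root: Path) -> list[Path]:
    """Return sorted notebook paths under root/notebooks, skipping checkpoints."""
    notebooks_dir = root / "notebooks"
    return sorted(
        path
        for path in notebooks_dir.rglob("*.ipynb")
        # Path.rglob also descends into hidden directories, so Jupyter's
        # autosave copies are filtered out explicitly.
        if ".ipynb_checkpoints" not in path.parts
    )
```

In `conftest.py` the root could be derived as `Path(__file__).resolve().parent.parent`, assuming `tests/` sits directly under the repository root. Sorting keeps the parametrized test IDs stable between runs.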
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,107 @@ | ||
| """ | ||
| Test notebook parameters cell validation. | ||
|
|
||
| This module tests that all notebooks have the required parameters cell | ||
| that is needed for papermill execution. | ||
| """ | ||
|
|
||
| from pathlib import Path | ||
|
|
||
| import nbformat | ||
| import pytest | ||
|
|
||
| from conftest import get_notebook_files | ||
|
|
||
|
|
||
| class NotebookParametersValidator: | ||
| """Validator for notebook parameters cells.""" | ||
|
|
||
| def validate_parameters_cell(self, notebook_path: Path) -> bool: | ||
| """ | ||
| Validate that a notebook has at least one code cell tagged with 'parameters'. | ||
|
|
||
| Args: | ||
| notebook_path: Path to the notebook file | ||
|
|
||
| Returns: | ||
| True if notebook has parameters cell, False otherwise | ||
|
|
||
| Raises: | ||
| Exception: If notebook cannot be read or validated | ||
| """ | ||
| try: | ||
| # Read notebook with no conversion to preserve original structure | ||
| notebook = nbformat.read(notebook_path, nbformat.NO_CONVERT) | ||
|
|
||
| # Validate the notebook format | ||
| nbformat.validate(notebook) | ||
|
|
||
| # Check for parameters cell | ||
| has_parameters_cell = False | ||
|
|
||
| for cell in notebook.cells: | ||
| if cell.cell_type == 'code': | ||
| # Check for code cells tagged with 'parameters' | ||
| if ('tags' in cell.metadata and | ||
| 'parameters' in cell.metadata.tags): | ||
| has_parameters_cell = True | ||
| break | ||
|
|
||
| return has_parameters_cell | ||
|
|
||
| except Exception as e: | ||
| raise Exception(f"Failed to validate notebook {notebook_path}: {str(e)}") from e | ||
|
|
||
|
|
||
| @pytest.mark.parametrize("notebook_path", get_notebook_files()) | ||
| def test_notebook_has_parameters_cell(notebook_path): | ||
| """ | ||
| Test that each notebook has at least one code cell tagged with 'parameters'. | ||
|
|
||
| This is required for papermill execution in the CI/CD pipeline. | ||
| """ | ||
| validator = NotebookParametersValidator() | ||
|
|
||
| has_parameters = validator.validate_parameters_cell(notebook_path) | ||
|
|
||
| assert has_parameters, ( | ||
| f"Notebook '{notebook_path}' does not have any code cell tagged with 'parameters'. " | ||
| f"Please add a code cell with metadata tag 'parameters' for papermill execution." | ||
| ) | ||
|
|
||
|
|
||
| def test_validator_itself(): | ||
| """Test the validator logic with a mock notebook structure.""" | ||
| # This tests the validator class itself to ensure it works correctly | ||
| validator = NotebookParametersValidator() | ||
|
|
||
| # Create a simple test notebook structure | ||
| test_notebook = nbformat.v4.new_notebook() | ||
|
|
||
| # Add a regular code cell | ||
| code_cell = nbformat.v4.new_code_cell("x = 1") | ||
| test_notebook.cells.append(code_cell) | ||
|
|
||
| # Should fail - no parameters cell yet | ||
| test_path = Path("test_notebook_no_params.ipynb") | ||
| with open(test_path, 'w') as f: | ||
| nbformat.write(test_notebook, f) | ||
|
|
||
| try: | ||
| assert not validator.validate_parameters_cell(test_path) | ||
| finally: | ||
| test_path.unlink() # Clean up | ||
|
|
||
| # Add a parameters cell | ||
| params_cell = nbformat.v4.new_code_cell("# Parameters cell\nfiles = []") | ||
| params_cell.metadata["tags"] = ["parameters"] | ||
| test_notebook.cells.append(params_cell) | ||
|
|
||
| # Should pass - has parameters cell | ||
| with open(test_path, 'w') as f: | ||
| nbformat.write(test_notebook, f) | ||
|
|
||
| try: | ||
| assert validator.validate_parameters_cell(test_path) | ||
| finally: | ||
| test_path.unlink() # Clean up |
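
For reference, the check performed by `NotebookParametersValidator` comes down to looking for `"parameters"` in a code cell's `metadata.tags` list. The sketch below shows the same check on the raw notebook JSON, using only the standard library. It is an illustration of the cell shape the test expects rather than a replacement for the `nbformat`-based validator, and the names `has_parameters_cell` and `TAGGED_NOTEBOOK` are made up for this example.

```python
import json


def has_parameters_cell(notebook: dict) -> bool:
    """True if any code cell's metadata.tags contains 'parameters'."""
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        if "parameters" in cell.get("metadata", {}).get("tags", []):
            return True
    return False


# Smallest notebook JSON that would satisfy the check.
TAGGED_NOTEBOOK = json.loads("""
{
  "nbformat": 4,
  "nbformat_minor": 5,
  "metadata": {},
  "cells": [
    {
      "cell_type": "code",
      "metadata": {"tags": ["parameters"]},
      "source": "files = []",
      "outputs": [],
      "execution_count": null
    }
  ]
}
""")

if __name__ == "__main__":
    print(has_parameters_cell(TAGGED_NOTEBOOK))
```

A markdown cell carrying the same tag does not count, which matches the `cell.cell_type == 'code'` guard in the validator.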