Add notebook validation and execution workflows #41

shruthis4 · 2025-11-04T19:26:42Z

Introduced GitHub Actions workflows for executing and validating Jupyter notebooks.
Added a new Makefile target for testing notebook parameters.
Implemented pytest-based tests to ensure notebooks contain required parameters cells for papermill execution.
Updated pyproject.toml for pytest configuration and added necessary development dependencies in requirements-dev.txt.
Created shared test utilities in conftest.py and documented tests in tests/README.md.

Description

How Has This Been Tested?

Merge criteria:

The commits are squashed in a cohesive manner and have meaningful messages.
Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
The developer has manually tested the changes and verified that the changes work

Summary by CodeRabbit

Chores
- Added CI workflows to execute and validate notebooks on PRs, pushes, and manual runs.
- Added development test tooling and dependencies for notebook execution and validation.
- Added a make target to run notebook-parameter tests.
Tests
- Added automated tests and fixtures that verify notebooks include required parameter cells.
Documentation
- Added a tests guide describing notebook tests and how to run them.

- Introduced GitHub Actions workflows for executing and validating Jupyter notebooks. - Added a new Makefile target for testing notebook parameters. - Implemented pytest-based tests to ensure notebooks contain required parameters cells for papermill execution. - Updated `pyproject.toml` for pytest configuration and added necessary development dependencies in `requirements-dev.txt`. - Created shared test utilities in `conftest.py` and documented tests in `tests/README.md`.

coderabbitai · 2025-11-04T19:26:51Z

Walkthrough

Adds CI workflows to validate and execute Jupyter notebooks, new pytest tests and fixtures to ensure notebooks contain a code cell tagged "parameters", updates Makefile and dev requirements, and adds test documentation.

Changes

Cohort / File(s)	Summary
CI/CD Workflows `\.github/workflows/execute-notebooks.yml`, `\.github/workflows/validate-notebooks.yml`	New GitHub Actions workflows: `validate-notebooks.yml` runs formatting and notebook-parameter checks on notebook changes; `execute-notebooks.yml` runs notebooks via papermill (injects file URL/payload). Both use Python 3.12, pip caching, and trigger on PRs, pushes to main, and manual dispatch with path filters.
Build & Dev config `Makefile`, `pyproject.toml`, `requirements-dev.txt`	Makefile: add `test-notebook-parameters` target and include in `.PHONY`; pyproject.toml: add `[tool.pytest.ini_options]` settings (testpaths, python_files, addopts); requirements-dev.txt: add `pytest`, `nbformat`, `papermill`.
Test infrastructure & docs `tests/conftest.py`, `tests/test_notebook_parameters.py`, `tests/README.md`	Add test helper to discover `notebooks/*/.ipynb` and a `notebook_files` fixture; add `NotebookParametersValidator` and tests that assert each notebook has a code cell tagged `parameters` plus a self-test; add tests/README.md describing tests and CI usage.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Dev as Developer (PR / Push)
  participant GH as GitHub Actions
  participant Runner as Runner (ubuntu-latest)
  participant Repo as Repository
  participant Py as Python 3.12
  participant Pap as Papermill

  Dev->>GH: push PR / push to main (paths: notebooks/**.ipynb or workflows)
  GH->>Runner: start validate-notebooks job
  Runner->>Repo: checkout code
  Runner->>Py: setup Python 3.12 + pip cache
  Runner->>Py: pip install pytest, nbformat, nbstripout, ipykernel, jupyter-client
  Runner->>Runner: make format-notebooks-check
  Runner->>Runner: make test-notebook-parameters (pytest runs NotebookParametersValidator)
  alt execute-notebooks triggered (matrix per notebook)
    GH->>Runner: start execute-notebooks job (matrix item)
    Runner->>Py: pip install papermill, ipykernel
    Runner->>Pap: papermill executes notebook (inject base64 payload)
    Pap-->>Runner: execution logs / status
  end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Review focus:
- .github/workflows/*.yml: verify path filters, matrix entries, and papermill payload handling.
- tests/test_notebook_parameters.py: correctness of notebook parsing and parameters-tag detection, error messages.
- tests/conftest.py: glob discovery and filtering of non-existent paths.
- Makefile and pyproject.toml: ensure test targets and pytest settings align with repo layout.

Poem

🐰 I hopped through cells with eager cheer,
Tagging "parameters" so CI draws near.
Papermill warms each notebook bright,
Tests validate by morning light.
A little rabbit applauds the automated sight.

Pre-merge checks and finishing touches

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately describes the main changes: adding GitHub Actions workflows for notebook validation and execution, which is the primary focus of this PR.

✨ Finishing touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b839e34 and e003307.

📒 Files selected for processing (1)

.github/workflows/validate-notebooks.yml (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

.github/workflows/validate-notebooks.yml

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: execute_tests (notebooks/use-cases/document-conversion-standard.ipynb, https://raw.githubusercont...

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (5)

requirements-dev.txt (1)

3-5: Consider pinning dependency versions for reproducibility.

The new test dependencies lack version pins, which may cause non-reproducible builds or unexpected failures when upstream releases introduce breaking changes.

Apply this diff to pin versions:
-pytest
-nbformat
-papermill
+pytest>=8.0.0,<9.0.0
+nbformat>=5.9.0,<6.0.0
+papermill>=2.4.0,<3.0.0
Alternatively, you can verify and pin to specific current versions:
#!/bin/bash
# Check latest stable versions of the dependencies
pip index versions pytest | head -5
pip index versions nbformat | head -5
pip index versions papermill | head -5

.github/workflows/execute-notebooks.yml (1)

44-55: Clean up temporary output file after execution.

The temporary output file $NOTEBOOK.tmp.ipynb created by papermill is never cleaned up or used for verification.

Apply this diff to add cleanup:
           papermill $NOTEBOOK $NOTEBOOK.tmp.ipynb -b $(echo -n "files: [\"$FILE\"]" | base64 -w 0)
+          
+          # Clean up temporary output
+          rm -f $NOTEBOOK.tmp.ipynb
 
           echo "✓ Notebook $NOTEBOOK executed successfully"
Alternatively, consider using /tmp for temporary files or verifying the output notebook's execution results before cleanup.

tests/conftest.py (1)

11-19: Remove redundant existence check.

Line 17 filters out non-existent files, but glob.glob() only returns paths that exist, making this check unnecessary.

Apply this diff to simplify:

 def get_notebook_files():
     """Discover all notebook files in the notebooks directory."""
     notebook_pattern = "notebooks/**/*.ipynb"
     notebook_files = glob.glob(notebook_pattern, recursive=True)
     
-    # Convert to Path objects and filter out any non-existent files
-    notebook_paths = [Path(f) for f in notebook_files if Path(f).exists()]
+    # Convert to Path objects
+    notebook_paths = [Path(f) for f in notebook_files]
     
     return notebook_paths

tests/test_notebook_parameters.py (2)

16-53: Consider simplifying to a standalone function.

The NotebookParametersValidator class has only one public method. Per the Single Responsibility Principle and Pylint's guidance, consider refactoring to a simple function unless you anticipate adding more validation methods.

Apply this diff:

-class NotebookParametersValidator:
-    """Validator for notebook parameters cells."""
-    
-    def validate_parameters_cell(self, notebook_path: Path) -> bool:
-        """
-        Validate that a notebook has at least one code cell tagged with 'parameters'.
-        
-        Args:
-            notebook_path: Path to the notebook file
-            
-        Returns:
-            True if notebook has parameters cell, False otherwise
-            
-        Raises:
-            Exception: If notebook cannot be read or validated
-        """
+def validate_notebook_has_parameters_cell(notebook_path: Path) -> bool:
+    """
+    Validate that a notebook has at least one code cell tagged with 'parameters'.
+    
+    Args:
+        notebook_path: Path to the notebook file
+        
+    Returns:
+        True if notebook has parameters cell, False otherwise
+        
+    Raises:
+        Exception: If notebook cannot be read or validated
+    """
-        try:
-            # Read notebook with no conversion to preserve original structure
-            notebook = nbformat.read(notebook_path, nbformat.NO_CONVERT)
-            
-            # Validate the notebook format
-            nbformat.validate(notebook)
-            
-            # Check for parameters cell
-            has_parameters_cell = False
-            
-            for cell in notebook.cells:
-                if cell.cell_type == 'code':
-                    # Check for code cells tagged with 'parameters'
-                    if ('tags' in cell.metadata and 
-                        'parameters' in cell.metadata.tags):
-                        has_parameters_cell = True
-                        break
-            
-            return has_parameters_cell
-            
-        except Exception as e:
-            raise Exception(f"Failed to validate notebook {notebook_path}: {str(e)}") from e
+    try:
+        # Read notebook with no conversion to preserve original structure
+        notebook = nbformat.read(notebook_path, nbformat.NO_CONVERT)
+        
+        # Validate the notebook format
+        nbformat.validate(notebook)
+        
+        # Check for parameters cell
+        has_parameters_cell = False
+        
+        for cell in notebook.cells:
+            if cell.cell_type == 'code':
+                # Check for code cells tagged with 'parameters'
+                if ('tags' in cell.metadata and 
+                    'parameters' in cell.metadata.tags):
+                    has_parameters_cell = True
+                    break
+        
+        return has_parameters_cell
+        
+    except Exception as e:
+        raise Exception(f"Failed to validate notebook {notebook_path}: {str(e)}") from e

Then update the test calls accordingly.

73-107: Use temporary directory for test artifacts.

The test creates temporary notebook files in the current directory (line 86, 101), which risks conflicts and may leave debris if the test fails unexpectedly. Additionally, file operations lack explicit encoding specifications.

Apply this diff to use pytest's tmp_path fixture:

-def test_validator_itself():
+def test_validator_itself(tmp_path):
     """Test the validator logic with a mock notebook structure."""
     # This tests the validator class itself to ensure it works correctly
     validator = NotebookParametersValidator()
     
     # Create a simple test notebook structure
     test_notebook = nbformat.v4.new_notebook()
     
     # Add a regular code cell
     code_cell = nbformat.v4.new_code_cell("x = 1")
     test_notebook.cells.append(code_cell)
     
     # Should fail - no parameters cell yet
-    test_path = Path("test_notebook_no_params.ipynb")
-    with open(test_path, 'w') as f:
+    test_path = tmp_path / "test_notebook_no_params.ipynb"
+    with open(test_path, 'w', encoding='utf-8') as f:
         nbformat.write(test_notebook, f)
     
-    try:
-        assert not validator.validate_parameters_cell(test_path)
-    finally:
-        test_path.unlink()  # Clean up
+    assert not validator.validate_parameters_cell(test_path)
     
     # Add a parameters cell
     params_cell = nbformat.v4.new_code_cell("# Parameters cell\nfiles = []")
     params_cell.metadata["tags"] = ["parameters"]
     test_notebook.cells.append(params_cell)
     
     # Should pass - has parameters cell
-    with open(test_path, 'w') as f:
+    test_path = tmp_path / "test_notebook_with_params.ipynb"
+    with open(test_path, 'w', encoding='utf-8') as f:
         nbformat.write(test_notebook, f)
     
-    try:
-        assert validator.validate_parameters_cell(test_path)
-    finally:
-        test_path.unlink()  # Clean up
+    assert validator.validate_parameters_cell(test_path)

Note: tmp_path is automatically cleaned up by pytest after the test completes.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 277fcac and 0500a23.

📒 Files selected for processing (8)

.github/workflows/execute-notebooks.yml (1 hunks)
.github/workflows/validate-notebooks.yml (1 hunks)
Makefile (2 hunks)
pyproject.toml (1 hunks)
requirements-dev.txt (1 hunks)
tests/README.md (1 hunks)
tests/conftest.py (1 hunks)
tests/test_notebook_parameters.py (1 hunks)

🧰 Additional context used

🧬 Code graph analysis (1)

tests/test_notebook_parameters.py (1)

tests/conftest.py (1)

get_notebook_files (11-19)

🪛 checkmake (0.2.2)

Makefile

[warning] 1-1: Missing required phony target "all"

(minphony)

[warning] 1-1: Missing required phony target "clean"

(minphony)

[warning] 1-1: Missing required phony target "test"

(minphony)

🪛 Pylint (4.0.2)

tests/test_notebook_parameters.py

[refactor] 16-16: Too few public methods (1/2)

(R0903)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: execute_tests (notebooks/use-cases/document-conversion-standard.ipynb, https://raw.githubusercont...

🔇 Additional comments (4)

tests/README.md (1)

1-58: LGTM!

The documentation is comprehensive and accurately describes the testing infrastructure, including setup instructions, test execution commands, and troubleshooting guidance.

pyproject.toml (1)

29-32: LGTM!

The pytest configuration is appropriate with sensible defaults for test discovery and output formatting.

Makefile (1)

28-31: LGTM!

The Makefile target correctly executes the notebook parameter tests with appropriate messaging.

tests/test_notebook_parameters.py (1)

56-70: The concern about silent test skipping does not apply to this codebase.

The get_notebook_files() function successfully discovers all 5 notebooks in the repository. The glob pattern correctly identifies notebooks at notebooks/tutorials/ and notebooks/use-cases/, and the parametrized test will execute with 5 test instances—not be silently skipped.

Likely an incorrect or invalid review comment.

.github/workflows/validate-notebooks.yml

…dencies - Modified the installation step in the GitHub Actions workflow to directly install pytest, nbformat, and nbstripout instead of using the editable install with test dependencies.

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (2)

.github/workflows/validate-notebooks.yml (2)
21-24: Remove unused matrix variable.

The notebooks_to_validate matrix contains a single value and is never referenced in the workflow steps. This unnecessary complexity should be removed.

Apply this diff to remove the unused matrix:
     strategy:
-      matrix:
-        # Set the notebooks to validate, wildcards are allowed
-        notebooks_to_validate: ["notebooks/**/*.ipynb"]
     steps:
34-37: Use requirements-dev.txt for consistency and maintainability.

The workflow hardcodes package names instead of using the requirements-dev.txt file mentioned in the PR summary. This approach was recommended in the previous review (Option 1) and ensures all environments (local dev, CI, documentation) depend on the same versions.

Additionally, verify that papermill is included in requirements-dev.txt, as the make test-notebook-parameters target likely requires it for parameter validation.

Apply this diff:
       - name: Install Testing Tools
         run: |
-          pip install pytest nbformat nbstripout
+          pip install -r requirements-dev.txt
           ipython kernel install --name "python3" --user

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0500a23 and b839e34.

📒 Files selected for processing (1)

.github/workflows/validate-notebooks.yml (1 hunks)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: execute_tests (notebooks/use-cases/document-conversion-standard.ipynb, https://raw.githubusercont...

…endencies

.github/workflows/execute-notebooks.yml

shruthis4 requested a review from alimaredia November 4, 2025 19:26

shruthis4 requested a review from a team as a code owner November 4, 2025 19:26

coderabbitai bot reviewed Nov 4, 2025

View reviewed changes

.github/workflows/validate-notebooks.yml Show resolved Hide resolved

Update notebook validation workflow to install specific testing depen…

b839e34

…dencies - Modified the installation step in the GitHub Actions workflow to directly install pytest, nbformat, and nbstripout instead of using the editable install with test dependencies.

coderabbitai bot reviewed Nov 4, 2025

View reviewed changes

Enhance notebook validation workflow by adding additional testing dep…

e003307

…endencies

alimaredia reviewed Nov 5, 2025

View reviewed changes

.github/workflows/execute-notebooks.yml Show resolved Hide resolved

alimaredia approved these changes Nov 5, 2025

View reviewed changes

alimaredia merged commit 77ffed9 into opendatahub-io:main Nov 5, 2025
4 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add notebook validation and execution workflows #41

Add notebook validation and execution workflows #41

Uh oh!

shruthis4 commented Nov 4, 2025 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Nov 4, 2025 •

edited

Loading

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Add notebook validation and execution workflows #41

Add notebook validation and execution workflows #41

Uh oh!

Conversation

shruthis4 commented Nov 4, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

How Has This Been Tested?

Merge criteria:

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Nov 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Poem

Pre-merge checks and finishing touches

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

shruthis4 commented Nov 4, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Nov 4, 2025 •

edited

Loading