Skip to content
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
25 changes: 25 additions & 0 deletions .github/workflows/execute-all-notebooks.yml
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,11 @@ on:
description: "Pull request number or branch name"
required: true
default: "main"
pull_request:
types: [opened, synchronize, reopened]
paths:
- "notebooks/**/*.ipynb"
- ".github/workflows/execute-all-notebooks.yml"

env:
INSTANCE_TYPE: "g6e.xlarge"
Expand Down Expand Up @@ -70,6 +75,26 @@ jobs:
run: |
echo "hello. i'm running from inside of EC2 instance ${{ needs.launch-ec2-runner.outputs.ec2-instance-id }}"

- uses: actions/checkout@v5
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Checkout action version inconsistency.

Line 78 uses actions/checkout@v5, but line 40 uses actions/checkout@v4. Maintain consistency across the workflow by upgrading line 78 to @v5 or downgrading line 40 to @v4 (preferably keep both on the latest stable version).

🤖 Prompt for AI Agents
.github/workflows/execute-all-notebooks.yml around lines 40 and 78: the workflow
uses mixed checkout action versions (line 40 is actions/checkout@v4 while line
78 is actions/checkout@v5); make them consistent by updating the older reference
to the preferred stable version (recommend upgrading the @v4 usage at line 40 to
actions/checkout@v5) or alternatively downgrade line 78 to @v4—ensure both
checkout steps reference the same tag and run the workflow to verify no breaking
changes.


- name: Set up Python
uses: actions/setup-python@v6
with:
python-version: "3.12"
cache: pip

- name: Install Dependencies
run: |
pip install -r requirements-dev.txt
pip install papermill ipykernel jupyter
ipython kernel install --name "python3" --user

Comment on lines +86 to +91
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

Potential dependency version conflicts from dual installation.

Installing from requirements-dev.txt and then separately installing papermill ipykernel jupyter could lead to version conflicts if these packages are already specified in requirements-dev.txt. Consider consolidating all dependencies into requirements-dev.txt to ensure consistent versions.


🏁 Script executed:

#!/bin/bash
# Check if papermill, ipykernel, or jupyter are already in requirements-dev.txt

echo "=== Checking requirements-dev.txt for potential conflicts ==="
if [ -f requirements-dev.txt ]; then
  echo "File exists. Checking for papermill, ipykernel, jupyter..."
  grep -E "(papermill|ipykernel|jupyter)" requirements-dev.txt || echo "None found in requirements-dev.txt"
else
  echo "requirements-dev.txt not found"
fi

Length of output: 130


🏁 Script executed:

#!/bin/bash
# Get full requirements-dev.txt and check papermill line + context
echo "=== Full requirements-dev.txt content ==="
cat requirements-dev.txt

echo ""
echo "=== Workflow file lines 86-95 for context ==="
sed -n '86,95p' .github/workflows/execute-all-notebooks.yml

Length of output: 482


Remove redundant papermill installation to ensure consistent dependency resolution.

Papermill is already specified in requirements-dev.txt but is reinstalled on line 89 alongside ipykernel and jupyter. This dual installation can cause version conflicts if requirements-dev.txt pins a specific version of papermill. Instead, add ipykernel and jupyter to requirements-dev.txt and remove the redundant pip install papermill ipykernel jupyter line.

🤖 Prompt for AI Agents
In .github/workflows/execute-all-notebooks.yml around lines 86 to 91, the
workflow redundantly reinstalls papermill (via pip install papermill ipykernel
jupyter) even though papermill is already pinned in requirements-dev.txt; remove
the extra pip install line and instead add ipykernel and jupyter to
requirements-dev.txt, then keep only pip install -r requirements-dev.txt
followed by the ipython kernel install --name "python3" --user step so all
dependencies (including papermill, ipykernel, jupyter) are installed from the
single requirements file.

- name: Execute All Notebooks via Pytest
run: |
cd tests
python -m pytest test_notebook_execution.py -v --tb=short


stop-ec2-runner:
permissions:
id-token: write # This is required for OIDC (AWS auth)
Expand Down
15 changes: 14 additions & 1 deletion Makefile
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
.PHONY: format-python format-notebook format-python-check format-notebooks-check test-notebook-parameters
.PHONY: format-python format-notebook format-python-check format-notebooks-check test-notebook-parameters test-notebook-execution test-kfp-components test-notebooks test-all

USE_CASES := $(wildcard notebooks/use-cases/*.ipynb)
TUTORIALS := $(wildcard notebooks/tutorials/*.ipynb)
Expand Down Expand Up @@ -29,3 +29,16 @@ test-notebook-parameters:
@echo "Running notebook parameters validation..."
pytest tests/test_notebook_parameters.py -v
@echo "Notebook parameters test passed :)"

test-notebook-execution:
@echo "Running notebook execution tests..."
pytest tests/test_notebook_execution.py -v
@echo "Notebook execution tests passed :)"

test-notebooks: format-notebooks-check test-notebook-parameters test-notebook-execution
@echo "All notebook validations completed successfully (formatting, parameters, execution) :)"

test-all:
@echo "Running all tests..."
pytest tests/ -v
@echo "All tests passed :)"
95 changes: 95 additions & 0 deletions tests/test_notebook_execution.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,95 @@
"""
Simple notebook execution tests for CI/CD integration.
This module provides basic notebook execution testing using papermill,
designed to be lightweight and integrate easily with GitHub Actions.
"""

import os
import tempfile
from pathlib import Path

import papermill as pm
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

Verify papermill dependency is declared in requirements.

Ensure that papermill and its required dependencies (ipykernel) are declared in the project's requirements files.


🏁 Script executed:

#!/bin/bash
# Check if papermill is declared in requirements files

echo "=== Checking for papermill in requirements files ==="
fd -t f "requirements.*\.txt$" --exec grep -H "papermill" {} \;

echo ""
echo "=== Checking for ipykernel in requirements files ==="
fd -t f "requirements.*\.txt$" --exec grep -H "ipykernel" {} \;

echo ""
echo "=== Listing all requirements files ==="
fd -t f "requirements.*\.txt$"

Length of output: 343


I need to verify whether ipykernel is actually a required dependency and check for other dependency declaration methods.

Now let me check if there are other dependency declaration files in the repo:

Let me check the project's own dependency declarations:


🏁 Script executed:

#!/bin/bash
# Check for setup.py and pyproject.toml in the project root

echo "=== Checking for setup.py ==="
fd -t f "setup.py" --max-depth 1

echo ""
echo "=== Checking for pyproject.toml ==="
fd -t f "pyproject.toml" --max-depth 1

echo ""
echo "=== Checking contents of requirements-dev.txt ==="
cat requirements-dev.txt

Length of output: 170


Let me re-run the script with proper repository access:


🏁 Script executed:

#!/bin/bash
# Check for setup.py and pyproject.toml in the project root

echo "=== Checking for setup.py ==="
fd -t f "setup.py" --max-depth 1

echo ""
echo "=== Checking for pyproject.toml ==="
fd -t f "pyproject.toml" --max-depth 1

echo ""
echo "=== Checking contents of requirements-dev.txt ==="
cat requirements-dev.txt

echo ""
echo "=== Checking if ipykernel is a transitive dependency of papermill ==="
grep -E "^(jupyter|nbclient|ipykernel)" requirements-dev.txt || echo "Not explicitly listed in requirements-dev.txt"

Length of output: 290


Let me check the pyproject.toml and verify papermill's dependencies:


🏁 Script executed:

#!/bin/bash
# Check pyproject.toml contents and look for dependency sections

echo "=== Contents of pyproject.toml ==="
cat pyproject.toml

echo ""
echo "=== Checking test dependencies in pyproject.toml ==="
grep -A 20 "\[project.optional-dependencies\]" pyproject.toml || echo "No optional dependencies found"

Length of output: 1013



🌐 Web query:

papermill package dependencies ipykernel

💡 Result:

Short answer

  • papermill itself (the core package) does not strictly declare ipykernel as a direct dependency on PyPI, but it requires a Jupyter kernel to execute notebooks — so you must have an appropriate kernel (usually provided by ipykernel) available in the environment that will run the notebook. [1][2]

Details / recommended installs

  • If you run notebooks with papermill, install ipykernel in the execution environment (or ensure the notebook's kernelspec points to an environment that exposes a kernel). Example: pip/conda install ipykernel and register the kernel. [2][6]
  • If you use the Apache Airflow papermill provider, that provider does declare ipykernel as a dependency (ipykernel >= 6.29.4). So installing apache-airflow-providers-papermill will require ipykernel >= 6.29.4. [3][4]

Useful references

  • papermill on PyPI (install info): papermill package. [1]
  • Papermill troubleshooting (NoSuchKernel / advice to install ipykernel or register kernels): Papermill docs. [2]
  • Apache Airflow provider requirements (lists ipykernel>=6.29.4): apache-airflow-providers-papermill docs/PyPI. [3][4]

Now let me check what the test file actually does to determine if ipykernel is needed:


🏁 Script executed:

#!/bin/bash
# Check what the test file actually does

echo "=== Contents of tests/test_notebook_execution.py ==="
cat tests/test_notebook_execution.py

Length of output: 3008


Add ipykernel to requirements-dev.txt

The test file executes notebooks using pm.execute_notebook() (line 48-54), which requires a Jupyter kernel to be available at runtime. While papermill itself doesn't declare ipykernel as a strict dependency, ipykernel is necessary for notebook execution in this test environment. Add ipykernel to requirements-dev.txt alongside papermill to ensure tests can execute successfully.

🤖 Prompt for AI Agents
In tests/test_notebook_execution.py around line 12, the test uses papermill to
execute notebooks but the development requirements are missing ipykernel; update
requirements-dev.txt to include ipykernel (e.g., add a new line with "ipykernel"
next to "papermill") so the Jupyter kernel is available at test runtime, then
re-run tests to confirm notebooks execute successfully.

import pytest

from conftest import get_notebook_files


def get_test_parameters():
"""Get default test parameters for notebook execution."""
return {
"files": [], # Empty files list for basic testing
"test_mode": True,
"quick_run": True,
}


def execute_single_notebook(notebook_path: Path, timeout: int = 300) -> bool:
"""
Execute a single notebook with papermill.
Args:
notebook_path: Path to notebook to execute
timeout: Execution timeout in seconds
Returns:
True if successful, raises exception if failed
"""
with tempfile.NamedTemporaryFile(suffix='.ipynb', delete=False) as tmp_file:
output_path = Path(tmp_file.name)

try:
pm.execute_notebook(
input_path=str(notebook_path),
output_path=str(output_path),
parameters=get_test_parameters(),
timeout=timeout,
kernel_name="python3"
)
return True
except Exception as e:
raise Exception(f"Notebook execution failed: {str(e)}") from e
finally:
if output_path.exists():
output_path.unlink()


@pytest.mark.parametrize("notebook_path", get_notebook_files())
def test_notebook_executes_without_error(notebook_path):
"""Test that each notebook executes without errors."""

# Skip backup/duplicate files
if any(skip in str(notebook_path) for skip in ["copy.ipynb", ".checkpoint"]):
pytest.skip(f"Skipping backup/checkpoint file: {notebook_path}")

# Execute the notebook
success = execute_single_notebook(notebook_path, timeout=300)
assert success, f"Failed to execute notebook: {notebook_path}"


def test_notebooks_directory_exists():
"""Verify the notebooks directory exists and contains files."""
notebooks = get_notebook_files()
assert len(notebooks) > 0, "No notebook files found in notebooks directory"

for notebook in notebooks:
assert notebook.exists(), f"Notebook file does not exist: {notebook}"
assert notebook.suffix == '.ipynb', f"File is not a notebook: {notebook}"


if __name__ == "__main__":
# Allow running directly for testing
import sys

if len(sys.argv) > 1:
notebook_file = Path(sys.argv[1])
if notebook_file.exists():
print(f"Testing {notebook_file}...")
try:
execute_single_notebook(notebook_file)
print("✅ Success!")
except Exception as e:
print(f"❌ Failed: {e}")
sys.exit(1)
else:
print("Usage: python test_notebook_execution.py <notebook_path>")
Loading