
Run changed notebooks in CI #409

Merged
merged 22 commits into from
Feb 22, 2025

jhamon (Collaborator) commented Feb 20, 2025

Problem

We need to ensure that the notebooks heavily featured in our docs actually work.

Solution

Implement a workflow that converts a list of specified notebooks into scripts that can be run in CI. For each notebook (a JSON document with an .ipynb file extension):

  • Create a temporary directory to isolate the run from the rest of the repo, and copy the single notebook into it.
  • Use the nbformat package to iterate through the notebook's cells.
  • If a cell contains pip install steps, add them to run.sh, a bash setup script that activates and configures a venv with the required dependencies.
  • Collect the code from the other code cells into a Python script called notebook.py.
  • From the workflow, run the run.sh script to install the needed dependencies and execute the notebook.py script.
  • Execution without errors is considered a success.
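The steps above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: it assumes notebook format v4 and swaps the standard-library json module in for nbformat so it runs without extra dependencies. It also assumes that lines starting with !pip or %pip are the install steps, that other ! and % lines (shell escapes, magics) can be dropped, and that run.sh ends by running notebook.py inside the venv. The helper names split_notebook, write_scripts and run_notebook are hypothetical.

```python
import json
import shutil
import subprocess
import tempfile
from pathlib import Path

# Lines treated as dependency installs (assumption: notebooks use !pip or %pip).
PIP_PREFIXES = ("!pip", "%pip")


def cell_source(cell: dict) -> str:
    """Raw .ipynb JSON stores a cell's source as a string or a list of lines."""
    src = cell.get("source", "")
    return "".join(src) if isinstance(src, list) else src


def split_notebook(nb: dict) -> tuple[list[str], str]:
    """Return (pip commands for run.sh, Python code for notebook.py)."""
    pip_cmds: list[str] = []
    chunks: list[str] = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # markdown and raw cells are skipped
        body = []
        for line in cell_source(cell).splitlines():
            stripped = line.strip()
            if stripped.startswith(PIP_PREFIXES):
                pip_cmds.append(stripped[1:])  # "!pip install x" -> "pip install x"
            elif stripped.startswith(("!", "%")):
                continue  # other shell escapes and magics are not valid Python
            else:
                body.append(line)
        if body:
            chunks.append("\n".join(body))
    return pip_cmds, "\n\n".join(chunks) + "\n"


def write_scripts(notebook: Path, workdir: Path) -> None:
    """Copy the notebook into workdir and generate run.sh and notebook.py there."""
    shutil.copy(notebook, workdir / notebook.name)
    nb = json.loads(notebook.read_text(encoding="utf-8"))
    pip_cmds, code = split_notebook(nb)
    run_sh = [
        "#!/usr/bin/env bash",
        "set -eo pipefail",  # stop at the first failing command
        "python -m venv .venv",
        "source .venv/bin/activate",
        *pip_cmds,
        "python notebook.py",
    ]
    (workdir / "run.sh").write_text("\n".join(run_sh) + "\n", encoding="utf-8")
    (workdir / "notebook.py").write_text(code, encoding="utf-8")


def run_notebook(notebook: Path) -> bool:
    """Convert and run one notebook in an isolated temp dir; True means success."""
    with tempfile.TemporaryDirectory() as tmp:
        workdir = Path(tmp)
        write_scripts(notebook, workdir)
        result = subprocess.run(["bash", "run.sh"], cwd=workdir)
        return result.returncode == 0
```

Because run.sh uses set -e, a failing pip install or a Python exception in notebook.py gives a non-zero exit status, which is what "execution without errors" maps to here.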

Some notebooks are slow (for example, those that compute embeddings across a large sample dataset), and there are many of them (over 200 in this repo). Running every notebook on every push would be slow and expensive and would discourage people from making small improvements. Instead, we only run the notebooks that have changed relative to the master branch.
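One way to compute the set of changed notebooks is sketched below. It assumes the base branch is available in CI as origin/master (with actions/checkout this usually means setting fetch-depth: 0 or fetching master explicitly, since the default is a shallow clone). The function names are hypothetical, and the merged workflow may detect changes differently.

```python
import subprocess


def filter_notebooks(paths: list[str]) -> list[str]:
    """Keep only notebook paths, sorted for stable CI output."""
    return sorted(p for p in paths if p.endswith(".ipynb"))


def changed_notebooks(base: str = "origin/master") -> list[str]:
    """Notebooks added or modified on this branch relative to `base`."""
    # "base...HEAD" diffs against the merge base, so commits that landed on
    # master after this branch was created are not counted as changes.
    # --diff-filter=d excludes deleted files, which cannot be run.
    out = subprocess.run(
        ["git", "diff", "--name-only", "--diff-filter=d", f"{base}...HEAD"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return filter_notebooks(out.splitlines())
```

The resulting list could, for instance, be serialized with json.dumps and used to fan out one CI job per changed notebook, so a push that touches no notebooks runs none of them.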

Some notebooks may fail until they are adjusted to account for implicit dependencies on packages preinstalled in the Colab environment but not available in our CI environment. For example, we may need to add a missing statement such as !pip install pandas instead of relying on pandas already being installed.

Type of Change

  • Infrastructure change (CI configs, etc)

jhamon changed the title from "Convert notebook to script" to "Run notebooks in CI" on Feb 20, 2025

jhamon marked this pull request as ready for review on February 22, 2025 16:53
jhamon changed the title from "Run notebooks in CI" to "Run changed notebooks in CI" on Feb 22, 2025
jhamon merged commit 5caec55 into master on Feb 22, 2025
7 checks passed
jhamon deleted the jhamon/run-nb-in-CI branch on February 22, 2025 18:23