Merged
8 changes: 4 additions & 4 deletions .claude/agents/replication-coder.md
@@ -27,11 +27,11 @@ The pipeline must run end-to-end via `snakemake --cores 1` on a fresh checkout.
  3. **Port the analysis** notebook by notebook, in order (01 → 02 → 03 → 04). For each:
  - Read existing notebook content (some may be partial scaffolds).
  - Replace placeholders with real code.
- - Update `environment.yml` with every new import — see `docs/cicd-conventions.md` § environment.yml is the single source of truth.
+ - Update `pixi.toml` with every new import (then `pixi install` to refresh `pixi.lock`; commit both) — see `docs/cicd-conventions.md` § pixi.toml is the single source of truth.
  - Update the Snakefile rules with the actual file paths.
  4. **Test before claiming done.** See `docs/verify-before-drafting.md` § Test before claiming ready.
- - Run `snakemake --cores 1 -n` (dry run) to verify the DAG.
- - Run the notebook(s) end-to-end via `mamba run -n <env> jupytext --to notebook --execute notebooks/0X_….py`.
+ - Run `pixi run snakemake --cores 1 -n` (dry run) to verify the DAG.
+ - Run the notebook(s) end-to-end via `pixi run jupytext --to notebook --execute notebooks/0X_….py`.
  - If you can't run something, mark it explicitly as untested in the conversation.

  ## Anti-patterns
@@ -44,4 +44,4 @@ The pipeline must run end-to-end via `snakemake --cores 1` on a fresh checkout.

  ## Output

- Updated notebook files in `notebooks/`, `environment.yml`, `Snakefile`. Tell the user what runs, what's untested, what you deviated from in the original paper, and what additional credentials / data DOIs they need to set up.
+ Updated notebook files in `notebooks/`, `pixi.toml` (+ regenerated `pixi.lock`), `Snakefile`. Tell the user what runs, what's untested, what you deviated from in the original paper, and what additional credentials / data DOIs they need to set up.
1 change: 1 addition & 0 deletions .claude/settings.json
@@ -91,6 +91,7 @@
  "Bash(python *)",
  "Bash(python3 *)",
  "Bash(pip *)",
+ "Bash(pixi *)",
  "Bash(mamba *)",
  "Bash(micromamba *)",
  "Bash(conda *)",
4 changes: 2 additions & 2 deletions .claude/skills/import-from-nanopub/SKILL.md
@@ -67,7 +67,7 @@ What this does, in two layers:

  - Scans each Outcome / Research Software nanopub for a `hasOutcomeRepository` URI (which may be either a GitHub URL or a Zenodo DOI — both are handled; Zenodo DOIs are resolved to GitHub URLs via Zenodo's `related_identifiers` API).
  - **`git clone`s each sibling repository** into `--siblings-dir` (default `../`, matching the convention of keeping related replication repos as filesystem siblings).
- - Copies a curated set of starter files from the first cloned sibling into `_template_from_prior/` (`--staging-dir`): `environment.yml`, `Snakefile`, `notebooks/01_data_download.py`, `notebooks/02_data_clean.py`, `Dockerfile`. Each file gets a provenance header.
+ - Copies a curated set of starter files from the first cloned sibling into `_template_from_prior/` (`--staging-dir`): `pixi.toml`, `pixi.lock`, `Snakefile`, `notebooks/01_data_download.py`, `notebooks/02_data_clean.py`, `Dockerfile`. Each file gets a provenance header.
  - Writes `nanopubs/imported/SETUP_INHERITED.md` documenting which sibling URLs were resolved, where they were cloned, which files were staged, and what to do with them.

  To disable the infrastructure layer (claim-only import), pass `--no-inherit`. To skip cloning but still attempt inheritance from already-present sibling clones, pass `--no-clone-siblings`.
@@ -191,7 +191,7 @@ After `CHAIN_SUMMARY.md` is written, tell the user:

  If `SETUP_INHERITED.md` reports that files were copied to `_template_from_prior/`, also tell the user:

- > *"The infrastructure-layer inheritance has staged `<N>` starter files at `_template_from_prior/`, copied from the canonical sibling chain. These include `environment.yml`, `Snakefile`, and `notebooks/01_data_download.py` — read the SETUP_INHERITED.md table for the full list and provenance. **Review each staged file, merge with your own at the corresponding path, then delete `_template_from_prior/`.** This staging directory is a one-shot reference area, NOT durable repo state. Do not commit it."*
+ > *"The infrastructure-layer inheritance has staged `<N>` starter files at `_template_from_prior/`, copied from the canonical sibling chain. These include `pixi.toml`, `pixi.lock`, `Snakefile`, and `notebooks/01_data_download.py` — read the SETUP_INHERITED.md table for the full list and provenance. **Review each staged file, merge with your own at the corresponding path, then delete `_template_from_prior/`.** This staging directory is a one-shot reference area, NOT durable repo state. Do not commit it."*
  >
  > *If you opted out of cloning (`--no-clone-siblings`) or no `hasOutcomeRepository` URIs were found in the imported nanopubs, the staging area is empty and only the resolved URLs appear in `SETUP_INHERITED.md` for your reference.*

4 changes: 2 additions & 2 deletions .claude/skills/init-template/SKILL.md
@@ -138,9 +138,9 @@ python3 scripts/import-nanopub-chain.py "<PRIOR_CHAIN_URI>"
  This single command does both layers in one pass:

  - **Claim layer**: SPARQL-walks the citation graph from the entry URI, fetches every reachable nanopub, caches each TriG to `nanopubs/imported/trig/<RA-id>.trig`, writes `nanopubs/imported/constellation.json` + `cited_papers.txt`.
- - **Infrastructure layer**: resolves each Outcome / Research Software nanopub's `hasOutcomeRepository` URI (GitHub URLs handled directly, Zenodo DOIs resolved via the Zenodo REST API's `related_identifiers`), `git clone`s each sibling repo into `../`, copies starter files (`environment.yml`, `Snakefile`, `notebooks/01_data_download.py`, `02_data_clean.py`, `Dockerfile`) from the canonical sibling into `_template_from_prior/` with provenance headers, and writes `nanopubs/imported/SETUP_INHERITED.md`.
+ - **Infrastructure layer**: resolves each Outcome / Research Software nanopub's `hasOutcomeRepository` URI (GitHub URLs handled directly, Zenodo DOIs resolved via the Zenodo REST API's `related_identifiers`), `git clone`s each sibling repo into `../`, copies starter files (`pixi.toml`, `pixi.lock`, `Snakefile`, `notebooks/01_data_download.py`, `02_data_clean.py`, `Dockerfile`) from the canonical sibling into `_template_from_prior/` with provenance headers, and writes `nanopubs/imported/SETUP_INHERITED.md`.

- Network access is required. The script depends on `rdflib` (already in the conda env spec).
+ Network access is required. The script depends on `rdflib` (already in the pixi env spec).

  ### Step 9b — Generate the claim-layer summary

4 changes: 2 additions & 2 deletions .claude/skills/replication-study/SKILL.md
@@ -33,8 +33,8 @@ For each phase, dispatch to the right specialist or guide the user manually:
  |---|---|---|
  | **0 Bootstrap** | Run `/init-template`. Drop paper PDF in `paper/`. | — |
  | **1 Paper analysis** | Use the `paper-analyst` agent. Output: `nanopubs/drafts/00_paper_summary.md` + `nanopubs/drafts/01_quote.md` Quoted Text. | `Agent({subagent_type: "paper-analyst"})` |
- | **2 Code & data port** | Use the `replication-coder` agent. Update `environment.yml`, notebooks, Snakefile. | `Agent({subagent_type: "replication-coder"})` |
- | **3 Local results** | Run `snakemake --cores 1`. Compare headline number to paper. Write `nanopubs/drafts/05_outcome.md` placeholders. | manual |
+ | **2 Code & data port** | Use the `replication-coder` agent. Update `pixi.toml` (+ `pixi.lock`), notebooks, Snakefile. | `Agent({subagent_type: "replication-coder"})` |
+ | **3 Local results** | Run `pixi run snakemake --cores 1`. Compare headline number to paper. Write `nanopubs/drafts/05_outcome.md` placeholders. | manual |
  | **4 Release** | Run `docs/fair4rs-checklist.md` pre-release checklist. Cut a `gh release` with a Zenodo-formatted body. | manual |
  | **5 FORRT chain** | Use the `nanopub-drafter` agent for each step. User publishes each draft on platform.sciencelive4all.org and pastes the URI into `nanopubs/PUBLISHED.md`. | `Agent({subagent_type: "nanopub-drafter"})` ×6 |

20 changes: 10 additions & 10 deletions .github/workflows/ci.yml
@@ -19,24 +19,25 @@ jobs:
  - name: Skip CI if template has not been initialised or notebooks are scaffolds
  id: guard
  run: |
- if grep -rln '{{[A-Z_]\+}}' . --include='*.md' --include='*.yml' --include='*.yaml' --include='*.json' --include='*.cff' --include='*.py' 2>/dev/null | grep -v '^./.claude/skills/init-template/' | head -1 > /dev/null; then
+ placeholder_files=$(grep -rln '{{[A-Z_]\+}}' . --include='*.md' --include='*.yml' --include='*.yaml' --include='*.json' --include='*.cff' --include='*.py' --include='*.toml' 2>/dev/null | grep -v 'claude/skills/init-template/' || true)
+ scaffold_files=$(grep -lE 'raise NotImplementedError|"<dataset-name>"|# Example skeleton — adapt' notebooks/*.py 2>/dev/null || true)
+ if [ -n "$placeholder_files" ]; then
  echo "::notice::Template placeholders detected ({{...}} tokens). Run /init-template inside Claude Code (or substitute manually) before CI runs meaningfully. Skipping the rest of this workflow."
  echo "skip=true" >> "$GITHUB_OUTPUT"
- elif grep -lE 'raise NotImplementedError|"<dataset-name>"|# Example skeleton — adapt' notebooks/*.py 2>/dev/null | head -1 > /dev/null; then
+ elif [ -n "$scaffold_files" ]; then
  echo "::notice::Notebooks are still in scaffold state — replace placeholders in notebooks/*.py with your actual replication code (Phase 2). The Snakefile rule outputs will not be produced until then. Skipping pipeline run."
  echo "skip=true" >> "$GITHUB_OUTPUT"
  else
  echo "skip=false" >> "$GITHUB_OUTPUT"
  fi

- - name: Set up Micromamba
+ - name: Set up pixi
  if: steps.guard.outputs.skip != 'true'
- uses: mamba-org/setup-micromamba@v3
+ uses: prefix-dev/setup-pixi@v0.9.6
  with:
- environment-file: environment.yml
- environment-name: ${{ github.event.repository.name }}
- init-shell: bash
- cache-environment: true
+ pixi-version: v0.68.1
+ locked: true
+ cache: true

  # ----- credentials (uncomment + add the matching GitHub secret) -----
  #
@@ -64,8 +65,7 @@ jobs:

  - name: Run pipeline
  if: steps.guard.outputs.skip != 'true'
- shell: micromamba-shell {0}
- run: snakemake --cores 1
+ run: pixi run snakemake --cores 1

  - name: Upload results
  if: always() && steps.guard.outputs.skip != 'true'
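
Reviewer note on the guard rewrite in this file. The diff does not state why the `grep ... | head -1 > /dev/null` condition was replaced by a captured variable plus `[ -n ... ]`. One plausible reading is that, in a shell running without `pipefail`, an `if` on a pipeline only sees the exit status of the last command, and `head` succeeds even when it receives no input. The sketch below illustrates that shell behaviour under this assumption. It is not code from the PR, and `find_matches` is a hypothetical stand-in for the `grep -rln ...` call.

```shell
# Illustration only. Assumes the shell runs WITHOUT `pipefail` (the POSIX default).
# `find_matches` is a hypothetical stand-in for the guard's grep call:
# it prints nothing and exits non-zero, as grep does when nothing matches.
find_matches() { return 1; }

# Pipeline form: the `if` tests the exit status of `head`, the last
# command in the pipeline, which is 0 even on empty input.
guard_pipeline() {
  if find_matches | head -1 > /dev/null; then
    echo "skip=true"
  else
    echo "skip=false"
  fi
}

# Capture form: collect the output first, then test whether it is non-empty.
guard_capture() {
  matches=$(find_matches || true)
  if [ -n "$matches" ]; then
    echo "skip=true"
  else
    echo "skip=false"
  fi
}

guard_pipeline   # prints skip=true although nothing matched
guard_capture    # prints skip=false
```

The same reasoning would apply to any condition of the form `cmd 2>/dev/null | head -1 > /dev/null`. Testing the command's own exit status, or the captured output, avoids depending on `head`.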
3 changes: 2 additions & 1 deletion .github/workflows/docker.yml
@@ -23,7 +23,8 @@ jobs:
  - name: Skip if template has not been initialised
  id: guard
  run: |
- if grep -rln '{{[A-Z_]\+}}' . --include='*.md' --include='*.yml' --include='*.yaml' --include='*.json' --include='*.cff' --include='*.py' 2>/dev/null | grep -v '^./.claude/skills/init-template/' | head -1 > /dev/null; then
+ placeholder_files=$(grep -rln '{{[A-Z_]\+}}' . --include='*.md' --include='*.yml' --include='*.yaml' --include='*.json' --include='*.cff' --include='*.py' --include='*.toml' 2>/dev/null | grep -v 'claude/skills/init-template/' || true)
+ if [ -n "$placeholder_files" ]; then
  echo "::notice::Template placeholders detected ({{...}} tokens). Run /init-template before releasing. Skipping Docker build."
  echo "skip=true" >> "$GITHUB_OUTPUT"
  else
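
Reviewer note on the placeholder pattern and the added `--include='*.toml'`. A minimal sketch, assuming GNU grep and a throwaway temporary directory with made-up file contents, of how the guard's pattern behaves: it should match template tokens such as `{{REPO_NAME}}` but not GitHub Actions expressions such as `${{ github.repository }}`, and a `.toml` file is only scanned once its extension is included. The `--include` options are placed before the pattern here for portability; the workflow places them after the path.

```shell
# Illustration only: the directory and file contents below are invented.
demo=$(mktemp -d)
printf 'name = "{{REPO_NAME}}"\n' > "$demo/pixi.toml"
printf 'run: echo ${{ github.repository }}\n' > "$demo/ci.yml"

# Same pattern as the guard, first without and then with the .toml include.
without_toml=$(cd "$demo" && grep -rln --include='*.yml' '{{[A-Z_]\+}}' . || true)
with_toml=$(cd "$demo" && grep -rln --include='*.yml' --include='*.toml' '{{[A-Z_]\+}}' . || true)

echo "without --include='*.toml': ${without_toml:-<none>}"
echo "with    --include='*.toml': ${with_toml:-<none>}"

rm -rf "$demo"
```

With GNU grep, the first line should report `<none>` and the second `./pixi.toml`: the workflow expression in `ci.yml` has a space after `{{`, so `[A-Z_]\+` cannot match it.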
34 changes: 12 additions & 22 deletions .github/workflows/jupyter-book.yml
@@ -26,7 +26,8 @@ jobs:
  - name: Skip if template has not been initialised
  id: guard
  run: |
- if grep -rln '{{[A-Z_]\+}}' . --include='*.md' --include='*.yml' --include='*.yaml' --include='*.json' --include='*.cff' --include='*.py' 2>/dev/null | grep -v '^./.claude/skills/init-template/' | head -1 > /dev/null; then
+ placeholder_files=$(grep -rln '{{[A-Z_]\+}}' . --include='*.md' --include='*.yml' --include='*.yaml' --include='*.json' --include='*.cff' --include='*.py' --include='*.toml' 2>/dev/null | grep -v 'claude/skills/init-template/' || true)
+ if [ -n "$placeholder_files" ]; then
  echo "::notice::Template placeholders detected ({{...}} tokens). Run /init-template inside Claude Code (or substitute manually) before the Jupyter Book builds meaningfully. Skipping."
  echo "skip=true" >> "$GITHUB_OUTPUT"
  else
@@ -39,58 +40,47 @@ jobs:
  env:
  GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
  run: |
- if gh api "repos/${{ github.repository }}/pages" 2>/dev/null | head -1 > /dev/null; then
+ if gh api "repos/${{ github.repository }}/pages" >/dev/null 2>&1; then
  echo "::notice::GitHub Pages is enabled — Jupyter Book will deploy."
  echo "pages_enabled=true" >> "$GITHUB_OUTPUT"
  else
  echo "::notice::GitHub Pages is not yet enabled on this repo. Build will run; deploy will be skipped. Enable Pages at https://github.com/${{ github.repository }}/settings/pages (Source: GitHub Actions) to enable deploy on the next push."
  echo "pages_enabled=false" >> "$GITHUB_OUTPUT"
  fi

- - name: Set up Micromamba
+ - name: Set up pixi
  if: steps.guard.outputs.skip != 'true'
- uses: mamba-org/setup-micromamba@v3
+ uses: prefix-dev/setup-pixi@v0.9.6
  with:
- environment-file: environment.yml
- environment-name: ${{ github.event.repository.name }}
- init-shell: bash
- cache-environment: true
+ pixi-version: v0.68.1
+ locked: true
+ cache: true
+ environments: docs

  - name: Convert .py notebooks to .ipynb
  if: steps.guard.outputs.skip != 'true'
- shell: micromamba-shell {0}
  run: |
  for nb in notebooks/*.py; do
- jupytext --to notebook "$nb"
+ pixi run jupytext --to notebook "$nb"
  done

  # Glob, not a hard-coded list — new notebooks are picked up automatically.
  - name: Execute notebooks
  if: steps.guard.outputs.skip != 'true'
- shell: micromamba-shell {0}
  run: |
  for nb in notebooks/*.ipynb; do
  echo "::group::Executing $nb"
- jupyter execute --inplace "$nb"
+ pixi run jupyter execute --inplace "$nb"
  echo "::endgroup::"
  done

- - uses: actions/setup-node@v4
- if: steps.guard.outputs.skip != 'true'
- with:
- node-version: 22.x
-
- - name: Install MyST
- if: steps.guard.outputs.skip != 'true'
- run: npm install -g mystmd
-
  - name: Build MyST site
  if: steps.guard.outputs.skip != 'true'
  env:
  # MyST silently ignores `base_url` in myst.yml — only this env var
  # works. See docs/cicd-conventions.md.
  BASE_URL: /${{ github.event.repository.name }}
- run: myst build --html
+ run: pixi run -e docs myst build --html

  - name: Upload pages artifact
  if: steps.guard.outputs.skip != 'true'
8 changes: 8 additions & 0 deletions .gitignore
@@ -13,6 +13,14 @@ __pycache__/
  .venv/
  .pytest_cache/

+ # Pixi install cache — never committed. pixi.toml + pixi.lock ARE committed
+ # (they're the dep manifest + per-platform lockfile); the .pixi/ directory is
+ # the per-machine install of the resolved env and is regenerated by `pixi install`.
+ .pixi/
+
+ # Snakemake run cache (DAG, locks, log files for each invocation)
+ .snakemake/
+
  # Environment / OS
  .DS_Store
  .env
6 changes: 3 additions & 3 deletions CLAUDE.md
@@ -25,7 +25,7 @@ Key FAIR4RS handles in this template:

  - **F (Findable)** — Zenodo concept DOI minted at first release; `CITATION.cff` + `codemeta.json` populated; GitHub topics suggested in `README.md`.
  - **A (Accessible)** — public GitHub repo, MIT license, GHCR image, Zenodo archive of the source tarball and (optionally) the Docker image, so the software remains accessible even if GitHub or GHCR disappears.
- - **I (Interoperable)** — `environment.yml` is the single declarative source of truth; standard formats (jupytext `.py` notebooks, parquet/CSV/NetCDF data, OCI Docker image); cross-references to upstream paper DOI, dataset DOIs, and FORRT nanopub URIs use qualified relations.
+ - **I (Interoperable)** — `pixi.toml` + `pixi.lock` are the single declarative source of truth (the lockfile pins every package per platform); standard formats (jupytext `.py` notebooks, parquet/CSV/NetCDF data, OCI Docker image); cross-references to upstream paper DOI, dataset DOIs, and FORRT nanopub URIs use qualified relations.
  - **R (Reusable)** — explicit MIT license, detailed provenance via the FORRT chain (every claim traceable to data + method + author ORCID), `CITATION.cff` lists upstream paper as a `references` entry, `codemeta.json` lists `softwareRequirements` and `referencePublication`.

  The FORRT nanopublication chain itself is what makes **R1.2 (provenance)** machine-actionable: it does what `CITATION.cff` cannot, by separating *claims about the world* from *facts about the software*, with cryptographically signed nanopubs at each step.
@@ -89,7 +89,7 @@ Exit (either entry point): a complete `nanopubs/drafts/01_quote.md` with verifie

  ### Phase 2 — Code & data port

- - `environment.yml` lists every dependency the notebooks import (`docs/cicd-conventions.md` § environment.yml is the single source of truth).
+ - `pixi.toml` lists every dependency the notebooks import, and `pixi.lock` is regenerated and committed alongside it (`docs/cicd-conventions.md` § pixi.toml is the single source of truth).
  - `notebooks/01_data_download.py` fetches all input data — no manual steps.
  - `notebooks/02_data_clean.py` produces a tidy intermediate dataset.
  - `notebooks/03_analysis.py` reproduces the paper's headline statistic.
@@ -180,7 +180,7 @@ The documents under `docs/` are the load-bearing reference material; reach for t
  | Choose the FORRT Claim type | `docs/claim-type-vocabulary.md` |
  | Write the Quote, Study, or Outcome | `docs/verify-before-drafting.md` |
  | Write a PICO or PCC question | `docs/pico-study-outcome-levels.md` |
- | Touch CI / environment.yml / Dockerfile / myst.yml | `docs/cicd-conventions.md` |
+ | Touch CI / pixi.toml / Dockerfile / myst.yml | `docs/cicd-conventions.md` |
  | Cut a release / claim a phase done | `docs/fair4rs-checklist.md` |
  | Update RO-Crate metadata after adding artefacts | `docs/ro-crate.md` |
  | Need to retract / supersede / batch-publish a nanopub | `docs/programmatic-nanopubs.md` |
2 changes: 1 addition & 1 deletion DOMAIN.md
@@ -31,7 +31,7 @@ When the user asks Claude to set up a typical analysis, the default tools to sug
  | Intermediate / archival arrays | `netCDF4` (small ≤2 GB), `zarr` (larger / cloud) | **never `.npz`** — see Data formats convention below |
  | HEALPix-indexed EO archival | EOPF Zarr (Earth Observation Processing Framework profile) | Standardised metadata for HEALPix dim-naming, NESTED ordering, projection. See [`EOPF-DGGS/legacy-converters`](https://github.com/EOPF-DGGS/legacy-converters) for conversion patterns. |

- Pin every dependency in `environment.yml` — pangeo dev environments hide missing deps locally and CI then silently fails with empty notebook cells.
+ Pin every dependency in `pixi.toml` and commit the regenerated `pixi.lock` — pangeo dev environments hide missing deps locally and CI then silently fails with empty notebook cells.

  ## Domain conventions

17 changes: 9 additions & 8 deletions Dockerfile
@@ -1,19 +1,20 @@
- FROM mambaorg/micromamba:1.5-jammy
+ FROM ghcr.io/prefix-dev/pixi:0.68.1

  LABEL org.opencontainers.image.source="https://github.com/{{REPO_ORG}}/{{REPO_NAME}}"
  LABEL org.opencontainers.image.description="Replication study container for {{REPO_NAME}}"
  LABEL org.opencontainers.image.licenses="MIT"

- COPY --chown=$MAMBA_USER:$MAMBA_USER environment.yml /tmp/environment.yml
- RUN micromamba install -y -n base -f /tmp/environment.yml && \
- micromamba clean --all --yes
-
  WORKDIR /app
- COPY --chown=$MAMBA_USER:$MAMBA_USER . /app
+
+ # Install the pinned environment first (separate from source copy so the lock
+ # layer is cached across source-only edits).
+ COPY pixi.toml pixi.lock /app/
+ RUN pixi install --locked
+
+ COPY . /app

  # Mount any required credentials at runtime, e.g.:
  # docker run -v ~/.cdsapirc:/home/mambauser/.cdsapirc {{REPO_NAME}}
  # See data/README.md for per-dataset credential setup.

- ENTRYPOINT ["/usr/local/bin/_entrypoint.sh"]
- CMD ["snakemake", "--cores", "1"]
+ CMD ["pixi", "run", "snakemake", "--cores", "1"]