59 changes: 59 additions & 0 deletions docs/validation_reporting.md
@@ -0,0 +1,59 @@
# Validation & Reporting

This add-on summarizes how the AMICA agent performed after a CXG run completes. It
inspects the grounded outputs written under `output/raw_output/**/groundings.tsv`
and the curated match-type tables from `output/pandasaurus_cxg_outputs_30/*.tsv`.
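
As a rough illustration of the `raw_output/**/groundings.tsv` pattern above, per-dataset grounding files could be discovered as sketched here. The helper name `find_groundings` is hypothetical and is not taken from the script itself.

```python
from pathlib import Path


def find_groundings(raw_output_dir: Path) -> list[Path]:
    """Return every groundings.tsv at any depth below raw_output_dir.

    Hypothetical helper: rglob("groundings.tsv") is equivalent to
    glob("**/groundings.tsv"), which mirrors the documented layout.
    """
    return sorted(raw_output_dir.rglob("groundings.tsv"))


if __name__ == "__main__":
    for path in find_groundings(Path("output/raw_output")):
        print(path)
```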

## Usage

```bash
uv run scripts/generate_validation_reports.py \
--output-root ./output \
--log-level INFO
```

By default the script writes three Markdown summaries into `<output-root>/reports/` (`output/reports/` for the command above):

| File | Description |
|------|-------------|
| `filtered_granularity_report.md` | Aggregated stats that ignore `Broad term` / `Overlaps` rows. |
| `raw_stats_report.md` | Unfiltered counts (improved / identical / regressions / no-match) for every grounding row. |
| `granularity_report.md` | Narrative examples showcasing rows where the agent improved on the author label. |
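
The filtering behind the first report can be pictured with the minimal sketch below. It assumes the match-type tables carry the `author_cell_type` and `match_type` columns seen in the Pandasaurus TSVs and that matching is case-insensitive; the function name and the exact filtering rule are assumptions, not the script's confirmed logic. The sample rows are taken from the match-type table in this change, with the `reference` and `dataset_version` columns omitted for brevity.

```python
import csv
import io

# Assumption: match types are compared case-insensitively.
EXCLUDED_MATCH_TYPES = {"broad term", "overlaps"}


def drop_broad_and_overlaps(rows):
    """Keep only rows whose match_type is neither 'broad term' nor 'overlaps'."""
    return [
        row for row in rows
        if row["match_type"].strip().lower() not in EXCLUDED_MATCH_TYPES
    ]


SAMPLE = (
    "author_cell_type\tCL_label\tCL_ID\tmatch_type\n"
    "SI_tuft\tintestinal tuft cell\tCL:0019032\texact\n"
    "tuft\tintestinal tuft cell\tCL:0019032\tbroad term\n"
    "SI_TA\ttransit amplifying cell of small intestine\tCL:0009012\tmore specific term\n"
)

rows = list(csv.DictReader(io.StringIO(SAMPLE), delimiter="\t"))
kept = drop_broad_and_overlaps(rows)
print([row["author_cell_type"] for row in kept])  # ['SI_tuft', 'SI_TA']
```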


## CLI options

```
usage: scripts/generate_validation_reports.py [-h] [--output-root OUTPUT_ROOT]
[--raw-output-dir RAW_OUTPUT_DIR]
[--match-type-dir MATCH_TYPE_DIR]
[--reports-dir REPORTS_DIR]
[--skip-filtered]
[--skip-examples]
[--skip-raw-stats]
[--skip-ontology]
[--log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL}]
```

Key flags:

- `--output-root`: base directory that contains `raw_output/`, `pandasaurus_cxg_outputs_30/`, and `reports/`. Defaults to `resources/cxg/output`. Set this when your outputs sit under a different base (e.g., a top-level `./output`).
- `--raw-output-dir`: direct path to the per-dataset `groundings.tsv` files (AMICA raw outputs). Overrides `--output-root` for groundings only. Use when groundings live outside the standard layout. **If missing:** the script exits with `FileNotFoundError`.
- `--match-type-dir`: direct path to the Pandasaurus match-type TSV/CSV files. Overrides `--output-root` for match types only. Use when those files live elsewhere. **If missing:** the script exits with `FileNotFoundError`; missing individual dataset files are tolerated (that dataset runs without Broad/Overlap filtering).
- `--reports-dir`: output directory for the Markdown reports (auto-created if needed). Defaults to `<output-root>/reports`.
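
The override rules described by these flags could be resolved roughly as follows. This is a sketch under stated assumptions: `resolve_paths` and `ReportPaths` are hypothetical names, and the real script may order its checks differently.

```python
from pathlib import Path
from typing import NamedTuple, Optional


class ReportPaths(NamedTuple):
    raw_output_dir: Path
    match_type_dir: Path
    reports_dir: Path


def resolve_paths(
    output_root: str = "resources/cxg/output",
    raw_output_dir: Optional[str] = None,
    match_type_dir: Optional[str] = None,
    reports_dir: Optional[str] = None,
) -> ReportPaths:
    """Apply the documented defaults; explicit directories win over output_root."""
    root = Path(output_root)
    raw = Path(raw_output_dir) if raw_output_dir else root / "raw_output"
    match = Path(match_type_dir) if match_type_dir else root / "pandasaurus_cxg_outputs_30"
    reports = Path(reports_dir) if reports_dir else root / "reports"

    # Both input directories must exist, as described for the flags above.
    for required in (raw, match):
        if not required.is_dir():
            raise FileNotFoundError(f"Input directory not found: {required}")

    # The reports directory is created on demand.
    reports.mkdir(parents=True, exist_ok=True)
    return ReportPaths(raw, match, reports)
```

For example, `resolve_paths(output_root="./output")` would read from `output/raw_output/` and `output/pandasaurus_cxg_outputs_30/` and write into `output/reports/`.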

## Skip toggles

- `--skip-filtered`, `--skip-raw-stats`, `--skip-examples`: each flag suppresses one report; combine them to generate only the reports you need.
- `--skip-filtered`: do not generate the filtered granularity report (the one that excludes "Broad term"/"Overlaps" rows). Use when only raw stats are needed.
- `--skip-raw-stats`: do not generate the unfiltered aggregate stats report. Use when only filtered or example reports are needed.
- `--skip-examples`: do not generate the "improved examples" report. Use when only aggregate statistics are required.
- `--skip-ontology`: runs without Cell Ontology lookups. This prevents network calls and allows offline runs, but any metric that depends on the hierarchy (improved vs regression, example extraction) will be degraded: rows that should be "improved" or "regression" are instead categorized as "other", and no examples are extracted. Reports include a warning banner. For accurate stats, leave ontology enabled or point to a local ontology via `--ontology-adapter` (e.g., `pronto:/path/to/cl.owl`).
- `--ontology-adapter`: use a custom oaklib adapter string (e.g., `pronto:/path/to/cl.owl`) for offline runs. Defaults to `ols:cl`.
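
To make the degradation under `--skip-ontology` concrete, here is a minimal sketch of hierarchy-aware classification. It assumes that "improved" means the agent's term is a descendant of the reference term and "regression" means it is an ancestor; that definition, the `classify` function, and the toy `X:` identifiers are all hypothetical and not taken from the script.

```python
from typing import Mapping, Optional, Set


def classify(
    agent_id: Optional[str],
    reference_id: str,
    ancestors: Optional[Mapping[str, Set[str]]],
) -> str:
    """Classify one grounding row.

    `ancestors` maps a term ID to the set of its ancestor IDs. Passing None
    models a run with ontology lookups skipped.
    """
    if not agent_id:
        return "no-match"
    if agent_id == reference_id:
        return "identical"
    if ancestors is None:
        # No hierarchy available: direction cannot be determined.
        return "other"
    if reference_id in ancestors.get(agent_id, set()):
        return "improved"  # agent chose a more specific term (assumed meaning)
    if agent_id in ancestors.get(reference_id, set()):
        return "regression"  # agent chose a more general term (assumed meaning)
    return "other"


# Toy hierarchy with made-up IDs: X:child is-a X:parent.
TOY_ANCESTORS = {"X:child": {"X:parent"}, "X:parent": set()}

print(classify("X:child", "X:parent", TOY_ANCESTORS))  # improved
print(classify("X:child", "X:parent", None))           # other
```

The second call shows the degraded case: the same row that counts as "improved" with a hierarchy falls into "other" when no ontology is available.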

## Dependencies

The script reuses AMICA's `oaklib` dependency to talk to the Cell Ontology
through OLS. Make sure the machine has outbound network access when you want
hierarchy-aware stats; otherwise pass `--skip-ontology` to generate degraded
reports without ontology calls.
@@ -0,0 +1,46 @@
author_cell_type CL_label CL_ID match_type reference dataset_version
SI_earlyAE enterocyte of epithelium of small intestine CL:1000334 more specific term https://doi.org/10.1016/j.jcmgh.2022.02.007 https://datasets.cellxgene.cziscience.com/45a7d3bd-dc1a-4565-8881-25f8975247a6.h5ad
SI_AE2 enterocyte of epithelium of small intestine CL:1000334 more specific term https://doi.org/10.1016/j.jcmgh.2022.02.007 https://datasets.cellxgene.cziscience.com/45a7d3bd-dc1a-4565-8881-25f8975247a6.h5ad
SI_tuft intestinal tuft cell CL:0019032 exact https://doi.org/10.1016/j.jcmgh.2022.02.007 https://datasets.cellxgene.cziscience.com/45a7d3bd-dc1a-4565-8881-25f8975247a6.h5ad
C_earlyACC enterocyte of epithelium of large intestine CL:0002071 more specific term https://doi.org/10.1016/j.jcmgh.2022.02.007 https://datasets.cellxgene.cziscience.com/45a7d3bd-dc1a-4565-8881-25f8975247a6.h5ad
C_goblet colon goblet cell CL:0009039 exact https://doi.org/10.1016/j.jcmgh.2022.02.007 https://datasets.cellxgene.cziscience.com/45a7d3bd-dc1a-4565-8881-25f8975247a6.h5ad
SI_matureAE enterocyte of epithelium of small intestine CL:1000334 more specific term https://doi.org/10.1016/j.jcmgh.2022.02.007 https://datasets.cellxgene.cziscience.com/45a7d3bd-dc1a-4565-8881-25f8975247a6.h5ad
C_BEST4 BEST4+ enterocyte CL:4030026 exact https://doi.org/10.1016/j.jcmgh.2022.02.007 https://datasets.cellxgene.cziscience.com/45a7d3bd-dc1a-4565-8881-25f8975247a6.h5ad
SI_intermAE enterocyte of epithelium of small intestine CL:1000334 more specific term https://doi.org/10.1016/j.jcmgh.2022.02.007 https://datasets.cellxgene.cziscience.com/45a7d3bd-dc1a-4565-8881-25f8975247a6.h5ad
C_lateACC enterocyte of epithelium of large intestine CL:0002071 more specific term https://doi.org/10.1016/j.jcmgh.2022.02.007 https://datasets.cellxgene.cziscience.com/45a7d3bd-dc1a-4565-8881-25f8975247a6.h5ad
SI_goblet small intestine goblet cell CL:1000495 exact https://doi.org/10.1016/j.jcmgh.2022.02.007 https://datasets.cellxgene.cziscience.com/45a7d3bd-dc1a-4565-8881-25f8975247a6.h5ad
C_EEC enteroendocrine cell of colon CL:0009042 exact https://doi.org/10.1016/j.jcmgh.2022.02.007 https://datasets.cellxgene.cziscience.com/45a7d3bd-dc1a-4565-8881-25f8975247a6.h5ad
C_tuft tuft cell of colon CL:0009041 exact https://doi.org/10.1016/j.jcmgh.2022.02.007 https://datasets.cellxgene.cziscience.com/45a7d3bd-dc1a-4565-8881-25f8975247a6.h5ad
C_ISC intestinal crypt stem cell of colon CL:0009043 exact https://doi.org/10.1016/j.jcmgh.2022.02.007 https://datasets.cellxgene.cziscience.com/45a7d3bd-dc1a-4565-8881-25f8975247a6.h5ad
SI_ISC intestinal crypt stem cell of small intestine CL:0009017 exact https://doi.org/10.1016/j.jcmgh.2022.02.007 https://datasets.cellxgene.cziscience.com/45a7d3bd-dc1a-4565-8881-25f8975247a6.h5ad
SI_BEST4 epithelial cell of small intestine CL:0002254 exact https://doi.org/10.1016/j.jcmgh.2022.02.007 https://datasets.cellxgene.cziscience.com/45a7d3bd-dc1a-4565-8881-25f8975247a6.h5ad
SI_TA transit amplifying cell of small intestine CL:0009012 more specific term https://doi.org/10.1016/j.jcmgh.2022.02.007 https://datasets.cellxgene.cziscience.com/45a7d3bd-dc1a-4565-8881-25f8975247a6.h5ad
C_TA transit amplifying cell of colon CL:0009011 exact https://doi.org/10.1016/j.jcmgh.2022.02.007 https://datasets.cellxgene.cziscience.com/45a7d3bd-dc1a-4565-8881-25f8975247a6.h5ad
SI_secretory_prog progenitor cell CL:0011026 more specific term https://doi.org/10.1016/j.jcmgh.2022.02.007 https://datasets.cellxgene.cziscience.com/45a7d3bd-dc1a-4565-8881-25f8975247a6.h5ad
C_secretory_prog progenitor cell CL:0011026 more specific term https://doi.org/10.1016/j.jcmgh.2022.02.007 https://datasets.cellxgene.cziscience.com/45a7d3bd-dc1a-4565-8881-25f8975247a6.h5ad
SI_EEC enteroendocrine cell of small intestine CL:0009006 exact https://doi.org/10.1016/j.jcmgh.2022.02.007 https://datasets.cellxgene.cziscience.com/45a7d3bd-dc1a-4565-8881-25f8975247a6.h5ad
SI_TA2 transit amplifying cell of small intestine CL:0009012 more specific term https://doi.org/10.1016/j.jcmgh.2022.02.007 https://datasets.cellxgene.cziscience.com/45a7d3bd-dc1a-4565-8881-25f8975247a6.h5ad
SI_paneth paneth cell of epithelium of small intestine CL:1000343 exact https://doi.org/10.1016/j.jcmgh.2022.02.007 https://datasets.cellxgene.cziscience.com/45a7d3bd-dc1a-4565-8881-25f8975247a6.h5ad
SI_FAE microfold cell of epithelium of small intestine CL:1000353 exact https://doi.org/10.1016/j.jcmgh.2022.02.007 https://datasets.cellxgene.cziscience.com/45a7d3bd-dc1a-4565-8881-25f8975247a6.h5ad
absorptive enterocyte of epithelium of small intestine CL:1000334 broad term https://doi.org/10.1016/j.jcmgh.2022.02.007 https://datasets.cellxgene.cziscience.com/45a7d3bd-dc1a-4565-8881-25f8975247a6.h5ad
tuft intestinal tuft cell CL:0019032 broad term https://doi.org/10.1016/j.jcmgh.2022.02.007 https://datasets.cellxgene.cziscience.com/45a7d3bd-dc1a-4565-8881-25f8975247a6.h5ad
absorptive enterocyte of epithelium of large intestine CL:0002071 broad term https://doi.org/10.1016/j.jcmgh.2022.02.007 https://datasets.cellxgene.cziscience.com/45a7d3bd-dc1a-4565-8881-25f8975247a6.h5ad
goblet colon goblet cell CL:0009039 broad term https://doi.org/10.1016/j.jcmgh.2022.02.007 https://datasets.cellxgene.cziscience.com/45a7d3bd-dc1a-4565-8881-25f8975247a6.h5ad
BEST4+ BEST4+ enterocyte CL:4030026 broad term https://doi.org/10.1016/j.jcmgh.2022.02.007 https://datasets.cellxgene.cziscience.com/45a7d3bd-dc1a-4565-8881-25f8975247a6.h5ad
goblet small intestine goblet cell CL:1000495 broad term https://doi.org/10.1016/j.jcmgh.2022.02.007 https://datasets.cellxgene.cziscience.com/45a7d3bd-dc1a-4565-8881-25f8975247a6.h5ad
EEC enteroendocrine cell of colon CL:0009042 broad term https://doi.org/10.1016/j.jcmgh.2022.02.007 https://datasets.cellxgene.cziscience.com/45a7d3bd-dc1a-4565-8881-25f8975247a6.h5ad
tuft tuft cell of colon CL:0009041 broad term https://doi.org/10.1016/j.jcmgh.2022.02.007 https://datasets.cellxgene.cziscience.com/45a7d3bd-dc1a-4565-8881-25f8975247a6.h5ad
ISC intestinal crypt stem cell of colon CL:0009043 broad term https://doi.org/10.1016/j.jcmgh.2022.02.007 https://datasets.cellxgene.cziscience.com/45a7d3bd-dc1a-4565-8881-25f8975247a6.h5ad
ISC intestinal crypt stem cell of small intestine CL:0009017 broad term https://doi.org/10.1016/j.jcmgh.2022.02.007 https://datasets.cellxgene.cziscience.com/45a7d3bd-dc1a-4565-8881-25f8975247a6.h5ad
BEST4+ epithelial cell of small intestine CL:0002254 broad term https://doi.org/10.1016/j.jcmgh.2022.02.007 https://datasets.cellxgene.cziscience.com/45a7d3bd-dc1a-4565-8881-25f8975247a6.h5ad
TA transit amplifying cell of small intestine CL:0009012 broad term https://doi.org/10.1016/j.jcmgh.2022.02.007 https://datasets.cellxgene.cziscience.com/45a7d3bd-dc1a-4565-8881-25f8975247a6.h5ad
TA transit amplifying cell of colon CL:0009011 broad term https://doi.org/10.1016/j.jcmgh.2022.02.007 https://datasets.cellxgene.cziscience.com/45a7d3bd-dc1a-4565-8881-25f8975247a6.h5ad
secretory_prog progenitor cell CL:0011026 exact https://doi.org/10.1016/j.jcmgh.2022.02.007 https://datasets.cellxgene.cziscience.com/45a7d3bd-dc1a-4565-8881-25f8975247a6.h5ad
EEC enteroendocrine cell of small intestine CL:0009006 broad term https://doi.org/10.1016/j.jcmgh.2022.02.007 https://datasets.cellxgene.cziscience.com/45a7d3bd-dc1a-4565-8881-25f8975247a6.h5ad
paneth paneth cell of epithelium of small intestine CL:1000343 exact https://doi.org/10.1016/j.jcmgh.2022.02.007 https://datasets.cellxgene.cziscience.com/45a7d3bd-dc1a-4565-8881-25f8975247a6.h5ad
FAE microfold cell of epithelium of small intestine CL:1000353 exact https://doi.org/10.1016/j.jcmgh.2022.02.007 https://datasets.cellxgene.cziscience.com/45a7d3bd-dc1a-4565-8881-25f8975247a6.h5ad
SI_6-? enterocyte of epithelium of small intestine CL:1000334 more specific term https://doi.org/10.1016/j.jcmgh.2022.02.007 https://datasets.cellxgene.cziscience.com/45a7d3bd-dc1a-4565-8881-25f8975247a6.h5ad
C_earlyCC enterocyte of epithelium of large intestine CL:0002071 more specific term https://doi.org/10.1016/j.jcmgh.2022.02.007 https://datasets.cellxgene.cziscience.com/45a7d3bd-dc1a-4565-8881-25f8975247a6.h5ad
C_lateCC enterocyte of epithelium of large intestine CL:0002071 more specific term https://doi.org/10.1016/j.jcmgh.2022.02.007 https://datasets.cellxgene.cziscience.com/45a7d3bd-dc1a-4565-8881-25f8975247a6.h5ad
SI_secretory small intestine goblet cell CL:1000495 broad term https://doi.org/10.1016/j.jcmgh.2022.02.007 https://datasets.cellxgene.cziscience.com/45a7d3bd-dc1a-4565-8881-25f8975247a6.h5ad
SI_secretory paneth cell of epithelium of small intestine CL:1000343 broad term https://doi.org/10.1016/j.jcmgh.2022.02.007 https://datasets.cellxgene.cziscience.com/45a7d3bd-dc1a-4565-8881-25f8975247a6.h5ad