Commit afe3343

Merge pull request #7 from Cellular-Semantics/validation_refactoring
added a validation_reporting directory for AMICA
2 parents e579fb6 + 45d7fe9 commit afe3343

98 files changed

Lines changed: 4196 additions & 0 deletions

docs/validation_reporting.md

Lines changed: 59 additions & 0 deletions
@@ -0,0 +1,59 @@
# Validation & Reporting

This add-on summarizes how the AMICA agent performed after a CXG run completes. It
inspects the grounded outputs written under `output/raw_output/**/groundings.tsv`
and the curated match-type tables from `output/pandasaurus_cxg_outputs_30/*.tsv`.

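The two input patterns above can be sketched in Python as follows. This is a minimal illustration, not the script's actual discovery code: the function name `find_inputs` is hypothetical, and only the two glob patterns come from the text.

```python
from pathlib import Path


def find_inputs(output_root: Path) -> tuple[list[Path], list[Path]]:
    """Return (groundings files, match-type tables) found under output_root."""
    # Recursive glob mirrors the documented `raw_output/**/groundings.tsv` pattern.
    groundings = sorted((output_root / "raw_output").glob("**/groundings.tsv"))
    # Match-type tables sit directly inside the Pandasaurus output directory.
    match_types = sorted(
        (output_root / "pandasaurus_cxg_outputs_30").glob("*.tsv")
    )
    return groundings, match_types
```

Calling `find_inputs(Path("./output"))` would list the files the report generator is described as reading for a top-level `./output` layout.
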
## Usage

```bash
uv run scripts/generate_validation_reports.py \
  --output-root ./output \
  --log-level INFO
```

By default the script writes three Markdown summaries into `<output-root>/reports/` (`output/reports/` for the command above):

| File | Description |
|------|-------------|
| `filtered_granularity_report.md` | Aggregated stats that ignore `Broad term` / `Overlaps` rows. |
| `raw_stats_report.md` | Unfiltered counts (improved / identical / regressions / no-match) for every grounding row. |
| `granularity_report.md` | Narrative examples showcasing rows where the agent improved on the author label. |

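The filtering behind `filtered_granularity_report.md` can be illustrated with a small sketch. It assumes the match-type tables are tab-separated and that the match type is compared case-insensitively; how the script joins these rows to `groundings.tsv` is not documented here, so the example only filters match-type rows. The two sample rows are shortened copies of rows from a match-type table in this commit (the `reference` and `dataset_version` columns are omitted), and the helper name is hypothetical.

```python
import csv
import io

# Match types the filtered report ignores (assumed to compare case-insensitively).
EXCLUDED = {"broad term", "overlaps"}

# Shortened sample rows; the real tables carry two more columns.
SAMPLE = (
    "author_cell_type\tCL_label\tCL_ID\tmatch_type\n"
    "SI_tuft\tintestinal tuft cell\tCL:0019032\texact\n"
    "tuft\tintestinal tuft cell\tCL:0019032\tbroad term\n"
)


def keep_for_filtered_report(rows):
    """Drop rows whose match_type is Broad term or Overlaps."""
    return [r for r in rows if r["match_type"].strip().lower() not in EXCLUDED]


rows = list(csv.DictReader(io.StringIO(SAMPLE), delimiter="\t"))
kept = keep_for_filtered_report(rows)
print([r["author_cell_type"] for r in kept])  # ['SI_tuft']
```
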
## CLI options

```
usage: scripts/generate_validation_reports.py [-h] [--output-root OUTPUT_ROOT]
                                              [--raw-output-dir RAW_OUTPUT_DIR]
                                              [--match-type-dir MATCH_TYPE_DIR]
                                              [--reports-dir REPORTS_DIR]
                                              [--skip-filtered]
                                              [--skip-examples]
                                              [--skip-raw-stats]
                                              [--skip-ontology]
                                              [--log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL}]
```

Key flags:

- `--output-root`: base directory that contains `raw_output/`, `pandasaurus_cxg_outputs_30/`, and `reports/`. Defaults to `resources/cxg/output`. Set this when your outputs sit under a different base (e.g., a top-level `./output`).
- `--raw-output-dir`: direct path to the directory holding the per-dataset `groundings.tsv` files (AMICA raw outputs). Overrides `--output-root` for groundings only. Use when groundings live outside the standard layout. **If missing:** the script exits with `FileNotFoundError`.
- `--match-type-dir`: direct path to the Pandasaurus match-type TSV/CSV files. Overrides `--output-root` for match types only. Use when those files live elsewhere. **If missing:** the script exits with `FileNotFoundError`; a missing file for an individual dataset is tolerated (that dataset runs without Broad/Overlap filtering).
- `--reports-dir`: output directory for the Markdown reports (auto-created if needed). Defaults to `<output-root>/reports`.

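The precedence described by these flags can be sketched as below. This is an illustration under stated assumptions rather than the script's own code: the function name `resolve_dirs` and the returned dictionary keys are hypothetical, while the default root, the subdirectory names, and the `FileNotFoundError` behaviour come from the flag descriptions.

```python
from pathlib import Path
from typing import Optional

DEFAULT_OUTPUT_ROOT = Path("resources/cxg/output")  # default stated for --output-root


def resolve_dirs(
    output_root: Path = DEFAULT_OUTPUT_ROOT,
    raw_output_dir: Optional[Path] = None,
    match_type_dir: Optional[Path] = None,
    reports_dir: Optional[Path] = None,
) -> dict[str, Path]:
    """Explicit directory flags win; otherwise fall back to the output root."""
    raw = raw_output_dir or output_root / "raw_output"
    match = match_type_dir or output_root / "pandasaurus_cxg_outputs_30"
    reports = reports_dir or output_root / "reports"
    # Both input directories are required; a missing one is a hard error.
    for required in (raw, match):
        if not required.is_dir():
            raise FileNotFoundError(required)
    # The reports directory is created on demand.
    reports.mkdir(parents=True, exist_ok=True)
    return {"raw_output": raw, "match_types": match, "reports": reports}
```

With this shape, passing only `match_type_dir` changes where match types are read from while groundings and reports still follow the output root.
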
## Skip toggles

- `--skip-filtered`, `--skip-raw-stats`, `--skip-examples`: each suppresses one of the three reports.
- `--skip-filtered`: do not generate the filtered granularity report (`filtered_granularity_report.md`, the one that excludes `Broad term` / `Overlaps` rows). Use when only raw stats are needed.
- `--skip-raw-stats`: do not generate the unfiltered aggregate stats report (`raw_stats_report.md`). Use when only filtered or example reports are needed.
- `--skip-examples`: do not generate the "improved examples" report (`granularity_report.md`). Use when only aggregate statistics are required.
- `--skip-ontology`: run without Cell Ontology lookups. This avoids network calls and allows offline runs, but every metric that depends on the hierarchy (improved vs. regression, example extraction) is degraded: rows that would be "improved" or "regression" are categorized as "other", and no examples are extracted. Reports include a warning banner. For accurate stats, leave ontology lookups enabled or point to a local ontology via `--ontology-adapter` (e.g., `pronto:/path/to/cl.owl`).
- `--ontology-adapter`: custom oaklib adapter string (e.g., `pronto:/path/to/cl.owl`), useful for offline runs. Defaults to `ols:cl`.

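To show why skipping the ontology degrades the stats, here is a rough sketch of hierarchy-aware bucketing. It rests on assumptions that the text does not confirm: that "improved" means the agent's term is a descendant of the author's term and "regression" the reverse, and that the ontology is reached through a lookup returning a term's ancestors. The `classify` function, the `AncestorLookup` type, and the toy IDs are all hypothetical.

```python
from typing import Callable, Optional

# Hypothetical lookup type: maps a term ID to the set of its ancestor IDs.
AncestorLookup = Callable[[str], set[str]]


def classify(
    author_id: str,
    agent_id: Optional[str],
    ancestors: Optional[AncestorLookup] = None,
) -> str:
    """Bucket one grounding row; ancestors=None models --skip-ontology."""
    if not agent_id:
        return "no-match"
    if agent_id == author_id:
        return "identical"
    if ancestors is None:
        # Without the hierarchy, the direction of the change cannot be judged.
        return "other"
    if author_id in ancestors(agent_id):
        return "improved"  # assumed: agent term is more specific
    if agent_id in ancestors(author_id):
        return "regression"  # assumed: agent term is broader
    return "other"


# Toy hierarchy for illustration only; these are placeholders, not real CL IDs.
TOY = {"X:child": {"X:parent"}, "X:parent": set()}


def toy_lookup(term: str) -> set[str]:
    return TOY.get(term, set())


print(classify("X:parent", "X:child", toy_lookup))  # improved
print(classify("X:parent", "X:child"))              # other
```

The second call shows the degraded case: the same row lands in "other" once the hierarchy is unavailable, which matches the behaviour described for `--skip-ontology`.
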
## Dependencies

The script reuses AMICA's `oaklib` dependency to talk to the Cell Ontology
through OLS. Make sure the machine has outbound network access when you want
hierarchy-aware stats. Otherwise, either point `--ontology-adapter` at a local
ontology file or pass `--skip-ontology` to generate degraded reports without
ontology calls.
Lines changed: 46 additions & 0 deletions
@@ -0,0 +1,46 @@
author_cell_type CL_label CL_ID match_type reference dataset_version
SI_earlyAE enterocyte of epithelium of small intestine CL:1000334 more specific term https://doi.org/10.1016/j.jcmgh.2022.02.007 https://datasets.cellxgene.cziscience.com/45a7d3bd-dc1a-4565-8881-25f8975247a6.h5ad
SI_AE2 enterocyte of epithelium of small intestine CL:1000334 more specific term https://doi.org/10.1016/j.jcmgh.2022.02.007 https://datasets.cellxgene.cziscience.com/45a7d3bd-dc1a-4565-8881-25f8975247a6.h5ad
SI_tuft intestinal tuft cell CL:0019032 exact https://doi.org/10.1016/j.jcmgh.2022.02.007 https://datasets.cellxgene.cziscience.com/45a7d3bd-dc1a-4565-8881-25f8975247a6.h5ad
C_earlyACC enterocyte of epithelium of large intestine CL:0002071 more specific term https://doi.org/10.1016/j.jcmgh.2022.02.007 https://datasets.cellxgene.cziscience.com/45a7d3bd-dc1a-4565-8881-25f8975247a6.h5ad
C_goblet colon goblet cell CL:0009039 exact https://doi.org/10.1016/j.jcmgh.2022.02.007 https://datasets.cellxgene.cziscience.com/45a7d3bd-dc1a-4565-8881-25f8975247a6.h5ad
SI_matureAE enterocyte of epithelium of small intestine CL:1000334 more specific term https://doi.org/10.1016/j.jcmgh.2022.02.007 https://datasets.cellxgene.cziscience.com/45a7d3bd-dc1a-4565-8881-25f8975247a6.h5ad
C_BEST4 BEST4+ enterocyte CL:4030026 exact https://doi.org/10.1016/j.jcmgh.2022.02.007 https://datasets.cellxgene.cziscience.com/45a7d3bd-dc1a-4565-8881-25f8975247a6.h5ad
SI_intermAE enterocyte of epithelium of small intestine CL:1000334 more specific term https://doi.org/10.1016/j.jcmgh.2022.02.007 https://datasets.cellxgene.cziscience.com/45a7d3bd-dc1a-4565-8881-25f8975247a6.h5ad
C_lateACC enterocyte of epithelium of large intestine CL:0002071 more specific term https://doi.org/10.1016/j.jcmgh.2022.02.007 https://datasets.cellxgene.cziscience.com/45a7d3bd-dc1a-4565-8881-25f8975247a6.h5ad
SI_goblet small intestine goblet cell CL:1000495 exact https://doi.org/10.1016/j.jcmgh.2022.02.007 https://datasets.cellxgene.cziscience.com/45a7d3bd-dc1a-4565-8881-25f8975247a6.h5ad
C_EEC enteroendocrine cell of colon CL:0009042 exact https://doi.org/10.1016/j.jcmgh.2022.02.007 https://datasets.cellxgene.cziscience.com/45a7d3bd-dc1a-4565-8881-25f8975247a6.h5ad
C_tuft tuft cell of colon CL:0009041 exact https://doi.org/10.1016/j.jcmgh.2022.02.007 https://datasets.cellxgene.cziscience.com/45a7d3bd-dc1a-4565-8881-25f8975247a6.h5ad
C_ISC intestinal crypt stem cell of colon CL:0009043 exact https://doi.org/10.1016/j.jcmgh.2022.02.007 https://datasets.cellxgene.cziscience.com/45a7d3bd-dc1a-4565-8881-25f8975247a6.h5ad
SI_ISC intestinal crypt stem cell of small intestine CL:0009017 exact https://doi.org/10.1016/j.jcmgh.2022.02.007 https://datasets.cellxgene.cziscience.com/45a7d3bd-dc1a-4565-8881-25f8975247a6.h5ad
SI_BEST4 epithelial cell of small intestine CL:0002254 exact https://doi.org/10.1016/j.jcmgh.2022.02.007 https://datasets.cellxgene.cziscience.com/45a7d3bd-dc1a-4565-8881-25f8975247a6.h5ad
SI_TA transit amplifying cell of small intestine CL:0009012 more specific term https://doi.org/10.1016/j.jcmgh.2022.02.007 https://datasets.cellxgene.cziscience.com/45a7d3bd-dc1a-4565-8881-25f8975247a6.h5ad
C_TA transit amplifying cell of colon CL:0009011 exact https://doi.org/10.1016/j.jcmgh.2022.02.007 https://datasets.cellxgene.cziscience.com/45a7d3bd-dc1a-4565-8881-25f8975247a6.h5ad
SI_secretory_prog progenitor cell CL:0011026 more specific term https://doi.org/10.1016/j.jcmgh.2022.02.007 https://datasets.cellxgene.cziscience.com/45a7d3bd-dc1a-4565-8881-25f8975247a6.h5ad
C_secretory_prog progenitor cell CL:0011026 more specific term https://doi.org/10.1016/j.jcmgh.2022.02.007 https://datasets.cellxgene.cziscience.com/45a7d3bd-dc1a-4565-8881-25f8975247a6.h5ad
SI_EEC enteroendocrine cell of small intestine CL:0009006 exact https://doi.org/10.1016/j.jcmgh.2022.02.007 https://datasets.cellxgene.cziscience.com/45a7d3bd-dc1a-4565-8881-25f8975247a6.h5ad
SI_TA2 transit amplifying cell of small intestine CL:0009012 more specific term https://doi.org/10.1016/j.jcmgh.2022.02.007 https://datasets.cellxgene.cziscience.com/45a7d3bd-dc1a-4565-8881-25f8975247a6.h5ad
SI_paneth paneth cell of epithelium of small intestine CL:1000343 exact https://doi.org/10.1016/j.jcmgh.2022.02.007 https://datasets.cellxgene.cziscience.com/45a7d3bd-dc1a-4565-8881-25f8975247a6.h5ad
SI_FAE microfold cell of epithelium of small intestine CL:1000353 exact https://doi.org/10.1016/j.jcmgh.2022.02.007 https://datasets.cellxgene.cziscience.com/45a7d3bd-dc1a-4565-8881-25f8975247a6.h5ad
absorptive enterocyte of epithelium of small intestine CL:1000334 broad term https://doi.org/10.1016/j.jcmgh.2022.02.007 https://datasets.cellxgene.cziscience.com/45a7d3bd-dc1a-4565-8881-25f8975247a6.h5ad
tuft intestinal tuft cell CL:0019032 broad term https://doi.org/10.1016/j.jcmgh.2022.02.007 https://datasets.cellxgene.cziscience.com/45a7d3bd-dc1a-4565-8881-25f8975247a6.h5ad
absorptive enterocyte of epithelium of large intestine CL:0002071 broad term https://doi.org/10.1016/j.jcmgh.2022.02.007 https://datasets.cellxgene.cziscience.com/45a7d3bd-dc1a-4565-8881-25f8975247a6.h5ad
goblet colon goblet cell CL:0009039 broad term https://doi.org/10.1016/j.jcmgh.2022.02.007 https://datasets.cellxgene.cziscience.com/45a7d3bd-dc1a-4565-8881-25f8975247a6.h5ad
BEST4+ BEST4+ enterocyte CL:4030026 broad term https://doi.org/10.1016/j.jcmgh.2022.02.007 https://datasets.cellxgene.cziscience.com/45a7d3bd-dc1a-4565-8881-25f8975247a6.h5ad
goblet small intestine goblet cell CL:1000495 broad term https://doi.org/10.1016/j.jcmgh.2022.02.007 https://datasets.cellxgene.cziscience.com/45a7d3bd-dc1a-4565-8881-25f8975247a6.h5ad
EEC enteroendocrine cell of colon CL:0009042 broad term https://doi.org/10.1016/j.jcmgh.2022.02.007 https://datasets.cellxgene.cziscience.com/45a7d3bd-dc1a-4565-8881-25f8975247a6.h5ad
tuft tuft cell of colon CL:0009041 broad term https://doi.org/10.1016/j.jcmgh.2022.02.007 https://datasets.cellxgene.cziscience.com/45a7d3bd-dc1a-4565-8881-25f8975247a6.h5ad
ISC intestinal crypt stem cell of colon CL:0009043 broad term https://doi.org/10.1016/j.jcmgh.2022.02.007 https://datasets.cellxgene.cziscience.com/45a7d3bd-dc1a-4565-8881-25f8975247a6.h5ad
ISC intestinal crypt stem cell of small intestine CL:0009017 broad term https://doi.org/10.1016/j.jcmgh.2022.02.007 https://datasets.cellxgene.cziscience.com/45a7d3bd-dc1a-4565-8881-25f8975247a6.h5ad
BEST4+ epithelial cell of small intestine CL:0002254 broad term https://doi.org/10.1016/j.jcmgh.2022.02.007 https://datasets.cellxgene.cziscience.com/45a7d3bd-dc1a-4565-8881-25f8975247a6.h5ad
TA transit amplifying cell of small intestine CL:0009012 broad term https://doi.org/10.1016/j.jcmgh.2022.02.007 https://datasets.cellxgene.cziscience.com/45a7d3bd-dc1a-4565-8881-25f8975247a6.h5ad
TA transit amplifying cell of colon CL:0009011 broad term https://doi.org/10.1016/j.jcmgh.2022.02.007 https://datasets.cellxgene.cziscience.com/45a7d3bd-dc1a-4565-8881-25f8975247a6.h5ad
secretory_prog progenitor cell CL:0011026 exact https://doi.org/10.1016/j.jcmgh.2022.02.007 https://datasets.cellxgene.cziscience.com/45a7d3bd-dc1a-4565-8881-25f8975247a6.h5ad
EEC enteroendocrine cell of small intestine CL:0009006 broad term https://doi.org/10.1016/j.jcmgh.2022.02.007 https://datasets.cellxgene.cziscience.com/45a7d3bd-dc1a-4565-8881-25f8975247a6.h5ad
paneth paneth cell of epithelium of small intestine CL:1000343 exact https://doi.org/10.1016/j.jcmgh.2022.02.007 https://datasets.cellxgene.cziscience.com/45a7d3bd-dc1a-4565-8881-25f8975247a6.h5ad
FAE microfold cell of epithelium of small intestine CL:1000353 exact https://doi.org/10.1016/j.jcmgh.2022.02.007 https://datasets.cellxgene.cziscience.com/45a7d3bd-dc1a-4565-8881-25f8975247a6.h5ad
SI_6-? enterocyte of epithelium of small intestine CL:1000334 more specific term https://doi.org/10.1016/j.jcmgh.2022.02.007 https://datasets.cellxgene.cziscience.com/45a7d3bd-dc1a-4565-8881-25f8975247a6.h5ad
C_earlyCC enterocyte of epithelium of large intestine CL:0002071 more specific term https://doi.org/10.1016/j.jcmgh.2022.02.007 https://datasets.cellxgene.cziscience.com/45a7d3bd-dc1a-4565-8881-25f8975247a6.h5ad
C_lateCC enterocyte of epithelium of large intestine CL:0002071 more specific term https://doi.org/10.1016/j.jcmgh.2022.02.007 https://datasets.cellxgene.cziscience.com/45a7d3bd-dc1a-4565-8881-25f8975247a6.h5ad
SI_secretory small intestine goblet cell CL:1000495 broad term https://doi.org/10.1016/j.jcmgh.2022.02.007 https://datasets.cellxgene.cziscience.com/45a7d3bd-dc1a-4565-8881-25f8975247a6.h5ad
SI_secretory paneth cell of epithelium of small intestine CL:1000343 broad term https://doi.org/10.1016/j.jcmgh.2022.02.007 https://datasets.cellxgene.cziscience.com/45a7d3bd-dc1a-4565-8881-25f8975247a6.h5ad
