Add Flex GEX support, release tooling, and docs updates #190

Merged: rob-p merged 21 commits into main from dev on Mar 19, 2026

rob-p commented Mar 19, 2026

Summary

This branch adds end-to-end support for 10x Flex Gene Expression quantification, extends the chemistry registry and schema to describe Flex-specific resources, adds a release helper script for crate version bumps and publishing, and updates the documentation so it covers the new command and builds cleanly with a current Sphinx toolchain.

What changed

1. New flex-quant command for 10x Flex GEX

  • Added a new top-level CLI subcommand, simpleaf flex-quant, and wired it into command dispatch.
  • Implemented the Flex pipeline in src/simpleaf_commands/flex_quant.rs.
  • The command now handles the full Flex workflow:
    • chemistry lookup from the registry
    • automatic probe-set selection by organism
    • probe CSV to FASTA conversion
    • automatic probe_t2g.tsv generation
    • probe index construction with piscem build
    • cached resource reuse when probe indices or downloaded assets already exist
    • cell barcode whitelist resolution
    • sample barcode list resolution
    • mapping with piscem map-scrna
    • multi-barcode permit-list generation with alevin-fry generate-permit-list
    • alevin-fry collate
    • alevin-fry quant
    • writing run metadata to simpleaf_flex_quant_info.json
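For reviewers, the ordering above and the cache short-circuit can be pictured as a stage plan. This is a minimal sketch, not the code in flex_quant.rs: the `Stage` names, the `PlanInputs` struct, and the choice to always rerun the CSV conversion (so quant still gets probe_t2g.tsv) are all hypothetical.

```rust
/// Stages of the Flex workflow listed above (names are hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Stage {
    LookupChemistry,
    ConvertProbeCsv,
    BuildProbeIndex,
    ResolveBarcodeLists,
    Map,
    GeneratePermitList,
    Collate,
    Quant,
    WriteRunInfo,
}

/// Facts known before the run starts that let stages be skipped.
struct PlanInputs {
    /// A pre-built probe index was supplied (e.g. via `--index`).
    prebuilt_index: bool,
    /// A probe index from an earlier run was found in the cache.
    cached_index: bool,
}

/// Returns the ordered stages to execute for one run.
fn plan(inputs: &PlanInputs) -> Vec<Stage> {
    // In this sketch the CSV conversion always runs because quant needs the
    // probe_t2g.tsv it produces, even when the index itself is reused.
    let mut stages = vec![Stage::LookupChemistry, Stage::ConvertProbeCsv];
    if !(inputs.prebuilt_index || inputs.cached_index) {
        stages.push(Stage::BuildProbeIndex);
    }
    stages.extend([
        Stage::ResolveBarcodeLists,
        Stage::Map,
        Stage::GeneratePermitList,
        Stage::Collate,
        Stage::Quant,
        Stage::WriteRunInfo,
    ]);
    stages
}
```

Modeling the run as an explicit plan keeps the cache decision in one place instead of scattering existence checks through the pipeline.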

2. Chemistry schema and registry support for Flex protocols

  • Extended src/utils/chem_utils.rs with new schema types used by Flex chemistries:
    • Organism
    • ProtocolType
    • SampleBcListInfo
    • ProbeSetInfo
  • Extended CustomChemistry with new optional Flex-related fields:
    • sample_bc_list
    • probe_sets
  • Added helpers to interpret chemistry metadata:
    • protocol_type()
    • is_flex_gex()
  • Updated serde handling so the registry’s remote_url field maps correctly to the chemistry permit-list URL field.
  • Updated chemistry creation code so newly-added custom chemistries initialize the new optional Flex fields to None.
  • Registered two Flex chemistries in resources/chemistries.json:
    • 10x-flexv1-gex-3p
    • 10x-flexv2-gex-3p
  • Those registry entries include:
    • geometry strings
    • expected orientation
    • permit-list metadata
    • sample barcode list metadata
    • organism-specific probe set metadata for human and mouse
    • protocol metadata identifying the chemistry as flex_gex
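The protocol_type() / is_flex_gex() helpers boil down to reading one metadata key and falling back to a default. A minimal sketch under stated assumptions: "flex_gex" comes from the registry entries above, but the "standard_rna" and "atac" spellings, the `StandardRna` default, and modeling `meta` as a string map (`ChemistrySketch` is a hypothetical stand-in for CustomChemistry) are assumptions.

```rust
use std::collections::HashMap;
use std::str::FromStr;

/// Protocol family of a chemistry; the default variant is an assumption.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
enum ProtocolType {
    #[default]
    StandardRna,
    FlexGex,
    Atac,
}

impl FromStr for ProtocolType {
    type Err = String;
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "standard_rna" => Ok(ProtocolType::StandardRna), // assumed spelling
            "flex_gex" => Ok(ProtocolType::FlexGex),
            "atac" => Ok(ProtocolType::Atac), // assumed spelling
            other => Err(format!("unknown protocol_type: {other}")),
        }
    }
}

/// Hypothetical stand-in for CustomChemistry: only `meta` matters here.
struct ChemistrySketch {
    meta: Option<HashMap<String, String>>,
}

impl ChemistrySketch {
    /// Reads meta.protocol_type; missing or unparsable values use the default.
    fn protocol_type(&self) -> ProtocolType {
        self.meta
            .as_ref()
            .and_then(|m| m.get("protocol_type"))
            .and_then(|s| s.parse::<ProtocolType>().ok())
            .unwrap_or_default()
    }

    fn is_flex_gex(&self) -> bool {
        self.protocol_type() == ProtocolType::FlexGex
    }
}
```

Falling back to the default keeps every pre-existing chemistry (which has no protocol_type key) on the standard RNA path.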

3. Flex- and geometry-related behavior fixes

  • Updated geometry validation in src/utils/af_utils.rs to skip seq_geom_parser validation for piscem-only extensions such as s[N] and numbered barcode tags like b0[N], deferring validation to piscem where appropriate.
  • Corrected the Flex mapping path to use piscem map-scrna.
  • Ensured probe CSV conversion keeps all probes, including those marked excluded, in the generated FASTA and t2g mapping so quantification has a complete reference-to-gene mapping.
  • Kept structural constraints opt-in rather than forcing them on.
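The geometry validation change hinges on spotting piscem-only tags before calling seq_geom_parser. A minimal sketch of such a detector, assuming the tag shapes described here (`s[...]` and `b` followed by digits then `[`); the function name `uses_piscem_only_tags` is hypothetical and this is not the code in af_utils.rs.

```rust
/// True if `geo` contains a tag that, per this PR, seq_geom_parser cannot
/// parse: a sample-barcode tag `s[...]` or a numbered barcode like `b0[...]`.
fn uses_piscem_only_tags(geo: &str) -> bool {
    let bytes = geo.as_bytes();
    for (i, &c) in bytes.iter().enumerate() {
        match c {
            // Sample-barcode tag: `s` immediately followed by `[`.
            b's' if bytes.get(i + 1) == Some(&b'[') => return true,
            // Numbered barcode tag: `b`, one or more digits, then `[`.
            b'b' => {
                let digits = bytes[i + 1..]
                    .iter()
                    .take_while(|d| d.is_ascii_digit())
                    .count();
                if digits > 0 && bytes.get(i + 1 + digits) == Some(&b'[') {
                    return true;
                }
            }
            _ => {}
        }
    }
    false
}
```

A validator would only hand the geometry to seq_geom_parser when this returns false, leaving extended geometries for piscem to check at mapping time.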

4. Release tooling

  • Added bump_and_publish.sh.
  • The script now supports a safer release workflow that:
    • validates the requested version as SemVer
    • requires the new version to be greater than the current crate version
    • updates both Cargo.toml and the simpleaf package entry in Cargo.lock
    • performs preflight and post-bump cargo check / cargo package
    • creates the release commit and tag
    • pushes the branch and tag
    • separates dry-run behavior from actual crates.io publishing with --dry-run and --publish
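bump_and_publish.sh itself is a shell script. The Rust sketch below only illustrates the two version checks it performs (valid SemVer, strictly greater than the current version). It is a simplification: it accepts plain MAJOR.MINOR.PATCH and rejects pre-release and build suffixes that full SemVer allows. The function names are hypothetical.

```rust
/// Parses a strict `MAJOR.MINOR.PATCH` version (no `-pre` or `+build`).
fn parse_semver(v: &str) -> Option<(u64, u64, u64)> {
    let parts: Vec<&str> = v.split('.').collect();
    if parts.len() != 3 {
        return None;
    }
    let mut nums = [0u64; 3];
    for (slot, part) in nums.iter_mut().zip(&parts) {
        // SemVer forbids empty components and leading zeros (except "0").
        if part.is_empty()
            || !part.bytes().all(|b| b.is_ascii_digit())
            || (part.len() > 1 && part.starts_with('0'))
        {
            return None;
        }
        *slot = part.parse().ok()?;
    }
    Some((nums[0], nums[1], nums[2]))
}

/// Accepts `new` only if it is valid and strictly greater than `current`.
fn check_bump(current: &str, new: &str) -> Result<(), String> {
    let cur = parse_semver(current).ok_or_else(|| format!("bad current version: {current}"))?;
    let next = parse_semver(new).ok_or_else(|| format!("not a valid SemVer version: {new}"))?;
    // Tuples compare lexicographically: major, then minor, then patch.
    if next <= cur {
        return Err(format!("{new} is not greater than {current}"));
    }
    Ok(())
}
```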

5. Clippy / compile cleanup

  • Cleaned up lints introduced by the new Flex work so the branch builds cleanly and passes strict clippy.
  • Changes include:
    • boxing the large Chemistry::Custom enum variant
    • deriving Default for ProtocolType
    • replacing manual prefix stripping with strip_prefix
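Two of these fixes correspond to the clippy lints large_enum_variant and manual_strip. The sketch below shows why boxing helps: an enum is as large as its largest variant, so boxing the big payload shrinks every value of the enum. `BigCustom` and its field size are made up for illustration; the `s[` tag example is likewise hypothetical.

```rust
use std::mem::size_of;

/// Stand-in for a large custom-chemistry struct (size is made up).
#[allow(dead_code)]
struct BigCustom {
    payload: [u64; 64],
}

/// Every value pays for the 512-byte variant, even `Builtin`.
#[allow(dead_code)]
enum Unboxed {
    Builtin(u8),
    Custom(BigCustom),
}

/// The large variant is behind a pointer, so the enum stays small.
#[allow(dead_code)]
enum Boxed {
    Builtin(u8),
    Custom(Box<BigCustom>),
}

/// `strip_prefix`/`strip_suffix` replace manual `starts_with` + slicing.
fn tag_body(tag: &str) -> Option<&str> {
    tag.strip_prefix("s[")?.strip_suffix(']')
}
```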

6. Documentation updates

  • Added a dedicated documentation page for the new flex-quant command.
  • Linked the new page from the main docs index.
  • Added a pointer in the standard quant docs directing Flex users to flex-quant.
  • Updated workflow and chemistry docs formatting so the docs build cleanly under modern Sphinx.
  • Updated docs/requirements.txt to install sphinx explicitly instead of relying on whatever global version happens to be present.

7. Documentation toolchain modernization

  • Verified docs builds against a current Sphinx/Furo environment.
  • The docs now build cleanly with:
    • sphinx 9.1.0
    • furo 2025.12.19

8. Other repository updates

  • Added local generated/test data paths and a helper script path to .gitignore.
  • Updated the top-level CLI help snapshot to include flex-quant.
  • In the committed branch state, Cargo.lock records the crate version as 0.20.0.

Verification

  • cargo build
  • cargo clippy --all-targets --all-features -- -D warnings
  • /tmp/simpleaf-docs-venv/bin/python -m sphinx -b html docs/source /tmp/simpleaf-docs-html

Notes

  • The local uncommitted edits currently present in Cargo.toml and Cargo.lock were not included in this PR; this PR describes the committed main..dev diff only.

rob-p and others added 21 commits March 17, 2026 09:57
Foundation for 10x Flex GEX support in simpleaf:

New types in chem_utils.rs:
- Organism enum: Human, Mouse, Other(String) — with FromStr,
  Display, clap::ValueEnum, serde support
- ProtocolType enum: StandardRna, FlexGex, Atac — parsed from
  meta.protocol_type in chemistry JSON
- SampleBcListInfo: plist_name + remote_url for probe barcode files
- ProbeSetInfo: name + plist_name + remote_url for organism-specific
  probe sets

Extended CustomChemistry struct:
- sample_bc_list: Option<SampleBcListInfo> — for Flex probe barcode
  rotation files
- probe_sets: Option<HashMap<String, ProbeSetInfo>> — keyed by
  organism name ("human", "mouse")
- protocol_type() helper method — reads from meta.protocol_type
- is_flex_gex() convenience method

All 49 existing tests pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
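A sketch of how the Organism type and the organism-keyed probe_sets map from the commit above could fit together. The exact-match parsing, the lowercase Display strings, and the `probe_set_for` helper are assumptions for illustration; the clap::ValueEnum and serde derives are omitted to keep the example dependency-free.

```rust
use std::collections::HashMap;
use std::fmt;
use std::str::FromStr;

#[derive(Debug, Clone, PartialEq, Eq)]
enum Organism {
    Human,
    Mouse,
    Other(String),
}

impl FromStr for Organism {
    type Err = std::convert::Infallible;
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Unknown names are preserved rather than rejected.
        Ok(match s {
            "human" => Organism::Human,
            "mouse" => Organism::Mouse,
            other => Organism::Other(other.to_string()),
        })
    }
}

impl fmt::Display for Organism {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Organism::Human => write!(f, "human"),
            Organism::Mouse => write!(f, "mouse"),
            Organism::Other(name) => write!(f, "{name}"),
        }
    }
}

#[allow(dead_code)]
struct ProbeSetInfo {
    name: String,
    plist_name: String,
    remote_url: String,
}

/// Hypothetical lookup: probe_sets is keyed by the organism's display name.
fn probe_set_for<'a>(
    sets: &'a HashMap<String, ProbeSetInfo>,
    org: &Organism,
) -> Option<&'a ProbeSetInfo> {
    sets.get(&org.to_string())
}
```

Using the Display form as the map key keeps the registry JSON keys ("human", "mouse") and the CLI values in one spelling.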
New `simpleaf flex-quant` command that orchestrates the complete Flex
GEX pipeline: probe index building, mapping, multi-barcode permit list
generation, hierarchical collation, and quantification.

FlexQuantOpts CLI struct with:
  --chemistry: registered Flex chemistry name
  --organism: strongly-typed Organism enum (human/mouse)
  --probe-set: optional explicit probe CSV or FASTA
  --sample-bc-list: optional explicit probe barcode file
  --index: optional pre-built probe index
  --reads1/--reads2: FASTQ files
  --resolution, --threads, --kmer-length, etc.

Pipeline orchestration (flex_quant.rs):
- Resource resolution: auto-fetch cell BC whitelist, probe barcode file,
  and probe set from chemistry registry (with content-hash caching)
- Probe CSV → FASTA conversion with t2g map and metadata extraction
- Probe index building with piscem
- Mapping with chemistry geometry (includes s[N] tag)
- generate-permit-list with --sample-bc-list --unfiltered-pl
- Collate and quant (auto-detect multi-barcode from RAD)
- Full pipeline metadata output (simpleaf_flex_quant_info.json)

Example usage:
  simpleaf flex-quant \
    --chemistry 10x-flexv1-gex-3p \
    --organism human \
    -1 R1.fq.gz -2 R2.fq.gz \
    -o output -t 8

All 49 tests pass (updated CLI snapshot for new subcommand).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
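The "content-hash caching" mentioned in the commit above can be sketched as keying each downloaded resource's cache path by its expected registry hash, so a changed upstream file never collides with a stale copy. The directory layout and function names are hypothetical, and the sketch does not verify the blake3 digest after download (that would need the blake3 crate).

```rust
use std::path::{Path, PathBuf};

/// Hypothetical layout: <cache_dir>/<expected_hash>/<file_name>.
fn cache_path(cache_dir: &Path, expected_hash: &str, file_name: &str) -> PathBuf {
    cache_dir.join(expected_hash).join(file_name)
}

/// Returns the cache path and whether a download is still required.
fn resolve_resource(cache_dir: &Path, expected_hash: &str, file_name: &str) -> (PathBuf, bool) {
    let path = cache_path(cache_dir, expected_hash, file_name);
    // Reuse any file already stored under this hash.
    let needs_download = !path.is_file();
    (path, needs_download)
}
```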
Add 10x-flexv1-gex-3p and 10x-flexv2-gex-3p entries with all resource
URLs and blake3 content hashes.

10x-flexv1-gex-3p:
  - geometry: 1{b[16]u[12]x:}2{r[50]x[18]s[8]x:}
  - cell BC: 737K-fixed-rna-profiling.txt (hash: 9fe0cb...)
  - probe BC: 128 entries, 16 samples x 8 rotations, 8bp (hash: 5dc9d1...)
  - probe sets: human v1.1.0 (hash: 9ccbd0...) + mouse v1.1.1 (hash: b4b811...)

10x-flexv2-gex-3p:
  - geometry: 1{b[16]u[12]x[10]s[10]}2{r:}
  - cell BC: 737K-flex-v2.txt (hash: dcb018...)
  - probe BC: 384 entries, 10bp barcodes (hash: 5e7ef9...)
  - probe sets: human v2.0.0 (hash: bfa53e...) + mouse v2.0.0 (hash: 8e0ea4...)

All resources hosted on UMD Box with stable URLs.
All 49 tests pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
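The registry entries above pair s[8] with 8bp probe barcodes (v1) and s[10] with 10bp barcodes (v2). A hypothetical consistency check for that pairing could look like this; the function names are illustrative, and the barcode list is assumed to be one sequence per entry.

```rust
/// Extracts N from the first `s[N]` sample-barcode tag in a geometry.
fn sample_bc_len(geo: &str) -> Option<usize> {
    let start = geo.find("s[")? + 2;
    let rest = &geo[start..];
    let end = rest.find(']')?;
    rest[..end].parse().ok()
}

/// Checks that every sample barcode has the length the geometry declares.
fn check_sample_bc_width(geo: &str, barcodes: &[&str]) -> Result<usize, String> {
    let len = sample_bc_len(geo).ok_or("geometry has no s[N] tag")?;
    if let Some(bad) = barcodes.iter().find(|b| b.len() != len) {
        return Err(format!("barcode {bad} is not {len} bp"));
    }
    Ok(len)
}
```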
Three fixes discovered during end-to-end testing:

1. Use 'map-sc' not 'map-scrna' as the piscem subcommand name
2. Add #[serde(rename = "remote_url")] to CustomChemistry.remote_pl_url
   so the JSON key 'remote_url' deserializes correctly (was always None)
3. Remove the manual empty unmapped_bc_count_collated.bin workaround
   since alevin-fry collate now produces it properly

Successfully tested on 4-plex human colorectal/kidney Flex v1 dataset:
- 5.6M cells across 16 sample channels (4 real + 12 noise)
- ~97.7% mapping rate
- Full auto-fetch pipeline: probe CSV download, FASTA conversion,
  index build, mapping, GPL, collate, quant

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Geometries containing piscem-only tags (s[N] for sample barcodes,
b<N>[L] for numbered barcodes) cannot be parsed by seq_geom_parser,
which only understands the standard b/u/r/x/f tags. These extended
geometries are validated by piscem at mapping time instead.

This fixes simpleaf inspect failing with a parse error on Flex
chemistry entries like 1{b[16]u[12]x:}2{r[50]x[18]s[8]x:}.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two fixes for the end-to-end flex-quant pipeline:

1. Probe CSV conversion now includes ALL probes (included + excluded)
   in both the FASTA and t2g map. The index contains all probes, so
   quant needs a t2g entry for every reference. Previously only
   "included" probes were emitted, causing a mismatch (53,459 t2g
   entries vs 54,580 index references).

2. Use piscem-rs "map-scrna" subcommand (not "map-sc" which is the
   C++ piscem name).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
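The "keep every probe" fix above can be illustrated with a small converter that never reads the `included` column. A minimal sketch, assuming '#'-prefixed metadata lines, a header row containing gene_id, probe_seq, and probe_id, and no quoted commas; the function name is hypothetical and this is not the code in flex_quant.rs.

```rust
/// Converts probe CSV text into (FASTA, t2g TSV), emitting every probe.
fn probe_csv_to_fasta_t2g(csv: &str) -> Result<(String, String), String> {
    // Skip blank lines and '#' metadata lines.
    let mut lines = csv
        .lines()
        .map(str::trim)
        .filter(|l| !l.is_empty() && !l.starts_with('#'));
    let header: Vec<&str> = lines.next().ok_or("empty probe CSV")?.split(',').collect();
    // Locate columns by name instead of assuming fixed positions.
    let col = |name: &str| {
        header
            .iter()
            .position(|h| *h == name)
            .ok_or_else(|| format!("missing column: {name}"))
    };
    let (gi, si, pi) = (col("gene_id")?, col("probe_seq")?, col("probe_id")?);
    let (mut fasta, mut t2g) = (String::new(), String::new());
    for line in lines {
        let fields: Vec<&str> = line.split(',').collect();
        let get = |i: usize| fields.get(i).copied().ok_or_else(|| format!("short row: {line}"));
        let (gene, seq, probe) = (get(gi)?, get(si)?, get(pi)?);
        // No filter on `included`: the t2g map must cover every index reference.
        fasta.push_str(&format!(">{probe}\n{seq}\n"));
        t2g.push_str(&format!("{probe}\t{gene}\n"));
    }
    Ok((fasta, t2g))
}
```

Because the FASTA and t2g are written in the same loop, their record counts cannot drift apart, which is exactly the mismatch the commit describes.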
Structural constraints are not appropriate for probe-based Flex mapping
where references are short (~50bp) probe sequences. The flag has been
removed from the flex-quant CLI entirely.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add Flex GEX pipeline support and release helper
rob-p merged commit da0796e into main on Mar 19, 2026
7 checks passed