Merge main back into dev
Foundation for 10x Flex GEX support in simpleaf:
New types in chem_utils.rs:
- Organism enum: Human, Mouse, Other(String) — with FromStr,
Display, clap::ValueEnum, serde support
- ProtocolType enum: StandardRna, FlexGex, Atac — parsed from
meta.protocol_type in chemistry JSON
- SampleBcListInfo: plist_name + remote_url for probe barcode files
- ProbeSetInfo: name + plist_name + remote_url for organism-specific
probe sets
Extended CustomChemistry struct:
- sample_bc_list: Option<SampleBcListInfo> — for Flex probe barcode
rotation files
- probe_sets: Option<HashMap<String, ProbeSetInfo>> — keyed by
organism name ("human", "mouse")
- protocol_type() helper method — reads from meta.protocol_type
- is_flex_gex() convenience method
All 49 existing tests pass.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
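For readers who want a concrete picture of the schema types listed in the commit above, here is a minimal std-only sketch. It is not the code in chem_utils.rs: the case-insensitive parsing, the `from_meta` helper name, the `"atac"` string, and the choice of `StandardRna` as the default are assumptions made for illustration, and the clap/serde derives are omitted.

```rust
use std::fmt;
use std::str::FromStr;

/// Organism targeted by a probe set; unrecognized names are kept verbatim.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Organism {
    Human,
    Mouse,
    Other(String),
}

impl FromStr for Organism {
    // Parsing never fails because unknown names fall back to `Other`.
    type Err = std::convert::Infallible;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Assumption: matching is case-insensitive.
        Ok(match s.to_ascii_lowercase().as_str() {
            "human" => Organism::Human,
            "mouse" => Organism::Mouse,
            _ => Organism::Other(s.to_string()),
        })
    }
}

impl fmt::Display for Organism {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Organism::Human => write!(f, "human"),
            Organism::Mouse => write!(f, "mouse"),
            Organism::Other(name) => write!(f, "{name}"),
        }
    }
}

/// Protocol family read from `meta.protocol_type` in the chemistry JSON.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum ProtocolType {
    // Assumption: chemistries without a protocol_type are standard RNA.
    #[default]
    StandardRna,
    FlexGex,
    Atac,
}

impl ProtocolType {
    /// Hypothetical helper mapping the optional metadata string to a variant.
    pub fn from_meta(value: Option<&str>) -> Self {
        match value {
            Some("flex_gex") => ProtocolType::FlexGex,
            Some("atac") => ProtocolType::Atac, // string spelling is assumed
            _ => ProtocolType::StandardRna,
        }
    }
}
```

Keeping `Other(String)` makes parsing total, so an unexpected organism name can be reported later with the user's original spelling rather than rejected at the CLI boundary.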
New `simpleaf flex-quant` command that orchestrates the complete Flex
GEX pipeline: probe index building, mapping, multi-barcode permit list
generation, hierarchical collation, and quantification.
FlexQuantOpts CLI struct with:
--chemistry: registered Flex chemistry name
--organism: strongly-typed Organism enum (human/mouse)
--probe-set: optional explicit probe CSV or FASTA
--sample-bc-list: optional explicit probe barcode file
--index: optional pre-built probe index
--reads1/--reads2: FASTQ files
--resolution, --threads, --kmer-length, etc.
Pipeline orchestration (flex_quant.rs):
- Resource resolution: auto-fetch cell BC whitelist, probe barcode file,
and probe set from chemistry registry (with content-hash caching)
- Probe CSV → FASTA conversion with t2g map and metadata extraction
- Probe index building with piscem
- Mapping with chemistry geometry (includes s[N] tag)
- generate-permit-list with --sample-bc-list --unfiltered-pl
- Collate and quant (auto-detect multi-barcode from RAD)
- Full pipeline metadata output (simpleaf_flex_quant_info.json)
Example usage:
simpleaf flex-quant \
--chemistry 10x-flexv1-gex-3p \
--organism human \
-1 R1.fq.gz -2 R2.fq.gz \
-o output -t 8
All 49 tests pass (updated CLI snapshot for new subcommand).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
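The interplay between `--index`, `--probe-set`, and the registry auto-fetch described in the commit above can be pictured with the sketch below. The struct, enum, and function names are hypothetical, and the precedence order (pre-built index, then explicit probe set, then registry lookup by organism) is an assumption of this sketch, not something the commit message states.

```rust
use std::path::PathBuf;

/// Hypothetical subset of the flex-quant options relevant to probe resolution.
pub struct ProbeInputs {
    pub index: Option<PathBuf>,
    pub probe_set: Option<PathBuf>,
    pub organism: Option<String>,
}

/// Where the probe reference comes from once the options are resolved.
#[derive(Debug, PartialEq)]
pub enum ProbeSource {
    /// Use an existing index and skip index building.
    PrebuiltIndex(PathBuf),
    /// Build an index from a user-supplied probe CSV or FASTA.
    ExplicitProbeSet(PathBuf),
    /// Fetch the probe set registered for this organism.
    RegistryProbeSet { organism: String },
}

/// Assumed precedence: the most explicit input wins.
pub fn resolve_probe_source(inputs: &ProbeInputs) -> Result<ProbeSource, String> {
    if let Some(index) = &inputs.index {
        return Ok(ProbeSource::PrebuiltIndex(index.clone()));
    }
    if let Some(path) = &inputs.probe_set {
        return Ok(ProbeSource::ExplicitProbeSet(path.clone()));
    }
    match &inputs.organism {
        Some(org) => Ok(ProbeSource::RegistryProbeSet {
            organism: org.to_lowercase(),
        }),
        None => Err("need one of --index, --probe-set, or --organism".to_string()),
    }
}
```

Resolving to a single enum value up front lets the later stages (index build, mapping) branch once instead of re-checking each optional flag.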
Add 10x-flexv1-gex-3p and 10x-flexv2-gex-3p entries with all resource
URLs and blake3 content hashes.
10x-flexv1-gex-3p:
- geometry: 1{b[16]u[12]x:}2{r[50]x[18]s[8]x:}
- cell BC: 737K-fixed-rna-profiling.txt (hash: 9fe0cb...)
- probe BC: 128 entries, 16 samples x 8 rotations, 8bp (hash: 5dc9d1...)
- probe sets: human v1.1.0 (hash: 9ccbd0...) + mouse v1.1.1 (hash: b4b811...)
10x-flexv2-gex-3p:
- geometry: 1{b[16]u[12]x[10]s[10]}2{r:}
- cell BC: 737K-flex-v2.txt (hash: dcb018...)
- probe BC: 384 entries, 10bp barcodes (hash: 5e7ef9...)
- probe sets: human v2.0.0 (hash: bfa53e...) + mouse v2.0.0 (hash: 8e0ea4...)
All resources hosted on UMD Box with stable URLs.
All 49 tests pass.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Three fixes discovered during end-to-end testing:
1. Use 'map-sc' not 'map-scrna' as the piscem subcommand name
2. Add #[serde(rename = "remote_url")] to CustomChemistry.remote_pl_url so
the JSON key 'remote_url' deserializes correctly (was always None)
3. Remove the manual empty unmapped_bc_count_collated.bin workaround since
alevin-fry collate now produces it properly
Successfully tested on 4-plex human colorectal/kidney Flex v1 dataset:
- 5.6M cells across 16 sample channels (4 real + 12 noise)
- ~97.7% mapping rate
- Full auto-fetch pipeline: probe CSV download, FASTA conversion, index
build, mapping, GPL, collate, quant
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Geometries containing piscem-only tags (s[N] for sample barcodes,
b<N>[L] for numbered barcodes) cannot be parsed by seq_geom_parser,
which only understands the standard b/u/r/x/f tags. These extended
geometries are validated by piscem at mapping time instead.
This fixes simpleaf inspect failing with a parse error on Flex
chemistry entries like 1{b[16]u[12]x:}2{r[50]x[18]s[8]x:}.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
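The check that decides whether to skip seq_geom_parser validation could be sketched as below. The function name is hypothetical and this is not the code in af_utils.rs; it only looks for the two piscem-only tag shapes named in the commit message, `s[...]` and `b<N>[...]`.

```rust
/// Returns true if `geometry` contains a piscem-only tag:
/// `s[...]` (sample barcode) or `b<N>[...]` (numbered barcode).
pub fn has_piscem_only_tags(geometry: &str) -> bool {
    let bytes = geometry.as_bytes();
    for (i, &c) in bytes.iter().enumerate() {
        match c {
            // A sample-barcode tag is an 's' immediately followed by '['.
            b's' if bytes.get(i + 1) == Some(&b'[') => return true,
            b'b' => {
                // Skip digits after 'b'; one or more digits followed by '['
                // marks a numbered barcode such as b0[16].
                let mut j = i + 1;
                while j < bytes.len() && bytes[j].is_ascii_digit() {
                    j += 1;
                }
                if j > i + 1 && bytes.get(j) == Some(&b'[') {
                    return true;
                }
            }
            _ => {}
        }
    }
    false
}
```

With this helper, the Flex v1 geometry `1{b[16]u[12]x:}2{r[50]x[18]s[8]x:}` is flagged because of `s[8]`, while a geometry using only b/u/r/x tags is not and would still go through the standard parser.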
Two fixes for the end-to-end flex-quant pipeline:
1. Probe CSV conversion now includes ALL probes (included + excluded) in
both the FASTA and t2g map. The index contains all probes, so quant needs
a t2g entry for every reference. Previously only "included" probes were
emitted, causing a mismatch (53,459 t2g entries vs 54,580 index references).
2. Use piscem-rs "map-scrna" subcommand (not "map-sc" which is the C++
piscem name).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
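To illustrate why the FASTA and the t2g map must be emitted from the same set of rows, here is a minimal std-only sketch of the conversion. It is not the implementation in flex_quant.rs: the function name is hypothetical, and the CSV layout (`#`-prefixed metadata lines, then a header row containing `gene_id`, `probe_seq`, and `probe_id`) is an assumption of this sketch.

```rust
/// Converts probe-set CSV text into (FASTA text, t2g TSV text).
/// Every data row is emitted to both outputs, so the two always agree.
pub fn probes_to_fasta_and_t2g(csv: &str) -> Result<(String, String), String> {
    // Skip blank lines and '#' metadata lines (assumed layout).
    let mut rows = csv
        .lines()
        .map(str::trim)
        .filter(|l| !l.is_empty() && !l.starts_with('#'));

    // Locate the needed columns by name rather than by fixed position.
    let header: Vec<&str> = rows.next().ok_or("missing header row")?.split(',').collect();
    let col = |name: &str| {
        header
            .iter()
            .position(|h| *h == name)
            .ok_or_else(|| format!("missing column {name}"))
    };
    let (gene_col, seq_col, id_col) = (col("gene_id")?, col("probe_seq")?, col("probe_id")?);

    let mut fasta = String::new();
    let mut t2g = String::new();
    for row in rows {
        let fields: Vec<&str> = row.split(',').collect();
        let get = |i: usize| {
            fields
                .get(i)
                .copied()
                .ok_or_else(|| format!("short row: {row}"))
        };
        let (gene, seq, id) = (get(gene_col)?, get(seq_col)?, get(id_col)?);
        // No filtering on any `included` column: excluded probes are kept too.
        fasta.push_str(&format!(">{id}\n{seq}\n"));
        t2g.push_str(&format!("{id}\t{gene}\n"));
    }
    Ok((fasta, t2g))
}
```

Because both strings are built in the same loop, the number of FASTA records always equals the number of t2g lines, which is the invariant the fix above restores.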
Structural constraints are not appropriate for probe-based Flex mapping
where references are short (~50bp) probe sequences. The flag has been
removed from the flex-quant CLI entirely.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This reverts commit a81dc74.
Add Flex GEX pipeline support and release helper
Summary
This branch adds end-to-end support for 10x Flex Gene Expression quantification, extends the chemistry registry/schema to describe Flex-specific resources, adds a release helper script for crate version bumps and publishing, and updates the documentation/tooling to cover the new command and build cleanly with a current Sphinx toolchain.
What changed
1. New `flex-quant` command for 10x Flex GEX
- Added `simpleaf flex-quant`, and wired it into command dispatch.
- Pipeline implementation in `src/simpleaf_commands/flex_quant.rs`.
- Steps and outputs covered:
  - `probe_t2g.tsv` generation
  - `piscem build`
  - `piscem map-scrna`
  - `alevin-fry generate-permit-list`
  - `alevin-fry collate`
  - `alevin-fry quant`
  - `simpleaf_flex_quant_info.json`
2. Chemistry schema and registry support for Flex protocols
- Extended `src/utils/chem_utils.rs` with new schema types used by Flex chemistries:
  - `Organism`
  - `ProtocolType`
  - `SampleBcListInfo`
  - `ProbeSetInfo`
- Extended `CustomChemistry` with new optional Flex-related fields:
  - `sample_bc_list`
  - `probe_sets`
- Helper methods:
  - `protocol_type()`
  - `is_flex_gex()`
- Fixed deserialization so the `remote_url` field maps correctly to the chemistry permit-list URL field. Previously it was always `None`.
- New entries in `resources/chemistries.json`:
  - `10x-flexv1-gex-3p`
  - `10x-flexv2-gex-3p`
  - protocol type `flex_gex`
3. Flex- and geometry-related behavior fixes
- Updated `src/utils/af_utils.rs` to skip `seq_geom_parser` validation for piscem-only extensions such as `s[N]` and numbered barcode tags like `b0[N]`, deferring validation to piscem where appropriate.
- Flex mapping invokes `piscem map-scrna`.
4. Release tooling
- Added `bump_and_publish.sh`.
- Version bump in `Cargo.toml` and the `simpleaf` package entry in `Cargo.lock`
- `cargo check` / `cargo package`
- `--dry-run` and `--publish`
5. Clippy / compile cleanup
- `Chemistry::Custom` enum variant
- `Default` for `ProtocolType`
- `strip_prefix`
6. Documentation updates
- Documentation for the `flex-quant` command.
- Note in the `quant` docs directing Flex users to `flex-quant`.
- Updated `docs/requirements.txt` to install `sphinx` explicitly instead of relying on whatever global version happens to be present.
7. Documentation toolchain modernization
- `sphinx 9.1.0`
- `furo 2025.12.19`
8. Other repository updates
- `.gitignore`.
- `flex-quant`.
- `Cargo.lock` as `0.20.0`.
Verification
- `cargo build`
- `cargo clippy --all-targets --all-features -- -D warnings`
- `/tmp/simpleaf-docs-venv/bin/python -m sphinx -b html docs/source /tmp/simpleaf-docs-html`
Notes
- `Cargo.toml` and `Cargo.lock` were not included in this PR; this PR describes the committed `main..dev` diff only.