feat(sort): add CB (cellular barcode) to template-coordinate sort key#160
feat(sort): add CB (cellular barcode) to template-coordinate sort key#160
Conversation
|
Warning Rate limit exceeded
⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (5)
📝 WalkthroughWalkthroughAdds cellular-barcode (CB) support across template-coordinate sorting. compare_template_coordinate_raw now accepts 🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/lib/sort/inline_buffer.rs (1)
384-390:⚠️ Potential issue | 🟠 MajorUnmapped template keys now discard CB in CB-aware mode.
Line 388 hardcodes
cb_hash: 0inTemplateKey::unmapped, so fully unmapped records are no longer cell-separated when--cell-tagis active.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/lib/sort/inline_buffer.rs` around lines 384 - 390, The unmapped constructor currently zeroes cb_hash which causes unmapped records to lose cell-boundary tagging in CB-aware mode; update TemplateKey::unmapped to set cb_hash to the unmapped sentinel (e.g., u64::MAX) instead of 0 so cell-tag logic can detect an unmapped CB. Modify the unmapped function (the constructor that sets primary, secondary, cb_hash, tertiary, name_hash_upper) to assign cb_hash = u64::MAX.
🧹 Nitpick comments (3)
src/lib/sort/keys.rs (1)
1214-1227: Parameterize these CB comparison cases withrstest.This block is a clear case matrix and is easier to maintain as parameterized tests.
As per coding guidelines "Use rstest for parameterized tests".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/lib/sort/keys.rs` around lines 1214 - 1227, Replace the three individual CB comparison assertions with a single parameterized rstest that supplies the CB strings and expected ordering; create a test function (e.g., test_cbd_ordering or test_cid_comparisons) annotated with #[rstest] and include cases for ("A","AA", Ordering::Less), ("A","B", Ordering::Less), and ("","AAAA", Ordering::Less), constructing keys via key(...) inside the test and asserting the comparison matches the expected result; ensure you import rstest and std::cmp::Ordering and keep the original key(...) argument pattern to build operands.src/commands/sort.rs (1)
298-304: Extract shared cell-tag parsing into a helper.The same parse/convert block appears twice; a small helper would reduce drift.
Also applies to: 376-382
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/commands/sort.rs` around lines 298 - 304, Extract the repeated parse/convert block into a small helper like parse_cell_tag_option(&self) -> Result<Option<[u8;2]>, Error> that checks if self.order == SortOrderArg::TemplateCoordinate, calls string_to_tag(&self.cell_tag, "cell-tag")?, and returns Some([tag.as_ref()[0], tag.as_ref()[1]]) or None otherwise; replace the duplicate code blocks that currently compute cell_tag (the snippet using string_to_tag and creating Some([..]) and the similar block at the other location) with calls to this helper so both sites reuse the same logic and error handling.crates/fgumi-raw-bam/src/sort.rs (1)
493-557: Userstestfor these CB permutation tests.These cases are table-style variants of one scenario and are better expressed as parameterized tests.
As per coding guidelines "Use rstest for parameterized tests".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/fgumi-raw-bam/src/sort.rs` around lines 493 - 557, Replace the three near-duplicate unit tests with a single parameterized rstest that iterates the CB permutations and expected Ordering values; locate the existing tests test_compare_template_coordinate_raw_cb_tag_tiebreak, test_compare_template_coordinate_raw_cb_mixed_presence, and test_compare_template_coordinate_raw_cb_length_first and consolidate them into one #[rstest] function that takes parameters (aux_a: Vec<u8>, aux_b: Vec<u8>, expected: Ordering) and inside builds a and b using make_bam_bytes and asserts compare_template_coordinate_raw(&a, &b, Some(b"CB")) == expected (also include the case where cell_tag is None for the tiebreak test as a separate param or separate assertion); add use rstest::rstest at top and ensure Ordering is in scope.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/commands/dedup.rs`:
- Line 1042: The guidance strings in src/commands/dedup.rs hardcode the cell tag
("CB") instead of using the configured --cell-tag value; update the user-facing
messages to interpolate the actual cell-tag variable (e.g., cell_tag or
opts.cell_tag) when building the help/guidance strings so the example command
uses the real flag value; apply the same change to all occurrences referenced
(around the strings at lines ~1042, ~1191-1192, ~1402-1403) so messages
consistently show the configured --cell-tag rather than "CB".
In `@src/commands/group.rs`:
- Around line 928-929: The remediation message hardcodes "--cell-tag CB"; update
the string construction in src/commands/group.rs (where the grouping error/help
message is built, e.g., inside the group command handler function) to
interpolate the actual cell-tag value variable (commonly named cell_tag,
args.cell_tag, or obtained via matches.value_of("cell-tag")) instead of the
literal "CB", so the suggested fgumi command uses the user-provided cell tag;
ensure the string formatting/escaping is correct when inserting that variable.
In `@src/lib/sort/inline_buffer.rs`:
- Around line 305-307: The template-key packing in inline_buffer.rs produces a
different field order than TemplateCoordinateKey (causing inconsistent sorting):
adjust the packing in the tertiary key construction (the block that builds
`tertiary`) so its field order matches TemplateCoordinateKey’s ordering (cid ->
mid -> name -> library_idx) instead of placing `library` before `MI`; update the
same ordering logic in the other affected sections (around the tertiary
construction and the similar packs at lines referenced 369-374 and 452-460) so
all sort paths encode keys identically.
In `@src/lib/sort/raw.rs`:
- Around line 1779-1788: The code reduces the cell barcode (CB) to a 64-bit hash
(cb_hash) using ahash::RandomState::with_seeds and state.hash_one (triggered
from cell_tag and find_string_tag_in_record), which breaks the required ordering
(should be by CB length then lexicographic) and can create collisions; change
the implementation so the sort key retains the CB length and raw bytes instead
of a numeric hash (e.g., produce a tuple key (cb_len, cb_bytes) or otherwise
emit length then lexicographic value) and update any comparisons that use
cb_hash to compare first by length then by lexicographic byte order; remove use
of ahash::RandomState::with_seeds/state.hash_one for CB ordering to preserve
correct template-coordinate ordering semantics.
---
Outside diff comments:
In `@src/lib/sort/inline_buffer.rs`:
- Around line 384-390: The unmapped constructor currently zeroes cb_hash which
causes unmapped records to lose cell-boundary tagging in CB-aware mode; update
TemplateKey::unmapped to set cb_hash to the unmapped sentinel (e.g., u64::MAX)
instead of 0 so cell-tag logic can detect an unmapped CB. Modify the unmapped
function (the constructor that sets primary, secondary, cb_hash, tertiary,
name_hash_upper) to assign cb_hash = u64::MAX.
---
Nitpick comments:
In `@crates/fgumi-raw-bam/src/sort.rs`:
- Around line 493-557: Replace the three near-duplicate unit tests with a single
parameterized rstest that iterates the CB permutations and expected Ordering
values; locate the existing tests
test_compare_template_coordinate_raw_cb_tag_tiebreak,
test_compare_template_coordinate_raw_cb_mixed_presence, and
test_compare_template_coordinate_raw_cb_length_first and consolidate them into
one #[rstest] function that takes parameters (aux_a: Vec<u8>, aux_b: Vec<u8>,
expected: Ordering) and inside builds a and b using make_bam_bytes and asserts
compare_template_coordinate_raw(&a, &b, Some(b"CB")) == expected (also include
the case where cell_tag is None for the tiebreak test as a separate param or
separate assertion); add use rstest::rstest at top and ensure Ordering is in
scope.
In `@src/commands/sort.rs`:
- Around line 298-304: Extract the repeated parse/convert block into a small
helper like parse_cell_tag_option(&self) -> Result<Option<[u8;2]>, Error> that
checks if self.order == SortOrderArg::TemplateCoordinate, calls
string_to_tag(&self.cell_tag, "cell-tag")?, and returns Some([tag.as_ref()[0],
tag.as_ref()[1]]) or None otherwise; replace the duplicate code blocks that
currently compute cell_tag (the snippet using string_to_tag and creating
Some([..]) and the similar block at the other location) with calls to this
helper so both sites reuse the same logic and error handling.
In `@src/lib/sort/keys.rs`:
- Around line 1214-1227: Replace the three individual CB comparison assertions
with a single parameterized rstest that supplies the CB strings and expected
ordering; create a test function (e.g., test_cbd_ordering or
test_cid_comparisons) annotated with #[rstest] and include cases for ("A","AA",
Ordering::Less), ("A","B", Ordering::Less), and ("","AAAA", Ordering::Less),
constructing keys via key(...) inside the test and asserting the comparison
matches the expected result; ensure you import rstest and std::cmp::Ordering and
keep the original key(...) argument pattern to build operands.
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
crates/fgumi-raw-bam/src/sort.rssrc/commands/dedup.rssrc/commands/group.rssrc/commands/sort.rssrc/lib/sort/inline_buffer.rssrc/lib/sort/keys.rssrc/lib/sort/raw.rs
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #160 +/- ##
==========================================
+ Coverage 84.09% 84.28% +0.19%
==========================================
Files 128 128
Lines 51771 51978 +207
==========================================
+ Hits 43538 43812 +274
+ Misses 8233 8166 -67 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
0c14864 to
79dc8b2
Compare
There was a problem hiding this comment.
♻️ Duplicate comments (2)
src/lib/sort/inline_buffer.rs (1)
369-373:⚠️ Potential issue | 🔴 CriticalTemplateKey still orders
librarybeforeMI/name, violating the declared template-coordinate order.Because
tertiarypackslibraryin higher bits and is compared beforename_hash_upper, the effective order is... -> CB -> library -> MI -> name. That disagrees with the intended... -> CB -> MI -> name -> library -> is_upper.Also applies to: 452-460
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/lib/sort/inline_buffer.rs` around lines 369 - 373, The tertiary packing in inline_buffer.rs currently places library in the highest bits (p3), causing library to be compared before MI/name; fix the pack order so MI and name_hash_upper occupy the higher bits and library comes after them (with is_upper as the lowest bit). Specifically update the p3 construction (and the analogous pack at the other occurrence around the 452-460 region) to shift and combine mi.0 and the name hash upper bits into the top bits, then include library next, and finally the is_upper bit (from !mi.1) as the least significant bit so the effective comparison order becomes CB -> MI -> name -> library -> is_upper.src/lib/sort/raw.rs (1)
1779-1788:⚠️ Potential issue | 🔴 CriticalCB hashing breaks required template-coordinate ordering semantics.
cb_hashcomparison is not equivalent to CB length-first + lexicographic ordering and can collide. It also makes missing CB (0) differ from empty CB (hash("")), which is inconsistent with the raw comparator semantics.Please carry a collision-free CB ordering representation (length + bytes, or equivalent) through
TemplateKeyinstead of a hash-only key.
🧹 Nitpick comments (1)
src/commands/sort.rs (1)
276-285: Add focused tests forparse_cell_tag.This new branch/parsing logic should have direct coverage (template-coordinate valid/invalid tag, non-template returns
None).Proposed test additions
#[cfg(test)] mod tests { use super::*; + use rstest::rstest; + + #[rstest] + #[case(SortOrderArg::TemplateCoordinate, "CB", Some(*b"CB"), true)] + #[case(SortOrderArg::TemplateCoordinate, "CR", Some(*b"CR"), true)] + #[case(SortOrderArg::Coordinate, "CB", None, true)] + #[case(SortOrderArg::Queryname, "CB", None, true)] + #[case(SortOrderArg::TemplateCoordinate, "C", None, false)] + fn test_parse_cell_tag( + #[case] order: SortOrderArg, + #[case] cell_tag: &str, + #[case] expected: Option<[u8; 2]>, + #[case] ok: bool, + ) { + let cmd = Sort { + input: PathBuf::from("in.bam"), + output: Some(PathBuf::from("out.bam")), + verify: false, + order, + max_memory: 768 * 1024 * 1024, + memory_per_thread: true, + tmp_dir: None, + threads: 1, + compression: CompressionOptions::default(), + temp_compression: 1, + write_index: false, + cell_tag: cell_tag.to_string(), + }; + + let got = cmd.parse_cell_tag(); + if ok { + assert_eq!(got.unwrap(), expected); + } else { + assert!(got.is_err()); + } + }As per coding guidelines, "Use rstest for parameterized tests".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/commands/sort.rs` around lines 276 - 285, Add rstest-based unit tests that directly exercise the parse_cell_tag method: (1) when self.order == SortOrderArg::TemplateCoordinate and self.cell_tag is a valid two-char tag (use string_to_tag expectations) assert Ok(Some([u8;2])) with correct bytes; (2) when self.order == SortOrderArg::TemplateCoordinate and self.cell_tag is invalid assert Err from string_to_tag; and (3) when self.order != SortOrderArg::TemplateCoordinate assert Ok(None). Reference the parse_cell_tag method, SortOrderArg::TemplateCoordinate variant, the cell_tag field, and the string_to_tag helper when writing test cases and use rstest parameterization to cover valid/invalid/non-template cases.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@src/lib/sort/inline_buffer.rs`:
- Around line 369-373: The tertiary packing in inline_buffer.rs currently places
library in the highest bits (p3), causing library to be compared before MI/name;
fix the pack order so MI and name_hash_upper occupy the higher bits and library
comes after them (with is_upper as the lowest bit). Specifically update the p3
construction (and the analogous pack at the other occurrence around the 452-460
region) to shift and combine mi.0 and the name hash upper bits into the top
bits, then include library next, and finally the is_upper bit (from !mi.1) as
the least significant bit so the effective comparison order becomes CB -> MI ->
name -> library -> is_upper.
---
Nitpick comments:
In `@src/commands/sort.rs`:
- Around line 276-285: Add rstest-based unit tests that directly exercise the
parse_cell_tag method: (1) when self.order == SortOrderArg::TemplateCoordinate
and self.cell_tag is a valid two-char tag (use string_to_tag expectations)
assert Ok(Some([u8;2])) with correct bytes; (2) when self.order ==
SortOrderArg::TemplateCoordinate and self.cell_tag is invalid assert Err from
string_to_tag; and (3) when self.order != SortOrderArg::TemplateCoordinate
assert Ok(None). Reference the parse_cell_tag method,
SortOrderArg::TemplateCoordinate variant, the cell_tag field, and the
string_to_tag helper when writing test cases and use rstest parameterization to
cover valid/invalid/non-template cases.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 7a9fd425-6a9f-4730-bdcc-f18e22754fc8
📒 Files selected for processing (7)
crates/fgumi-raw-bam/src/sort.rssrc/commands/dedup.rssrc/commands/group.rssrc/commands/sort.rssrc/lib/sort/inline_buffer.rssrc/lib/sort/keys.rssrc/lib/sort/raw.rs
🚧 Files skipped from review as they are similar to previous changes (1)
- src/commands/dedup.rs
79dc8b2 to
f8fa67b
Compare
There was a problem hiding this comment.
♻️ Duplicate comments (3)
src/commands/dedup.rs (1)
1042-1042:⚠️ Potential issue | 🟠 MajorReplace hardcoded
CBin help text with a generic/matching cell-tag instruction.Line 1042 still hardcodes
--cell-tag CB, which conflicts with configurable--cell-tagand can produce incorrect operator guidance.Suggested fix
-- Must be sorted with `fgumi sort --order template-coordinate --cell-tag CB` +- Must be sorted with `fgumi sort --order template-coordinate` using the matching `--cell-tag`🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/commands/dedup.rs` at line 1042, The help text currently hardcodes "CB" in the string "Must be sorted with `fgumi sort --order template-coordinate --cell-tag CB`"; update the help/usage string in dedup.rs (where the `--cell-tag` argument/flag help is defined) to avoid a hardcoded tag by either showing a placeholder like `--cell-tag <TAG>`/`--cell-tag <CELL_TAG>` or by interpolating the actual `--cell-tag` value used by the command; locate the literal help string and replace "CB" with the placeholder or dynamic reference so the message matches the configurable `--cell-tag` flag.src/lib/sort/raw.rs (1)
1758-1760:⚠️ Potential issue | 🔴 CriticalCB hashing here breaks the required CB sort semantics.
The required order is CB length-first then lexicographic. Hashing at Line 1779 loses that order and introduces collision risk, so template-coordinate output can diverge from spec and from other CB-aware comparison paths.
Please store a sortable CB key (length + bytes) rather than a hash in the template key path.
Also applies to: 1778-1789
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/lib/sort/raw.rs` around lines 1758 - 1760, The template key currently hashes the CB value when cell_tag is Some (the CB hashing between neg2 and MI), which breaks required length-first then lexicographic CB ordering and risks collisions; instead, change the template key construction to append a sortable CB key: first the CB length (as a fixed-width byte or varint consistent with other keys) followed by the raw CB bytes (no hashing). Update the code paths that build the template key where cell_tag is used (the block inserting the hashed CB between neg2 and MI) to write length+bytes into the same buffer/field so downstream comparisons use length-first then lexicographic order and remove the hash usage.src/lib/sort/inline_buffer.rs (1)
305-307:⚠️ Potential issue | 🟠 MajorTemplateKey ordering still diverges from the intended template-coordinate field order.
At Line 371,
libraryis packed ahead ofMI, andtertiaryis compared beforename_hash_upper(Line 452+), producingCB -> library -> MI -> name. The intended order isCB -> MI -> name -> library. This can make sort output disagree across key paths.Also applies to: 369-373, 452-460, 465-476
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/lib/sort/inline_buffer.rs` around lines 305 - 307, The TemplateKey packing/comparison in inline_buffer.rs currently places `library` before `MI` and compares `tertiary` (tertiary/`is_upper`/name_hash_upper order) before `name_hash_upper`, resulting in CB -> library -> MI -> name; update the packing and comparison logic for the TemplateKey so the fields are emitted and compared in the intended order: CB -> MI -> name -> library. Specifically, in the code that constructs/packs the u64 key words for TemplateKey (the packing block that builds the five u64 values) reorder the fields so `MI` is packed prior to `library` and the `name_hash_upper` components are placed/compared before the `tertiary`/`is_upper` component; likewise update the comparison path where those u64 words are compared (the TemplateKey comparison routine) to follow CB, MI, name, library sequence so sorting is consistent across key paths.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@src/commands/dedup.rs`:
- Line 1042: The help text currently hardcodes "CB" in the string "Must be
sorted with `fgumi sort --order template-coordinate --cell-tag CB`"; update the
help/usage string in dedup.rs (where the `--cell-tag` argument/flag help is
defined) to avoid a hardcoded tag by either showing a placeholder like
`--cell-tag <TAG>`/`--cell-tag <CELL_TAG>` or by interpolating the actual
`--cell-tag` value used by the command; locate the literal help string and
replace "CB" with the placeholder or dynamic reference so the message matches
the configurable `--cell-tag` flag.
In `@src/lib/sort/inline_buffer.rs`:
- Around line 305-307: The TemplateKey packing/comparison in inline_buffer.rs
currently places `library` before `MI` and compares `tertiary`
(tertiary/`is_upper`/name_hash_upper order) before `name_hash_upper`, resulting
in CB -> library -> MI -> name; update the packing and comparison logic for the
TemplateKey so the fields are emitted and compared in the intended order: CB ->
MI -> name -> library. Specifically, in the code that constructs/packs the u64
key words for TemplateKey (the packing block that builds the five u64 values)
reorder the fields so `MI` is packed prior to `library` and the
`name_hash_upper` components are placed/compared before the
`tertiary`/`is_upper` component; likewise update the comparison path where those
u64 words are compared (the TemplateKey comparison routine) to follow CB, MI,
name, library sequence so sorting is consistent across key paths.
In `@src/lib/sort/raw.rs`:
- Around line 1758-1760: The template key currently hashes the CB value when
cell_tag is Some (the CB hashing between neg2 and MI), which breaks required
length-first then lexicographic CB ordering and risks collisions; instead,
change the template key construction to append a sortable CB key: first the CB
length (as a fixed-width byte or varint consistent with other keys) followed by
the raw CB bytes (no hashing). Update the code paths that build the template key
where cell_tag is used (the block inserting the hashed CB between neg2 and MI)
to write length+bytes into the same buffer/field so downstream comparisons use
length-first then lexicographic order and remove the hash usage.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 27d67f7d-93ef-4450-a311-fd3bd1f248ca
📒 Files selected for processing (7)
crates/fgumi-raw-bam/src/sort.rssrc/commands/dedup.rssrc/commands/group.rssrc/commands/sort.rssrc/lib/sort/inline_buffer.rssrc/lib/sort/keys.rssrc/lib/sort/raw.rs
Port fgbio PR #1142 to include the CB tag in the template-coordinate sort order for single-cell data. Without CB in the sort, templates from different cells at the same locus are interleaved, breaking grouper adjacency assumptions. Comparison order is now: tid1 → tid2 → pos1 → pos2 → neg1 → neg2 → CB → MI → name → library → is_upper Changes: - Add cb_hash (u64) to TemplateKey between secondary and tertiary, expanding packed key from 32 to 40 bytes and inline header from 40 to 48 bytes - Add cid (String) to TemplateCoordinateKey with length-first then lexicographic comparison matching fgbio - Add cell_tag parameter to compare_template_coordinate_raw() for raw byte comparison path - Add cell_tag field and builder method to RawExternalSorter - Add --cell-tag / -c CLI argument to sort command (default "CB") - Update group and dedup help text to mention --cell-tag in sort suggestions
f8fa67b to
ea9adec
Compare
Summary
Ports fgbio PR #1142 to include the CB (cellular barcode) tag in the template-coordinate sort order for single-cell data. Without CB in the sort, templates from different cells at the same locus are interleaved, breaking grouper adjacency assumptions.
The new comparison order is:
tid1 → tid2 → pos1 → pos2 → neg1 → neg2 → **CB** → MI → name → library → is_upperChanges
inline_buffer.rs: Addcb_hash: u64toTemplateKey(5th packed field), expanding serialized key from 32→40 bytes and inline header from 40→48 bytes. Updated radix sort,Ord,core_cmp, and serialization.keys.rs: Addcid: StringtoTemplateCoordinateKeywith length-first then lexicographic comparison (matching fgbio). Inserted after neg2, before MI inOrdimpl.raw.rs: Addcell_tag: Option<[u8; 2]>toRawExternalSorterwith builder method.extract_template_key_inlinehashes CB tag with ahash (distinct fixed seeds for determinism).sort.rs(fgumi-raw-bam crate): Addcell_tagparameter tocompare_template_coordinate_raw()with CB comparison after neg2, before MI.Noneskips CB (backwards compatible).sort.rs(command): Add--cell-tag/-cCLI arg (default"CB"), threaded to sorter and verify mode.group.rs/dedup.rs: Updated help text and error messages to mention--cell-tag CBin sort suggestions.Tests
TemplateCoordinateKeyregression test (length-first, lexicographic, empty vs non-empty)TemplateKeytests updated for newcb_hashparameterTest plan
cargo ci-test— all 1852 tests passcargo ci-fmt— cleancargo ci-lint— clean