
feat: consider a cell barcode when sorting by template coordinate #1142

Merged

clintval merged 6 commits into main from cv_cell_barcode_sort on Mar 6, 2026



@clintval
Member

@clintval clintval commented Mar 3, 2026

For single-cell technologies, a cell barcode (the CB tag) may be present on SAM records and should be used as a grouping key during --template-coordinate sort. This ensures that templates at the same locus are grouped correctly, by cell, when those templates are known to come from different cells.
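To illustrate the grouping described here, the following is a minimal Scala sketch using a hypothetical, simplified record (TplRec) that carries only a position, a cell barcode, an MI, and a read name. It uses plain tuple ordering, not fgbio's actual TemplateCoordinateKey. Two cells share MI "1" at the same locus: without the barcode the cells interleave, and with it each cell's templates are contiguous.

```scala
// Hypothetical, simplified record carrying only the fields needed to illustrate grouping.
final case class TplRec(pos: Int, cb: String, mi: String, name: String)

// Two cells (CELL_A, CELL_B) with templates at the same locus that happen to share MI "1".
val recs = Seq(
  TplRec(100, "CELL_B", "1", "q1"),
  TplRec(100, "CELL_A", "1", "q2"),
  TplRec(100, "CELL_B", "1", "q3"),
  TplRec(100, "CELL_A", "1", "q4")
)

// Without the cell barcode, ties on position and MI are broken by read name, so cells interleave.
val withoutCb = recs.sortBy(r => (r.pos, r.mi, r.name))

// With the cell barcode ahead of the MI, each cell's templates become contiguous.
val withCb = recs.sortBy(r => (r.pos, r.cb, r.mi, r.name))

println(withoutCb.map(_.cb).mkString(",")) // CELL_B,CELL_A,CELL_B,CELL_A
println(withCb.map(_.cb).mkString(","))    // CELL_A,CELL_A,CELL_B,CELL_B
```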

Companion PR: samtools/samtools#2314

@nh13, would you be willing to ensure this is added to fgumi too, or would you like me to open that issue/PR?

@coderabbitai

coderabbitai bot commented Mar 3, 2026

Warning

Rate limit exceeded

@clintval has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 22 minutes and 9 seconds before requesting another review.


📥 Commits

Reviewing files that changed from the base of the PR and between 027fe6c and c99818a.

📒 Files selected for processing (2)
  • src/main/scala/com/fulcrumgenomics/bam/SortBam.scala
  • src/main/scala/com/fulcrumgenomics/bam/api/SamOrder.scala
📝 Walkthrough

Walkthrough

SortBam.scala: updated documentation to describe a new sort-key order that includes the cellular barcode (CB tag); no logic changes. SamOrder.scala: added cid (CB) to TemplateCoordinateKey signature and fields, passed cid into sort-key construction, and extended comparison to compare cid.length then cid. Tests: added a SamOrderTest that verifies grouping by CB with secondary ordering by MI; the test block appears duplicated.
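The comparison described in this walkthrough can be sketched as follows. TcKey is a hypothetical, simplified key: the real TemplateCoordinateKey also compares positional fields first. The length-then-value rule for cid and mid, and the cid → mid → name → library order, follow the review comments' description of the current compare().

```scala
// Hypothetical, simplified key: the real TemplateCoordinateKey also compares positional fields first.
final case class TcKey(library: String, cid: String, mid: String, name: String) extends Ordered[TcKey] {
  override def compare(that: TcKey): Int = {
    // Cellular barcode (CB): shorter values sort first, then lexicographic order.
    var result = Integer.compare(this.cid.length, that.cid.length)
    if (result == 0) result = this.cid.compareTo(that.cid)
    // Molecular identifier (MI): same length-then-value rule.
    if (result == 0) result = Integer.compare(this.mid.length, that.mid.length)
    if (result == 0) result = this.mid.compareTo(that.mid)
    // Read name, then library, matching the order the review attributes to compare().
    if (result == 0) result = this.name.compareTo(that.name)
    if (result == 0) result = this.library.compareTo(that.library)
    result
  }
}

val keys = Seq(
  TcKey("lib", "AAAA", "1", "q"),
  TcKey("lib", "AA", "1", "q"),
  TcKey("lib", "AAAA", "1", "a"),
  TcKey("lib", "", "1", "z")
)
val sortedKeys = keys.sorted
```

In this sketch, an empty barcode has length zero and therefore sorts ahead of any non-empty barcode.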

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and accurately describes the main change: adding cell barcode consideration to template coordinate sorting.
Description check ✅ Passed The description relates directly to the changeset, explaining the motivation and context for using cell barcodes during template coordinate sorting.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.


@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (2)
src/test/scala/com/fulcrumgenomics/bam/api/SamOrderTest.scala (1)

Line 213: Minor duplication: seq helper.

This helper duplicates the one at line 181. Consider extracting it to a shared scope if the test file grows.
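A minimal sketch of the suggested extraction, placing a single definition in one object within the test file. SamOrderTestHelpers is a hypothetical name, and the body of seq is an assumption: the review only gives its signature, so this sketch assumes it returns str repeated n times.

```scala
// Hypothetical shared scope for the test helper. The review does not show the body of `seq`,
// so this assumes it builds a sequence containing `str` repeated `n` times.
object SamOrderTestHelpers {
  def seq(n: Int, str: String): Seq[String] = Seq.fill(n)(str)
}

// Both former call sites would then import the single definition.
import SamOrderTestHelpers.seq

val twoCells = seq(2, "CELL_A") ++ seq(2, "CELL_B")
```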

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/test/scala/com/fulcrumgenomics/bam/api/SamOrderTest.scala` at line 213,
The helper method seq is duplicated (another definition exists at line 181);
remove this duplicate definition and extract the single seq(n: Int, str:
String): Seq[String] implementation into a shared scope used by the tests (e.g.,
a top-level helper in the test file, a companion object, or a private val in the
test class) so both usages reference the same method; update callers to use that
single seq function and delete the redundant definition.
src/main/scala/com/fulcrumgenomics/bam/api/SamOrder.scala (1)

212-227: Documentation/comparison order mismatch.

Documentation (lines 167-169) says order is: "library, cellular barcode, MI, read name". But compare() orders: cid → mid → name → library.

Library comparison happens last (line 224), after name. This works but could confuse future maintainers. Consider either reordering the comparison or updating the docs to match actual behavior.
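One way to follow the first option (reordering the comparison to match the documentation) is sketched below with Scala's Ordering.by over a tuple. DocOrderKey and documentedOrder are hypothetical names, positional fields are omitted, and this is not necessarily how the merged code resolved the comment.

```scala
// Hypothetical, simplified key (positional fields omitted).
final case class DocOrderKey(library: String, cid: String, mid: String, name: String)

// Compares in the documented order: library, cellular barcode, MI, read name.
// Barcode and MI compare by length first, then lexicographically, as in the current compare().
val documentedOrder: Ordering[DocOrderKey] =
  Ordering.by((k: DocOrderKey) => (k.library, k.cid.length, k.cid, k.mid.length, k.mid, k.name))

val keys = Seq(
  DocOrderKey("libB", "AA", "1", "a"),
  DocOrderKey("libA", "AAAA", "1", "b"),
  DocOrderKey("libA", "AA", "2", "c"),
  DocOrderKey("libA", "AA", "1", "d")
)
val inDocOrder = keys.sorted(documentedOrder)
```

Note the trade-off: reordering the comparison would change output order for inputs with several libraries at the same locus, whereas updating the documentation leaves the sort order untouched.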

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/main/scala/com/fulcrumgenomics/bam/api/SamOrder.scala` around lines 212 -
227, The compare method in TemplateCoordinateKey currently compares cid, mid,
name, then library (in compare), which mismatches the documented order
("library, cellular barcode, MI, read name"); update the compare(...)
implementation so the library field (this.library.compareTo(that.library)) is
compared before cid, mid, and name (i.e., perform library, then cid.length/cid,
then mid.length/mid, then name), or alternatively update the documentation to
reflect the current compare ordering—ensure you modify the
TemplateCoordinateKey.compare method if choosing code change so the comparison
sequence matches the docs exactly.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d8a1d91 and 69cb925.

📒 Files selected for processing (3)
  • src/main/scala/com/fulcrumgenomics/bam/SortBam.scala
  • src/main/scala/com/fulcrumgenomics/bam/api/SamOrder.scala
  • src/test/scala/com/fulcrumgenomics/bam/api/SamOrderTest.scala

@github-actions

github-actions bot commented Mar 3, 2026

PR Preview Action v1.6.1

🚀 View preview at
https://fulcrumgenomics.github.io/fgbio/pr-preview/pr-1142/

Built to branch gh-pages at 2026-03-03 23:06 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

@codecov

codecov bot commented Mar 3, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 95.97%. Comparing base (4bcaf2f) to head (c99818a).
⚠️ Report is 3 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1142   +/-   ##
=======================================
  Coverage   95.96%   95.97%           
=======================================
  Files         132      132           
  Lines        8037     8040    +3     
  Branches      516      562   +46     
=======================================
+ Hits         7713     7716    +3     
  Misses        324      324           
Flag Coverage Δ
unittests 95.97% <100.00%> (+<0.01%) ⬆️


Member

@nh13 nh13 left a comment


This is strictly a breaking change since it will change the sort order when someone has the CB tag in a BAM. Also, should we wait until the samtools PR is merged and released so the recommendation to use samtools sort --template-coordinate is valid?

@clintval
Member Author

clintval commented Mar 6, 2026

I don't regard this as a breaking change since the tool's contract never specified anything about cell barcodes. The output of our template-coordinate sort will still be backwards compatible with ecosystem tooling. When the CB tag is not present, the tool will behave exactly as it did before. When the tag is present, templates that would otherwise have been grouped across cells will now be grouped by cell, which could be considered an enhancement for correctness' sake.
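The backward-compatibility argument can be checked with a small sketch. It assumes a missing CB tag is treated as an empty barcode; the PR text only states that behavior is unchanged without CB, not how that is implemented. Under that assumption every record ties on the barcode, and the comparison falls through to the pre-existing keys. CompatRec, before, and after are hypothetical names, and positional fields are omitted.

```scala
// Hypothetical, simplified record; `cb` is None when the CB tag is absent.
final case class CompatRec(mid: String, name: String, cb: Option[String])

// "Before": MI (length, then value), then read name.
val before: Ordering[CompatRec] =
  Ordering.by((r: CompatRec) => (r.mid.length, r.mid, r.name))

// "After": the barcode is compared first, treating a missing CB tag as an empty string (assumption).
val after: Ordering[CompatRec] = Ordering.by { (r: CompatRec) =>
  val cid = r.cb.getOrElse("")
  (cid.length, cid, r.mid.length, r.mid, r.name)
}

// Without CB tags, both orderings agree.
val noCb = Seq(CompatRec("10", "b", None), CompatRec("2", "z", None), CompatRec("2", "a", None))
// With CB tags, the barcode takes precedence over MI and read name.
val withCb = Seq(CompatRec("1", "a", Some("CELL_B")), CompatRec("1", "b", Some("CELL_A")))
```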

The samtools PR was merged, so I think we should be OK to merge this PR as well:

@clintval clintval requested a review from nh13 March 6, 2026 16:50
Member

@nh13 nh13 left a comment


LGTM, thank-you!

@clintval clintval merged commit f7474b4 into main Mar 6, 2026
17 checks passed
@clintval clintval deleted the cv_cell_barcode_sort branch March 6, 2026 17:42

2 participants