feat: consider a cell barcode when sorting by template coordinate#1142
feat: consider a cell barcode when sorting by template coordinate#1142
Conversation
|
Warning Rate limit exceeded
⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. 📒 Files selected for processing (2)
📝 WalkthroughWalkthroughSortBam.scala: updated documentation to describe a new sort-key order that includes the cellular barcode (CB tag); no logic changes. SamOrder.scala: added 🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (2)
src/test/scala/com/fulcrumgenomics/bam/api/SamOrderTest.scala (1)
213-213: Minor duplication:seqhelper.This helper duplicates line 181. Consider extracting to a shared scope if test file grows.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/test/scala/com/fulcrumgenomics/bam/api/SamOrderTest.scala` at line 213, The helper method seq is duplicated (another definition exists at line 181); remove this duplicate definition and extract the single seq(n: Int, str: String): Seq[String] implementation into a shared scope used by the tests (e.g., a top-level helper in the test file, a companion object, or a private val in the test class) so both usages reference the same method; update callers to use that single seq function and delete the redundant definition.src/main/scala/com/fulcrumgenomics/bam/api/SamOrder.scala (1)
212-227: Documentation/comparison order mismatch.Documentation (lines 167-169) says order is: "library, cellular barcode, MI, read name". But
compare()orders:cid→mid→name→library.Library comparison happens last (line 224), after name. This works but could confuse future maintainers. Consider either reordering the comparison or updating the docs to match actual behavior.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/main/scala/com/fulcrumgenomics/bam/api/SamOrder.scala` around lines 212 - 227, The compare method in TemplateCoordinateKey currently compares cid, mid, name, then library (in compare), which mismatches the documented order ("library, cellular barcode, MI, read name"); update the compare(...) implementation so the library field (this.library.compareTo(that.library)) is compared before cid, mid, and name (i.e., perform library, then cid.length/cid, then mid.length/mid, then name), or alternatively update the documentation to reflect the current compare ordering—ensure you modify the TemplateCoordinateKey.compare method if choosing code change so the comparison sequence matches the docs exactly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@src/main/scala/com/fulcrumgenomics/bam/api/SamOrder.scala`:
- Around line 212-227: The compare method in TemplateCoordinateKey currently
compares cid, mid, name, then library (in compare), which mismatches the
documented order ("library, cellular barcode, MI, read name"); update the
compare(...) implementation so the library field
(this.library.compareTo(that.library)) is compared before cid, mid, and name
(i.e., perform library, then cid.length/cid, then mid.length/mid, then name), or
alternatively update the documentation to reflect the current compare
ordering—ensure you modify the TemplateCoordinateKey.compare method if choosing
code change so the comparison sequence matches the docs exactly.
In `@src/test/scala/com/fulcrumgenomics/bam/api/SamOrderTest.scala`:
- Line 213: The helper method seq is duplicated (another definition exists at
line 181); remove this duplicate definition and extract the single seq(n: Int,
str: String): Seq[String] implementation into a shared scope used by the tests
(e.g., a top-level helper in the test file, a companion object, or a private val
in the test class) so both usages reference the same method; update callers to
use that single seq function and delete the redundant definition.
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
src/main/scala/com/fulcrumgenomics/bam/SortBam.scalasrc/main/scala/com/fulcrumgenomics/bam/api/SamOrder.scalasrc/test/scala/com/fulcrumgenomics/bam/api/SamOrderTest.scala
|
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #1142 +/- ##
=======================================
Coverage 95.96% 95.97%
=======================================
Files 132 132
Lines 8037 8040 +3
Branches 516 562 +46
=======================================
+ Hits 7713 7716 +3
Misses 324 324
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
nh13
left a comment
There was a problem hiding this comment.
This is strictly a breaking change since it will change the sort order when someone has the CB tag in a BAM. Also, should we wait until the samtools PR is merged and released so the recommendation to use samtools sort --template-coordinate is valid?
|
I don't regard this as a breaking change since the tool's contract never specified anything about cell barcodes. The output of our template coordinate sort will still be backwards compatible with ecosystem tooling. When the The samtools PR was merged so I think we should be OK to merge this PR as well: |
For single-cell technologies, a cell barcode may be present on SAM records and should be used as a grouping key during
--template-coordinatesort. This ensures templates at the same locus are grouped together successfully when those templates are known to come from different cells.Companion PR: samtools/samtools#2314
@nh13 would you be willing to ensure this is added to
fgumitoo or would you welcome me to make that issue/PR?