Wire up CRAM 3.1 codecs for reading.#1736
Conversation
773b235 to
c6c7ce5
Compare
fd4e834 to
68c3b8a
Compare
f012878 to
d89abba
Compare
1a5dc3a to
79792f2
Compare
840fc9d to
c8ba753
Compare
lbergelson
left a comment
There was a problem hiding this comment.
@cmnbroad Some comments. One potential bug and a request to rename the new build commands so it's clear they're for a special test build.
Dockerfile
Outdated
| @@ -0,0 +1,13 @@ | |||
| # This docker build is here to enable Carrot tests | |||
There was a problem hiding this comment.
I don't know if there's a way to have carrot build a non-default docker image, but if that's possible I would call this something like CarrotDockerFile so people don't naively try to use it for non-carrot things.
There was a problem hiding this comment.
Commit removed, to be revived on the write branch.
build.gradle
Outdated
| } | ||
|
|
||
| // A shadow jar that includes all dependencies PLUS test code, for use with carrot tests | ||
| shadowJar { |
There was a problem hiding this comment.
Lets make separate carrotShadowJar task instead of using the default. Again, people might just run shadowjar and assume it's the standard htsjdk thing.
| * the newest version, since it has no write implementation yet. | ||
| */ | ||
| @Override | ||
| protected List<ReadsCodec> filterByVersion(final List<ReadsCodec> candidateCodecs, final HtsVersion htsVersion) { |
There was a problem hiding this comment.
Not for this PR, but I wonder if we should make this something that exists generically.
| return buffer.array(); | ||
| } | ||
|
|
||
| final byte[] bytes = new byte[buffer.remaining()]; |
There was a problem hiding this comment.
I probably am misunderstanding, but doesn't remaining return the length between the current position and the end? Isn't what you want the length from arrayOffset() to limit()?
There was a problem hiding this comment.
Oh yeah, very good catch. Fixed.
| @@ -57,6 +64,9 @@ public ExternalCompressor getCompressorForMethod( | |||
| final int compressorSpecificArg) { | |||
| switch (compressionMethod) { | |||
There was a problem hiding this comment.
This could be a switch expression which lets you drop the default condition. It's fine though.
| sharedRANSNx16Decode = new RANSNx16Decode(); | ||
| } | ||
| compressorCache.put( | ||
| new Tuple(BlockCompressionMethod.RANSNx16, ransArg), |
There was a problem hiding this comment.
This might be the sort of place where a private CompressorConfig record type could clarify things instead of the awkward tuples.
| * on the corresponding reference contig. | ||
| */ | ||
| public class Slice { | ||
| private final CRAMVersion cramVersion; |
There was a problem hiding this comment.
Ooh, can we have crams that vary version by slice?!
There was a problem hiding this comment.
Let us hope not. Its just for convenience - there have always been cases where slice needs to discriminate by version when reading a slice, but for some reason I thought I needed it when writing now too - although so far there are no references to it.
| // reads file 1 (bam or cram) | ||
| // reads file 2 (bam or cram) | ||
| // reference | ||
| public static void main(final String[] args) { |
There was a problem hiding this comment.
This feels awkwardly placed but doing it differently seems like more of a hassle. I think you've come up with a pretty good solution.
There was a problem hiding this comment.
I'm actually reverting the carrot changes in this branch (reverting the one commit) since its not complete/working anyway. I'll revive it for the write branch though.
lbergelson
left a comment
There was a problem hiding this comment.
@cmnbroad Some comments. One potential bug and a request to rename the new build commands so it's clear they're for a special test build.
|
The release notes for the recent 4.2.0 release note that it includes CRAM 3.1 codecs due to #1714. However it appears that, without this PR's addition to CramVersions.java in particular, the recent release remains unable to read CRAM v3.1 files. |
|
@jmarshall Ah, sorry for the confusion. It's literally just the codecs in that PR. The wiring isn't in yet. It's coming soon. |
|
No confusion 😄 I just wanted to make sure that this PR hadn't been inadvertently omitted from the release or anything similar. Particularly in the light of samtools/htslib#1900 and the related release notes for the (imminent?) upcoming htslib/samtools releases — cf samtools/htslib#1898 first paragraph of the addition. |
c8ba753 to
3362a10
Compare
This reverts commit c8ba753.
3362a10 to
144f5f8
Compare
lbergelson
left a comment
There was a problem hiding this comment.
This looks good to me. I'm merging!
A few notes:
FASTACodecV1_0changes are to fix an issue that came up when I added the (beta api) CRAM 3.1 codec and a corresponding test; the codec did not properly handle gzipped fasta files.NameTokenisationEncoderandEncodeTokenclass changes are to fix an issue discovered on the separate CRAM 3.1 writer branch.