Collate fastq file before splitting #331
Merged
It was reported to me that the _R1/_R2 files from `samtools fastq` were not collated properly: a single read was appearing at wildly different positions in R1 and R2, which is completely silly. I have tried to reproduce this, but thus far have been unable to with a small subset:
(Note the complete lack of difference in ordering.)
But if we look at the output files produced by this tool, there are clear differences:
These were produced by the command:
This is, however, documented behaviour:
So it makes sense to collate the input, or at some point ensure that the BAMs are sorted by read name.
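To illustrate why collation matters, here is a minimal toy model in plain Python. It assumes a BAM record can be reduced to a hypothetical `(query name, is_read1, sequence)` tuple (this is not pysam's API), and the helper names `naive_split` and `collate` are illustrative only. Splitting a coordinate-sorted stream straight into R1/R2 lets the mates drift apart, while grouping records by name first keeps both files in step.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical toy record: (query name, is_read1, sequence). Not pysam's AlignedSegment.
Record = Tuple[str, bool, str]


def naive_split(records: Iterable[Record]) -> Tuple[List[str], List[str]]:
    """Send READ1 records to R1 and READ2 records to R2, in input order.

    Models splitting a stream whose mates are not adjacent (e.g. coordinate-sorted).
    """
    r1: List[str] = []
    r2: List[str] = []
    for name, is_read1, _seq in records:
        (r1 if is_read1 else r2).append(name)
    return r1, r2


def collate(records: Iterable[Record]) -> List[Record]:
    """Group records by query name so mates sit next to each other.

    Same goal as `samtools collate`, although the order of the groups differs.
    """
    groups: Dict[str, List[Record]] = {}  # dicts keep insertion order (3.7+)
    for rec in records:
        groups.setdefault(rec[0], []).append(rec)
    out: List[Record] = []
    for mates in groups.values():
        # Put READ1 before READ2 within each pair (False sorts before True).
        out.extend(sorted(mates, key=lambda r: not r[1]))
    return out


# Mate "a" maps far away from its partner, so the mates of "b" sit in between.
coord_sorted = [("a", True, "ACGT"), ("b", True, "GGCC"),
                ("b", False, "TTAA"), ("a", False, "CCGG")]
print(naive_split(coord_sorted))           # (['a', 'b'], ['b', 'a'])
print(naive_split(collate(coord_sorted)))  # (['a', 'b'], ['a', 'b'])
```

The dictionary-based grouping holds everything in memory, which is fine for a toy; on real data `samtools collate` or `samtools sort -n` does this job.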
I think there is a discussion to be had over whether automatic collation is sensible or a waste of runtime. On the other hand, this is a small footgun, and eliminating it reduces the potential failure modes (given our focus on reducing risk).
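Independently of whether collation happens automatically, a cheap sanity check on the output can catch this failure mode. Below is a minimal sketch, assuming plain 4-line FASTQ records and mate names that differ at most by a legacy `/1` or `/2` suffix. The functions `read_names` and `first_unpaired` are hypothetical and not part of this tool. The check reports the first record index where R1 and R2 disagree.

```python
import io
from itertools import islice, zip_longest
from typing import Iterator, Optional, TextIO


def read_names(handle: TextIO) -> Iterator[str]:
    """Yield the read name of each 4-line FASTQ record (assumes no wrapped lines)."""
    while True:
        record = list(islice(handle, 4))
        if not record:
            return
        if len(record) != 4 or not record[0].startswith("@"):
            raise ValueError("truncated or malformed FASTQ record")
        name = record[0][1:].split()[0]
        # Drop a legacy /1 or /2 mate suffix so mates compare equal.
        if name.endswith(("/1", "/2")):
            name = name[:-2]
        yield name


def first_unpaired(r1: TextIO, r2: TextIO) -> Optional[int]:
    """Return the 0-based index of the first record whose names differ
    (or where one file runs out early), or None if R1 and R2 are in sync."""
    pairs = zip_longest(read_names(r1), read_names(r2), fillvalue=None)
    for i, (n1, n2) in enumerate(pairs):
        if n1 != n2:
            return i
    return None


r1 = io.StringIO("@a/1\nACGT\n+\nIIII\n@b/1\nGGCC\n+\nIIII\n")
r2 = io.StringIO("@b/2\nTTAA\n+\nIIII\n@a/2\nCCGG\n+\nIIII\n")
print(first_unpaired(r1, r2))  # 0
```

Because it streams both files in parallel, memory use stays constant regardless of file size.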
Checklist
- `parameter_meta` was added/updated (if required).