Dear Dr. Langmead,
I am trying to use Bowtie in a pipeline for small RNA-seq. I have been using it for months, but now the same command fails with an error saying that the "reads file does not look like a FASTQ file":
Time loading forward index: 00:00:09
Time loading mirror index: 00:00:09
Error: reads file does not look like a FASTQ file
terminate called after throwing an instance of 'int'
The command is this one:
bowtie -v 1 -M 1 --seed 666 --best --strata --quiet --threads 8 --chunkmbs 1024 --time --sam <refGenomeIndexPrefix> <fastq.gz>
The only difference from before is that I am now running this on a cluster using a Singularity image, whereas before I was running bowtie locally via conda. The previous steps in the pipeline are adapter trimming with cutadapt and quality filtering with fastq_quality_filter.
I looked at the FASTQ.gz files and they look normal:
@7001450:617:CD9F5ANXX:4:2309:3398:1995 1:N:0:TGACCA
TCTCAGNTTGTCATTTGGAGACTCCCCA
+
BBBCCE#>?FGGGGGGGGGGGGGGGGGG
@7001450:617:CD9F5ANXX:4:2309:3690:1999 1:N:0:TGACCA
TGAACGGAGAATAGAGTACATTGAAGCGA
+
CBBBBGGGGGGGGGGGGGEEGGGGGGGGC
I used three different approaches to validate them:
- Comparing the lengths of the sequence string and the quality string of every record and counting the records where they differ (0 cases).
- Using fastq_info from fastq_utils:
fastq_utils 0.25.1
DEFAULT_HASHSIZE=39000001
Scanning and indexing all reads from results/01_fastq/caroli1.filt.fastq.gz
CASAVA=1.8
43600000Scanning complete.
Reads processed: 43600732
Memory used in indexing: ~3346 MB
------------------------------------
Number of reads: 43600732
Quality encoding range: 35 71
Quality encoding: 33
Read length: 19 36 30
OK
- Using validatefastq from biopet:
INFO [2022-02-07 17:18:52,605] [ValidateFastq$] - Start
INFO [2022-02-07 17:18:52,969] [ValidateFastq$$anonfun$main$1] - 100000 reads processed
INFO [2022-02-07 17:18:53,156] [ValidateFastq$$anonfun$main$1] - 200000 reads processed
...
INFO [2022-02-07 17:20:16,953] [ValidateFastq$$anonfun$main$1] - 43600000 reads processed
INFO [2022-02-07 17:20:16,955] [ValidateFastq$] - Possible quality encodings found: Sanger, Illumina 1.8+
INFO [2022-02-07 17:20:16,955] [ValidateFastq$] - Done processing 43600732 fastq records, no errors found
INFO [2022-02-07 17:20:16,956] [ValidateFastq$] - Done
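For reference, the first check in the list above (comparing sequence and quality lengths) can be sketched roughly as follows. This is only a minimal illustration, not necessarily the exact script that was run: it assumes a gzip-compressed file with standard four-line records (no wrapped sequences), and the function name `count_length_mismatches` is hypothetical.

```python
import gzip
from itertools import islice


def count_length_mismatches(path):
    """Count FASTQ records whose sequence and quality strings differ in length.

    Assumes gzip compression and exactly four lines per record.
    """
    mismatches = 0
    with gzip.open(path, "rt") as handle:
        while True:
            # Read the next four lines: header, sequence, separator, quality.
            record = list(islice(handle, 4))
            if not record:
                break
            if len(record) < 4:
                raise ValueError("truncated record at end of file")
            header, seq, plus, qual = (line.rstrip("\r\n") for line in record)
            # A valid record starts with '@' and has a '+' separator line.
            if not header.startswith("@") or not plus.startswith("+"):
                raise ValueError(f"malformed record: {header!r}")
            if len(seq) != len(qual):
                mismatches += 1
    return mismatches
```

It would be called as `count_length_mismatches("results/01_fastq/caroli1.filt.fastq.gz")`; a return value of 0 means every record has matching sequence and quality lengths.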
None of the approaches flagged the FASTQ as invalid.
What could be causing this?
Thank you.
Best regards,
Adrià.