Error: reads file does not look like a FASTQ file

Dear Dr. Langmead,

I am trying to use Bowtie in a pipeline for small RNA-seq. I have been using it for months, but now the same command fails with the error "reads file does not look like a FASTQ file":

```
Time loading forward index: 00:00:09
Time loading mirror index: 00:00:09
Error: reads file does not look like a FASTQ file
terminate called after throwing an instance of 'int'
```

The command is this one: 
```
bowtie -v 1  -M 1  --seed 666 --best --strata  --quiet  --threads 8 --chunkmbs 1024 --time --sam <refGenomeIndexPrefix> <fastq.gz>
```

The only difference between now and before is that I am now running this on a cluster using a Singularity image, whereas before I was running Bowtie locally from a conda installation. The previous steps in the pipeline are adapter trimming with `cutadapt` and quality filtering with `fastq_quality_filter`.

I looked at the FASTQ.gz files and they look normal:
```
@7001450:617:CD9F5ANXX:4:2309:3398:1995 1:N:0:TGACCA
TCTCAGNTTGTCATTTGGAGACTCCCCA
+
BBBCCE#>?FGGGGGGGGGGGGGGGGGG
@7001450:617:CD9F5ANXX:4:2309:3690:1999 1:N:0:TGACCA
TGAACGGAGAATAGAGTACATTGAAGCGA
+
CBBBBGGGGGGGGGGGGGEEGGGGGGGGC
```

I used three different approaches to validate them: 

- Comparing the lengths of the sequence string and the quality string of every read and counting the reads where they differ (0 reads had a sequence and a quality string of different lengths).
- Using `fastq_info` from [`fastq_utils`](https://github.com/nunofonseca/fastq_utils): 
```
fastq_utils 0.25.1
DEFAULT_HASHSIZE=39000001
Scanning and indexing all reads from results/01_fastq/caroli1.filt.fastq.gz
CASAVA=1.8
43600000Scanning complete.

Reads processed: 43600732
Memory used in indexing: ~3346 MB
------------------------------------
Number of reads: 43600732
Quality encoding range: 35 71
Quality encoding: 33
Read length: 19 36 30
OK
```

- Using `validatefastq` from [`biopet`](https://github.com/biopet/validatefastq):
```
INFO  [2022-02-07 17:18:52,605] [ValidateFastq$] - Start
INFO  [2022-02-07 17:18:52,969] [ValidateFastq$$anonfun$main$1] - 100000 reads processed
INFO  [2022-02-07 17:18:53,156] [ValidateFastq$$anonfun$main$1] - 200000 reads processed
...
...
INFO  [2022-02-07 17:20:16,953] [ValidateFastq$$anonfun$main$1] - 43600000 reads processed
INFO  [2022-02-07 17:20:16,955] [ValidateFastq$] - Possible quality encodings found: Sanger, Illumina 1.8+
INFO  [2022-02-07 17:20:16,955] [ValidateFastq$] - Done processing 43600732 fastq records, no errors found
INFO  [2022-02-07 17:20:16,956] [ValidateFastq$] - Done
```
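
For reference, the first check in the list (sequence length versus quality length) can be sketched roughly as below. This is a minimal illustration, not the exact script that was run: the function name `count_length_mismatches` is hypothetical, and it assumes plain four-line FASTQ records (no wrapped sequence lines) in either a gzipped or an uncompressed file.

```python
import gzip
from itertools import islice


def count_length_mismatches(path):
    """Return (total_records, records_with_len(seq) != len(qual)).

    Assumes four-line FASTQ records; hypothetical helper for illustration.
    """
    # Pick the opener from the file extension (an assumption of this sketch).
    opener = gzip.open if path.endswith(".gz") else open
    total = 0
    mismatched = 0
    with opener(path, "rt") as handle:
        while True:
            # Read the next four lines as one record.
            record = [line.rstrip("\r\n") for line in islice(handle, 4)]
            if not record:
                break  # clean end of file
            if len(record) < 4:
                raise ValueError(f"truncated record after {total} complete records")
            header, seq, plus, qual = record
            if not header.startswith("@") or not plus.startswith("+"):
                raise ValueError(f"malformed record number {total + 1}")
            total += 1
            if len(seq) != len(qual):
                mismatched += 1
    return total, mismatched
```

Calling `count_length_mismatches("results/01_fastq/caroli1.filt.fastq.gz")` would return the number of records and the number of length mismatches. A mismatch count of 0 corresponds to the result reported in the first bullet.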

None of these approaches flagged the FASTQ file as invalid.

What could be causing this?

Thank you.

Best regards,
Adrià.

Error: reads file does not look like a FASTQ file #129
