
Error: reads file does not look like a FASTQ file #129

@amitjavilaventura


Dear Dr. Langmead,

I am trying to use Bowtie in a pipeline for small RNA-seq. I have been using it for months, but now, with the same command, it fails with the error "reads file does not look like a FASTQ file":

Time loading forward index: 00:00:09
Time loading mirror index: 00:00:09
Error: reads file does not look like a FASTQ file
terminate called after throwing an instance of 'int'

The command is this one:

bowtie -v 1  -M 1  --seed 666 --best --strata  --quiet  --threads 8 --chunkmbs 1024 --time --sam <refGenomeIndexPrefix> <fastq.gz>

The only difference between now and before is that I am now running Bowtie on a cluster from a Singularity image, whereas before I was running it locally from a conda installation. The previous steps in the pipeline are adapter trimming with cutadapt and quality filtering with fastq_quality_filter.

I looked at the FASTQ.gz files and they look normal:

@7001450:617:CD9F5ANXX:4:2309:3398:1995 1:N:0:TGACCA
TCTCAGNTTGTCATTTGGAGACTCCCCA
+
BBBCCE#>?FGGGGGGGGGGGGGGGGGG
@7001450:617:CD9F5ANXX:4:2309:3690:1999 1:N:0:TGACCA
TGAACGGAGAATAGAGTACATTGAAGCGA
+
CBBBBGGGGGGGGGGGGGEEGGGGGGGGC

I used three different approaches to validate them:

  • Comparing the sequence string and the quality string of each read and counting the reads where the two differ in length (0 reads had sequence and quality strings of different lengths).
  • Using fastq_info from fastq_utils:
fastq_utils 0.25.1
DEFAULT_HASHSIZE=39000001
Scanning and indexing all reads from results/01_fastq/caroli1.filt.fastq.gz
CASAVA=1.8
43600000Scanning complete.

Reads processed: 43600732
Memory used in indexing: ~3346 MB
------------------------------------
Number of reads: 43600732
Quality encoding range: 35 71
Quality encoding: 33
Read length: 19 36 30
OK
  • Using validatefastq from biopet:
INFO  [2022-02-07 17:18:52,605] [ValidateFastq$] - Start
INFO  [2022-02-07 17:18:52,969] [ValidateFastq$$anonfun$main$1] - 100000 reads processed
INFO  [2022-02-07 17:18:53,156] [ValidateFastq$$anonfun$main$1] - 200000 reads processed
...
...
INFO  [2022-02-07 17:20:16,953] [ValidateFastq$$anonfun$main$1] - 43600000 reads processed
INFO  [2022-02-07 17:20:16,955] [ValidateFastq$] - Possible quality encodings found: Sanger, Illumina 1.8+
INFO  [2022-02-07 17:20:16,955] [ValidateFastq$] - Done processing 43600732 fastq records, no errors found
INFO  [2022-02-07 17:20:16,956] [ValidateFastq$] - Done
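
The first check in the list above can be sketched roughly as follows. This is only a minimal illustration, assuming plain four-line FASTQ records (no wrapped sequence or quality lines); the function name `count_length_mismatches` is hypothetical and this is not necessarily the exact way the count was done:

```python
import gzip
from itertools import islice


def count_length_mismatches(path):
    """Count FASTQ records whose sequence and quality strings differ in length.

    Assumes every record is exactly four lines: header, sequence, '+', quality.
    """
    # Choose the opener from the file extension (an assumption of this sketch).
    opener = gzip.open if str(path).endswith(".gz") else open
    mismatches = 0
    with opener(path, "rt") as handle:
        while True:
            record = list(islice(handle, 4))  # read one 4-line record
            if not record:
                break  # end of file
            if len(record) < 4:
                raise ValueError("truncated FASTQ record at end of file")
            seq = record[1].rstrip("\r\n")
            qual = record[3].rstrip("\r\n")
            if len(seq) != len(qual):
                mismatches += 1
    return mismatches
```

It could be called as `count_length_mismatches("results/01_fastq/caroli1.filt.fastq.gz")`. Note that a check like this only inspects the decompressed records; it says nothing about how Bowtie itself reads the compressed file.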

None of the approaches flagged the FASTQ file as invalid.

Why might this be happening?

Thank you.

Best regards,
Adrià.
