adam-core version: 0.33.0
Spark version: 3.3.0
Scala version: 2.12
I read a FASTQ BGZ file with the following code:

```scala
spark.sparkContext.newAPIHadoopFile(url, classOf[SingleFastqInputFormat], classOf[Void], classOf[Text], conf)
```
It works fine if the file is about 70 GB.
However, when the file is about 170 GB, some reads are missing (the missing reads are well-formed).
The missing reads do appear if I read the file line by line instead:

```scala
spark.sparkContext.newAPIHadoopFile(url, classOf[TextInputFormat], classOf[Void], classOf[Text], conf)
```
Is there any known limitation of SingleFastqInputFormat, or any advice that could help me debug this issue?
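One way to narrow this down is to diff the read IDs seen by the line-by-line (TextInputFormat) run against those produced by the SingleFastqInputFormat run, which pinpoints exactly which records are dropped (e.g. whether they cluster near input-split boundaries). A minimal Spark-free sketch of that diff; `FastqDiff`, `readIds`, and `missing` are hypothetical helper names, and the 4-line-record parsing assumes the FASTQ has no wrapped sequence lines:

```scala
// Hypothetical diagnostic helper: group FASTQ lines into 4-line records,
// extract read IDs, and report IDs present line-by-line but absent from
// the record-based read.
object FastqDiff {
  // Parse FASTQ lines into read IDs, assuming strict 4-line records.
  def readIds(lines: Seq[String]): Set[String] =
    lines.grouped(4).collect {
      case Seq(header, _, _, _) if header.startsWith("@") =>
        // Read ID is the header up to the first whitespace, "@" stripped.
        header.drop(1).takeWhile(!_.isWhitespace)
    }.toSet

  // IDs recovered by the line-by-line read but missing from the
  // record-based (SingleFastqInputFormat) result.
  def missing(lineByLine: Seq[String], recordIds: Set[String]): Set[String] =
    readIds(lineByLine) diff recordIds
}
```

In a real run, the line-by-line IDs would come from the `TextInputFormat` RDD (collecting only `@`-prefixed header lines) and `recordIds` from the `SingleFastqInputFormat` RDD, so only the small ID sets need to be compared.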