Alignment of FASTQ files for HG002 #34
Comments
Thank you very much for the information. I'll try out the newer draft for the time being, and I'll experiment with aligning the NovaSeq FASTQ files for now. All of this is very greatly appreciated!
I apologise in advance if this is a redundant or foolish question. I am still relatively new to FASTQ-to-BAM alignment, but I would seriously appreciate any guidance on the following questions about SV caller benchmarking using HG002 v0.6 as my truth set. For evaluating the callers I was originally going to use
Under: https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/data/AshkenazimTrio/HG002_NA24385_son/NIST_HiSeq_HG002_Homogeneity-10953946/HG002_HiSeq300x_fastq/140528_D00360_0018_AH8VC6ADXX/Project_RM8391_RM8392/
I see that there are multiple samples (2A1, 2A2, 2F1, 2F2, etc.). Is each individual FASTQ file within these sub-directories 30x coverage, or do the files add up to 30x per sub-directory?
I'm looking to create a BAM file of about 30x coverage that's not biased by read group or library. How would you recommend going about this? Would it be better to merge all R1 files together and all R2 files together, and then down-sample each merged file to 30x? Or is there a better approach?
In addition to this, how would one carry out the down-sampling?
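To make the down-sampling question concrete, here is a minimal Python sketch of the arithmetic I have in mind; it is an illustration under assumptions, not a recommended pipeline. The function names, the genome size of roughly 3.1 Gb, and the hashing scheme are all my own hypothetical choices. The idea is to estimate coverage as total sequenced bases divided by genome size, keep a fraction equal to target coverage over current coverage, and decide per read name so that R1 and R2 of the same pair are kept or dropped together.

```python
import hashlib

# Assumption: approximate human genome size in bases; adjust for the reference used.
ASSUMED_GENOME_SIZE = 3.1e9


def estimate_coverage(total_bases, genome_size=ASSUMED_GENOME_SIZE):
    """Rough mean coverage: total sequenced bases divided by genome size."""
    return total_bases / genome_size


def downsample_fraction(current_coverage, target_coverage=30.0):
    """Fraction of read pairs to keep in order to reach the target coverage."""
    if current_coverage <= 0:
        raise ValueError("current_coverage must be positive")
    # Never keep more than everything if the data is already below the target.
    return min(1.0, target_coverage / current_coverage)


def keep_read(read_name, fraction, seed=42):
    """Deterministic keep/drop decision based on a hash of the read name.

    Passing the part of the name shared by R1 and R2 (without any /1, /2
    suffix or trailing comment) gives both mates the same decision.
    """
    digest = hashlib.sha256(f"{seed}:{read_name}".encode()).digest()
    # Map the first 6 bytes to a number in [0, 1).
    value = int.from_bytes(digest[:6], "big") / 2**48
    return value < fraction


if __name__ == "__main__":
    # Hypothetical numbers: 300x of data reduced to about 30x.
    fraction = downsample_fraction(current_coverage=300.0, target_coverage=30.0)
    print(f"keep fraction: {fraction:.3f}")
    print(keep_read("example_read_name", fraction))
```

Because the decision depends only on the read name and the seed, the same read is selected no matter which lane or file it appears in, so the merged R1 and R2 files would stay in sync after filtering.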
Once again, I apologise if these questions are redundant but any help would be greatly appreciated.
Thank you for your time and patience.