Description
I ran an older version of minicaller (version 6d7e78c) over a bam file and it output a VCF file with a selected multi-allelic variant shown below:
cusRef 24 . G A,C,T . . AC=1,1,1;AF=0.250,0.250,0.250;AN=4;DP=164410 GT:DP:DP4:DPG 0/3/2/1:164410:161365,259,2785,1:161624,1356,977,452
From the DPG field above, I calculated the alt allele frequency (AF) this way:
Alt allele T AF = 1356/(161624+1356+977+452)
Alt allele C AF = 977/(161624+1356+977+452)
Alt allele A AF = 452/(161624+1356+977+452)
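To make the arithmetic above easy to check, here is a minimal Python sketch. It assumes, as I did in my calculation, that the DPG values are listed in the same order as the alleles in the GT field (0/3/2/1); that ordering is my interpretation and not something I have confirmed in the minicaller documentation. The function name `allele_fractions` is only illustrative.

```python
# Values copied from the old-version VCF record above.
alts = ["A", "C", "T"]           # ALT column, in order
gt = [0, 3, 2, 1]                # GT field 0/3/2/1
dpg = [161624, 1356, 977, 452]   # DPG field

def allele_fractions(alts, gt, dpg):
    """Map each allele to its share of the summed DPG depths.

    Assumption: dpg[k] is the depth of the allele whose index is gt[k].
    """
    alleles = ["REF"] + alts     # allele index 0 is the reference
    total = sum(dpg)
    return {alleles[idx]: depth / total for idx, depth in zip(gt, dpg)}

fractions = allele_fractions(alts, gt, dpg)
for allele, frac in fractions.items():
    print(f"{allele}: {frac:.6f}")
```

The denominator here is the sum of the four DPG values (164409), so the four fractions add up to 1.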
I then ran the latest version of minicaller (version 69ca18e) over the same bam file. (I was able to resolve the "too many open files" error message by increasing the value in maxRecordsInRam. Thanks so much!)
In order to obtain all the variants, I turned off two filters by setting:
--bad-ad-ratio 1
In this case, the condition becomes 1 < ALT/(REF+ALT) < 0, which no genotype can satisfy, so no genotypes will be filtered.
--gt-fraction 0
The condition becomes ALT/(REF+ALT) < 0, so again no genotypes will be ignored.
In addition, I set --min-gt-allele-depth 10 and --min-gt-depth 10.
I then looked into the variant in the same position, and here's the variant detected by the new version:
cusRef 24 . G T 38 . AC=1;AF=0.500;AN=2;DP=324605 GT:AD:DP:FT:GQ 0/1:161624,1356:162981:LowQual:38
My questions are:
(1) In the record from the new version of minicaller, there is only one alt allele (G->T), whereas the old version reported three (G->A,C,T). It looks like the new version just selected the alt allele with the highest read count. Can you please explain why?
(2) In the record from the new version, is it okay for me to calculate the alt allele frequency using the AD field this way?
Alt allele T AF = 1356/(161624+1356)
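For comparison, the same calculation on the AD field can be sketched in Python as below. This assumes the usual VCF convention that AD lists the reference depth first, followed by one depth per ALT allele; the helper name `alt_fractions_from_ad` is hypothetical.

```python
def alt_fractions_from_ad(ad):
    """Return one fraction per ALT allele: depth / sum of all AD values.

    Assumption: ad[0] is the REF depth and ad[1:] are the ALT depths.
    """
    total = sum(ad)
    return [depth / total for depth in ad[1:]]

# AD field copied from the new-version record above.
ad = [161624, 1356]
t_fraction = alt_fractions_from_ad(ad)[0]
print(f"T: {t_fraction:.6f}")
```

Note that the denominator here (162980) is smaller than the one from the DPG calculation (164409), because the reads supporting C and A are not part of AD in the new record, so the two values for T are close but not identical.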
Our environment
- the latest version of jvarkit (69ca18e)
- openJDK Java Version 22
- RHEL 9
I apologize for the long post, but thank you so much for your attention!
Best,
Ting