gnomAD v4 SV fields for annotation
I went over the gnomAD v4 SV header and selected a few fields that seem relevant for the annotation step. Here is an overview; we can discuss it during the meeting next week.
- All INFO fields can be selected for the whole dataset, or for subsets (from gnomADv3), using these prefixes:
  - controls_and_biobanks_: only samples collected specifically as controls for disease studies, or samples belonging to biobanks (e.g. BioMe, Genizon) or general population studies (e.g. 1000 Genomes, HGDP, PAGE)
  - non_neuro_: only samples that were not collected as part of a neurologic or psychiatric case/control study, or samples collected as part of a neurologic or psychiatric case/control study but designated as controls
- All INFO fields can be selected for the whole dataset, or for populations (from gnomADv3), using these prefixes:
  - afr_: African/African American
  - ami_: Amish
  - amr_: Latino/Admixed American
  - asj_: Ashkenazi Jewish
  - eas_: East Asian
  - fin_: Finnish
  - nfe_: Non-Finnish European
  - mid_: Middle Eastern
  - sas_: South Asian
  - oth_: Other (population not assigned)
For the subset and population prefixes above: I don't expect us to differentiate at this point, but I'm noting them here for any future implementations.
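If we do want subset- or population-specific fields later, the field names could be generated rather than typed out. A minimal sketch, assuming the prefix is simply prepended to the field ID as described above; the exact naming should be checked against the actual gnomAD v4 SV header, and `prefixed_fields` is a hypothetical helper:

```python
from itertools import product

# Prefixes as listed in this issue (to be verified against the real header).
SUBSET_PREFIXES = ["controls_and_biobanks_", "non_neuro_"]
POPULATION_PREFIXES = [
    "afr_", "ami_", "amr_", "asj_", "eas_",
    "fin_", "nfe_", "mid_", "sas_", "oth_",
]


def prefixed_fields(fields, prefixes):
    """Return every prefix + field combination, e.g. 'afr_' + 'AF' -> 'afr_AF'."""
    return [f"{prefix}{field}" for prefix, field in product(prefixes, fields)]


if __name__ == "__main__":
    for name in prefixed_fields(["AC", "AF"], POPULATION_PREFIXES):
        print(name)
```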
- For annotation:
- Use with vcfanno to determine variant identity:
CHROM POS REF ALT
##ALT=<ID=CNV,Description="Copy Number Polymorphism">
Seems like CNVs will have to be matched on positions?!
##INFO=<ID=END,Number=1,Type=Integer,Description="End position of the structural variant">
##INFO=<ID=CHR2,Number=1,Type=String,Description="Chromosome for END coordinate">
##INFO=<ID=POS2,Number=1,Type=Integer,Description="Start position of the structural variant on CHR2">
##INFO=<ID=END2,Number=1,Type=Integer,Description="End position of the structural variant on CHR2">
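One way position-based matching could work is reciprocal overlap between the query interval and the gnomAD interval (POS to END). This is only a sketch of a common approach, not something decided in this issue: the function names and the 0.8 threshold are hypothetical, and interval length is taken as END minus POS.

```python
def reciprocal_overlap(start_a, end_a, start_b, end_b):
    """Return the smaller of the two overlap fractions between two intervals.

    Returns 0.0 when the intervals do not overlap or either has zero length.
    """
    overlap = min(end_a, end_b) - max(start_a, start_b)
    len_a = end_a - start_a
    len_b = end_b - start_b
    if overlap <= 0 or len_a <= 0 or len_b <= 0:
        return 0.0
    return min(overlap / len_a, overlap / len_b)


def same_sv(chrom_a, start_a, end_a, chrom_b, start_b, end_b, min_overlap=0.8):
    """Treat two SVs as the same if they share a chromosome and overlap enough.

    min_overlap is an assumed threshold, to be tuned.
    """
    if chrom_a != chrom_b:
        return False
    return reciprocal_overlap(start_a, end_a, start_b, end_b) >= min_overlap
```

For translocation-like records, CHR2/POS2/END2 would need an analogous check on the second coordinate; that is left out here.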
- ID: rename to gnomad4ID (the ID of the variant according to gnomADv4)
- FILTER
##FILTER=<ID=PASS,Description="All filters passed">
Maybe only annotate with SVs that pass filtering? We could also add an annotation field saying the variant is present in gnomAD v4 but does not pass filtering. I made a table of the different FILTER values and their counts:
1199117 PASS
278316 UNRESOLVED
186815 LOWQUAL_WHAM_SR_DEL;OUTLIER_SAMPLE_ENRICHED
131479 LOWQUAL_WHAM_SR_DEL
109905 OUTLIER_SAMPLE_ENRICHED
82853 HIGH_NCR
79159 HIGH_NCR;UNRESOLVED
70291 HIGH_NCR;LOWQUAL_WHAM_SR_DEL
7280 IGH_MHC_OVERLAP;UNRESOLVED
5424 IGH_MHC_OVERLAP
1624 HIGH_NCR;IGH_MHC_OVERLAP;UNRESOLVED
882 IGH_MHC_OVERLAP;LOWQUAL_WHAM_SR_DEL
514 HIGH_NCR;IGH_MHC_OVERLAP
493 HIGH_NCR;IGH_MHC_OVERLAP;LOWQUAL_WHAM_SR_DEL
254 FAIL_MANUAL_REVIEW
57 REFERENCE_ARTIFACT
23 FAIL_MANUAL_REVIEW;HIGH_NCR
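A table like the one above can be produced by counting the FILTER column (column 7) of the VCF. A minimal sketch, assuming a plain or gzip-compressed, tab-separated VCF; the function names are hypothetical and this is not necessarily how the counts above were generated:

```python
import gzip
from collections import Counter


def count_filters(lines):
    """Count FILTER column values over an iterable of VCF lines."""
    counts = Counter()
    for line in lines:
        if line.startswith("#"):
            continue  # skip meta-information and column header lines
        columns = line.rstrip("\n").split("\t")
        if len(columns) < 7:
            continue  # skip malformed or empty lines
        counts[columns[6]] += 1
    return counts


def count_filters_in_vcf(path):
    """Open a .vcf or .vcf.gz file and count its FILTER values."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return count_filters(handle)


def print_filter_table(counts):
    """Print counts in descending order, one 'count value' pair per line."""
    for value, n in counts.most_common():
        print(n, value)
```

The same counter could later drive the decision of which FILTER values to keep for annotation.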
- INFO
##INFO=<ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes"> rename to gnomad4AC
##INFO=<ID=AF,Number=A,Type=Float,Description="Allele frequency (biallelic sites only)."> rename to gnomad4AF
##INFO=<ID=MALE_AF,Number=A,Type=Float,Description="MALE allele frequency (biallelic sites only)."> rename to gnomad4AF_MALE
##INFO=<ID=FEMALE_AF,Number=A,Type=Float,Description="FEMALE allele frequency (biallelic sites only)."> rename to gnomad4AF_FEMALE
##INFO=<ID=FREQ_HET,Number=1,Type=Float,Description="Heterozygous genotype frequency (biallelic sites only)."> rename to gnomad4HETF
##INFO=<ID=FREQ_HOMALT,Number=1,Type=Float,Description="Homozygous alternate genotype frequency (biallelic sites only)."> rename to gnomad4HOMF
##INFO=<ID=CN_NONREF_FREQ,Number=1,Type=Float,Description="Frequency of samples with non-reference copy states (multiallelic CNVs only)."> rename to gnomad4CNF
##INFO=<ID=CPX_INTERVALS,Number=.,Type=String,Description="Genomic intervals constituting complex variant."> rename to gnomad4INT
##INFO=<ID=CPX_TYPE,Number=1,Type=String,Description="Class of complex variant."> rename to gnomad4TYPE (the type of variant according to gnomADv4); use the same types as here:
##CPX_TYPE_INS_iDEL="Insertion with deletion at insertion site."
##CPX_TYPE_INVdel="Complex inversion with 3' flanking deletion."
##CPX_TYPE_INVdup="Complex inversion with 3' flanking duplication."
##CPX_TYPE_dDUP="Dispersed duplication."
##CPX_TYPE_dDUP_iDEL="Dispersed duplication with deletion at insertion site."
##CPX_TYPE_delINV="Complex inversion with 5' flanking deletion."
##CPX_TYPE_delINVdel="Complex inversion with 5' and 3' flanking deletions."
##CPX_TYPE_delINVdup="Complex inversion with 5' flanking deletion and 3' flanking duplication."
##CPX_TYPE_dupINV="Complex inversion with 5' flanking duplication."
##CPX_TYPE_dupINVdel="Complex inversion with 5' flanking duplication and 3' flanking deletion."
##CPX_TYPE_dupINVdup="Complex inversion with 5' and 3' flanking duplications."
##CPX_TYPE_piDUP_FR="Palindromic inverted tandem duplication, forward-reverse orientation."
##CPX_TYPE_piDUP_RF="Palindromic inverted tandem duplication, reverse-forward orientation."
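The INFO renames proposed above could be kept in one mapping and turned into a vcfanno `[[annotation]]` block. A sketch only: the file path is a placeholder, `vcfanno_annotation_block` is a hypothetical helper, and using the `self` op for every field is an assumption to check against the vcfanno documentation.

```python
# gnomAD v4 SV INFO field -> proposed name in our annotated VCF.
INFO_RENAMES = {
    "AC": "gnomad4AC",
    "AF": "gnomad4AF",
    "MALE_AF": "gnomad4AF_MALE",
    "FEMALE_AF": "gnomad4AF_FEMALE",
    "FREQ_HET": "gnomad4HETF",
    "FREQ_HOMALT": "gnomad4HOMF",
    "CN_NONREF_FREQ": "gnomad4CNF",
    "CPX_INTERVALS": "gnomad4INT",
    "CPX_TYPE": "gnomad4TYPE",
}


def _toml_list(values):
    """Format a list of strings as a TOML array of quoted strings."""
    return "[" + ", ".join(f'"{value}"' for value in values) + "]"


def vcfanno_annotation_block(vcf_path, renames, op="self"):
    """Build one vcfanno [[annotation]] block (TOML text) from a rename map.

    The same op is applied to every field; "self" is an assumed default.
    """
    fields = list(renames)
    names = [renames[field] for field in fields]
    ops = [op] * len(fields)
    return "\n".join([
        "[[annotation]]",
        f'file = "{vcf_path}"',
        f"fields = {_toml_list(fields)}",
        f"ops = {_toml_list(ops)}",
        f"names = {_toml_list(names)}",
    ])


if __name__ == "__main__":
    # Placeholder path, not the real file location.
    print(vcfanno_annotation_block("gnomad.v4.sv.vcf.gz", INFO_RENAMES))
```

Keeping the map in one place means a rename decided in the meeting only has to be changed once.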
The following seem interesting to me, but should maybe be evaluated first for relevance and performance:
##INFO=<ID=PREDICTED_BREAKEND_EXONIC,Number=.,Type=String,Description="Gene(s) for which the SV breakend is predicted to fall in an exon.">
##INFO=<ID=PREDICTED_COPY_GAIN,Number=.,Type=String,Description="Gene(s) on which the SV is predicted to have a copy-gain effect.">
##INFO=<ID=PREDICTED_DUP_PARTIAL,Number=.,Type=String,Description="Gene(s) which are partially overlapped by an SV's duplication, but the transcription start site is not duplicated.">
##INFO=<ID=PREDICTED_INTERGENIC,Number=0,Type=Flag,Description="SV does not overlap any protein-coding genes.">
##INFO=<ID=PREDICTED_INTRAGENIC_EXON_DUP,Number=.,Type=String,Description="Gene(s) on which the SV is predicted to result in intragenic exonic duplication without breaking any coding sequences.">
##INFO=<ID=PREDICTED_INTRONIC,Number=.,Type=String,Description="Gene(s) where the SV was found to lie entirely within an intron.">
##INFO=<ID=PREDICTED_INV_SPAN,Number=.,Type=String,Description="Gene(s) which are entirely spanned by an SV's inversion.">
##INFO=<ID=PREDICTED_LOF,Number=.,Type=String,Description="Gene(s) on which the SV is predicted to have a loss-of-function effect.">
##INFO=<ID=PREDICTED_MSV_EXON_OVERLAP,Number=.,Type=String,Description="Gene(s) on which the multiallelic SV would be predicted to have a LOF, INTRAGENIC_EXON_DUP, COPY_GAIN, DUP_PARTIAL, TSS_DUP, or PARTIAL_EXON_DUP annotation if the SV were biallelic.">
##INFO=<ID=PREDICTED_NEAREST_TSS,Number=.,Type=String,Description="Nearest transcription start site to an intergenic variant.">
##INFO=<ID=PREDICTED_PARTIAL_EXON_DUP,Number=.,Type=String,Description="Gene(s) where the duplication SV has one breakpoint in the coding sequence.">
##INFO=<ID=PREDICTED_PROMOTER,Number=.,Type=String,Description="Gene(s) for which the SV is predicted to overlap the promoter region.">
##INFO=<ID=PREDICTED_TSS_DUP,Number=.,Type=String,Description="Gene(s) for which the SV is predicted to duplicate the transcription start site.">
##INFO=<ID=PREDICTED_UTR,Number=.,Type=String,Description="Gene(s) for which the SV is predicted to disrupt a UTR.">
##INFO=<ID=SOURCE,Number=1,Type=String,Description="Source of inserted sequence.">
##INFO=<ID=STRANDS,Number=1,Type=String,Description="Breakpoint strandedness [++,+-,-+,--]">
To be continued...