Primer Filter

Xian Chang edited this page Feb 18, 2025 · 1 revision

vg primers filters PCR primers based on properties of the pangenome, such as whether the primers overlap variation and the range of possible PCR product lengths. It takes pangenome indexes and the output of primer3 as input, and outputs a .tsv file listing the input primers together with their properties in the pangenome.

Get primers with primer3

The input to vg primers is the output of the command line version of primer3.

primer3 requires a config file formatted like:

SEQUENCE_ID=CHM13#0#chr17|BRCA1P1|exon_1|44026826
SEQUENCE_TEMPLATE=CATGT...
PRIMER_NUM_RETURN=10
PRIMER_TASK=generic
PRIMER_PICK_LEFT_PRIMER=1
PRIMER_PICK_INTERNAL_OLIGO=0
PRIMER_PICK_RIGHT_PRIMER=1
PRIMER_OPT_SIZE=20
PRIMER_MIN_SIZE=18
PRIMER_MAX_SIZE=22
PRIMER_PRODUCT_SIZE_RANGE=75-150
PRIMER_EXPLAIN_FLAG=1
=

The SEQUENCE_ID field must be formatted correctly for vg primers to locate the primers in the pangenome. It consists of four fields separated by |: the name of the reference path in the graph, the name of the gene, the exon or intron, and the offset of the template sequence along the path. The names of the reference paths in the graph can be listed with vg paths -L -R. SEQUENCE_TEMPLATE is the nucleotide sequence from which the primers are picked.
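
As an illustration of the layout described above, the following Python sketch assembles a SEQUENCE_ID from its four parts and writes one primer3 input record. The helper names (make_sequence_id, parse_sequence_id, make_primer3_record) are hypothetical and are not part of vg or primer3. The tag values are copied from the example config, and the template sequence is whatever the caller supplies.

```python
def make_sequence_id(path_name, gene, feature, offset):
    """Join the four SEQUENCE_ID parts with '|' (hypothetical helper)."""
    parts = [path_name, gene, feature, str(int(offset))]
    if any(not p or "|" in p for p in parts):
        raise ValueError("SEQUENCE_ID parts must be non-empty and must not contain '|'")
    return "|".join(parts)


def parse_sequence_id(sequence_id):
    """Split a SEQUENCE_ID into (path_name, gene, feature, offset)."""
    parts = sequence_id.split("|")
    if len(parts) != 4:
        raise ValueError(f"expected 4 '|'-separated fields, got {len(parts)}")
    path_name, gene, feature, offset = parts
    return path_name, gene, feature, int(offset)  # int() rejects a non-numeric offset


def make_primer3_record(sequence_id, template):
    """Build one primer3 input record with the settings from the example config."""
    parse_sequence_id(sequence_id)  # fail early on a malformed ID
    tags = [
        ("SEQUENCE_ID", sequence_id),
        ("SEQUENCE_TEMPLATE", template),
        ("PRIMER_NUM_RETURN", 10),
        ("PRIMER_TASK", "generic"),
        ("PRIMER_PICK_LEFT_PRIMER", 1),
        ("PRIMER_PICK_INTERNAL_OLIGO", 0),
        ("PRIMER_PICK_RIGHT_PRIMER", 1),
        ("PRIMER_OPT_SIZE", 20),
        ("PRIMER_MIN_SIZE", 18),
        ("PRIMER_MAX_SIZE", 22),
        ("PRIMER_PRODUCT_SIZE_RANGE", "75-150"),
        ("PRIMER_EXPLAIN_FLAG", 1),
    ]
    lines = [f"{key}={value}" for key, value in tags]
    lines.append("=")  # a line containing only '=' ends the record
    return "\n".join(lines) + "\n"
```

The returned text can be written to a file and given to the command line version of primer3. Checking the ID with parse_sequence_id before running primer3 is one way to avoid the mapping fallback that vg primers uses when the ID cannot be interpreted.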

Filtering primers with vg primers

vg primers requires the following indexes of the pangenome:

  • the xg index created with vg index -x
  • the distance index created with vg index -j
  • the r-index created with vg gbwt -r
  • the gbz created with vg gbwt

If the SEQUENCE_ID field is not formatted correctly and the reference coordinates cannot be found, vg primers will try to find them by mapping the template sequence with vg giraffe and then projecting the alignment onto the reference with vg surject. This fallback is not recommended because a good alignment is not guaranteed. If the sequence needs to be mapped, vg primers additionally requires the following files:

  • the minimizer index created with vg minimizer. We recommend running it with -k 31 -w 50 -W, and with -z to create the zipcode file
  • the zipcode file, also created with vg minimizer
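
The two sets of requirements above can be summarised in a small pre-flight check. This is only a sketch: the role labels and the missing_indexes helper are names invented here, and the paths a caller supplies are whatever file names were chosen when the indexes were built.

```python
from pathlib import Path

# Roles taken from the lists above; the labels themselves are made up for this sketch.
CORE_INDEXES = ["xg", "distance", "r-index", "gbz"]
# Only needed when the template sequence has to be mapped.
MAPPING_INDEXES = ["minimizer", "zipcode"]


def missing_indexes(paths, needs_mapping=False):
    """Return the roles whose file was not supplied or does not exist on disk.

    `paths` maps a role label to a file path chosen by the caller.
    """
    required = CORE_INDEXES + (MAPPING_INDEXES if needs_mapping else [])
    return [
        role for role in required
        if role not in paths or not Path(paths[role]).is_file()
    ]
```

An empty result means every file needed for the chosen mode is present; any other result lists what still has to be built.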

Interpreting output of vg primers

vg primers outputs a tsv file with the following fields for each primer:

field | definition | description
chrom | chromosome | reference path name, the first field in SEQUENCE_ID
tplfeat | template feature | the second and third fields in SEQUENCE_ID
tplpos | template position | offset along the reference path, the fourth field in SEQUENCE_ID
lpseq | left primer sequence | the nucleotide sequence of the left primer
rpseq | right primer sequence | the nucleotide sequence of the right primer
lppostpl | left primer position template | the offset of the left primer in the template sequence
rppostmp | right primer position template | the offset of the right primer in the template sequence
lpposchrom | left primer position chromosome | the offset of the left primer in the reference path
rpposchrom | right primer position chromosome | the offset of the right primer in the reference path
lpnid | left primer mapped node ids | the node ids that the left primer overlaps in the graph
rpnid | right primer mapped node ids | the node ids that the right primer overlaps in the graph
lplen | left primer length | the length in nucleotides of the left primer
rplen | right primer length | the length in nucleotides of the right primer
linsize | linear product size | the length of the product in the linear genome (including primer lengths)
minsize | minimum product size | the minimum length of the product according to the pangenome
maxsize | maximum product size | the maximum length of the product according to the pangenome
varlevel | variation level | a measure of variation in the primers (the number of unique haplotypes / the total number of haplotypes)
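
One possible way to use this output downstream is to load it and keep only primer pairs with low variation and a bounded product size. The sketch below assumes the file starts with a header row whose column names match the field names above. The function name and the default thresholds are illustrative choices, not recommendations from vg.

```python
import csv


def filter_primers(tsv_path, max_varlevel=0.1, max_product_size=150):
    """Yield rows whose varlevel and pangenome maxsize stay within the bounds.

    Assumes a tab-separated file with a header row that contains at least
    the 'varlevel' and 'maxsize' columns. The thresholds are arbitrary examples.
    """
    with open(tsv_path, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            low_variation = float(row["varlevel"]) <= max_varlevel
            short_enough = float(row["maxsize"]) <= max_product_size
            if low_variation and short_enough:
                yield row
```

Since varlevel is the number of unique haplotypes divided by the total number of haplotypes, a smaller value means the haplotypes agree more closely across the primer. Filtering on maxsize rather than linsize bounds the product length over the haplotypes in the pangenome, not only on the linear reference.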