Pannagram is a package for constructing pan-genome alignments, analyzing structural variants, and translating annotations between genomes. The project consists of several Bash pipelines for streamlined genomic analysis and an R library with tools for further analysis and visualization.
Documentation can be found at pannagram-page.
After cloning the repo, follow these instructions to set up your pannagram environment.
It is expected that you have one of the following package managers installed on your machine:
Below, the package manager of your choice is referred to as <manager>.
Linux and macOS (Intel)

    <manager> env create -f pannagram.yml
    <manager> activate pannagram

macOS (M-series chips)

    <manager> env create --platform osx-64 -f pannagram_m4.yml
    <manager> activate pannagram

If you want to resolve package dependencies yourself, use pannagram_min.yml, where only direct dependencies are specified with no explicit versions. Packages will thus be installed with the latest compatible versions available:
Linux and macOS (Intel)

    <manager> env create -f pannagram_min.yml
    <manager> activate pannagram

macOS (M-series chips)

    <manager> env create --platform osx-64 -f pannagram_min.yml
    <manager> activate pannagram

Make sure that RStudio Desktop is installed. Then run the following in the command line:

    <manager> activate pannagram
    open -a RStudio

One may also create an alias:

    alias panR="<manager> activate pannagram && open -a RStudio"

We encourage you to run RStudio from the activated package environment, but if RStudio Server is already running and you want to install pannagram there as an R package, run the following in the R console:

    setwd("<path to pannagram repo>")
    source("install_in_rstudio.R")

The pannagram package environment includes the following dependencies, which are accessible directly via the command line:
Windows users may use WSL and follow the steps described for Linux users, but be warned that we have never tested pannagram in such an environment.
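The choice between the two full environment files can be scripted. The sketch below is only an illustration: the function name env_create_args is hypothetical, and it assumes that uname reports Darwin and arm64 on M-series Macs. The file names and the --platform flag are the ones given in the instructions above.

```shell
# Print the arguments for "<manager> env create" that match the platform.
# $1 and $2 optionally override the detected OS and architecture.
# The parentheses make the body run in a subshell, so variables stay local.
env_create_args() (
    os="${1:-$(uname -s)}"
    arch="${2:-$(uname -m)}"
    if [ "$os" = "Darwin" ] && [ "$arch" = "arm64" ]; then
        # macOS on M-series chips (assumed to report arm64)
        echo "env create --platform osx-64 -f pannagram_m4.yml"
    else
        # Linux and macOS (Intel)
        echo "env create -f pannagram.yml"
    fi
)
```

The output can be appended to your package manager command, for example: <manager> $(env_create_args)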
An extended description of the parameters of each script is available by running the script with the -help flag.
- Preliminary mode gives you a quick look at your genomes when you start your research:

      pannagram -pre \
          -path_in '<directory with your genomes as FASTA files>' \
          -path_out '<directory to put the results in (will be created)>' \
          -ref '<reference genome filename with no FASTA suffix>' \
          -cores 8

  Here, the -ref argument is a file name from the -path_in directory with the FASTA suffix omitted. If your reference genome is located elsewhere, provide its path using -path_ref. Each subsequent run with a different -ref will create a separate subdirectory inside -path_in.

- Reference-based mode runs the full alignment pipeline, aligning all genomes to the given reference genome:

      pannagram \
          -path_in '<directory with your genomes as FASTA files>' \
          -path_out '<directory to put the results in (will be created)>' \
          -ref '<reference genome filename with no FASTA suffix>' \
          -cores 8

  Here, pannagram expects your genomes to have the same number of chromosomes. If this is not the case, specify the integer parameter -nchr to limit the analysis to a specific number of chromosomes. See the description of the -ref and -path_ref parameters above.

- Reference-free (MSA) mode runs the full alignment pipeline for aligning all genomes without a fixed reference (no -ref is passed):

      pannagram \
          -path_in '<directory with your genomes as FASTA files>' \
          -path_out '<directory to put the results in (will be created)>' \
          -cores 8

  Once again, specify the integer parameter -nchr if needed.
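Before starting an alignment, it can help to check whether all genomes really have the same number of chromosomes and to see which names are valid for -ref. The following is a minimal sketch, not part of pannagram: the function name count_chromosomes is hypothetical, and it assumes that genome files end in .fasta or .fa and that every FASTA record (a line starting with ">") is one chromosome.

```shell
# Print "<file name without suffix><TAB><number of FASTA records>" per genome.
# The parentheses make the body run in a subshell, so variables stay local.
count_chromosomes() (
    dir="$1"
    for f in "$dir"/*.fasta "$dir"/*.fa; do
        [ -e "$f" ] || continue          # skip patterns that matched nothing
        base="$(basename "$f")"
        # ${base%.*} drops the last suffix, giving a candidate value for -ref
        printf '%s\t%s\n' "${base%.*}" "$(grep -c '^>' "$f")"
    done
)

# Example (hypothetical directory name):
#   count_chromosomes genomes
```

If the second column is not the same for every genome, -nchr is probably needed.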
To ensure that other pannagram scripts and functions run correctly, do not alter the contents of the output subdirectories!
After running the pannagram pipeline, you can extract further features from your data. Note that some flags are independent of each other, while others need to be passed together:
- Extract information from the pangenome alignment:

      # -blocks  find synteny block information for visualisation
      # -seq     create the consensus sequence of the pangenome
      # -snp     call SNPs
      features -path_in '<same as -path_out of pannagram>' \
          -blocks \
          -seq \
          -snp \
          -cores 8

- Structural variant calling. Once the linear pangenome alignment is built, SVs can be called using the following command:

      # -sv_call   create output .gff and .fasta files with SVs
      # -sv_sim    compare with a set of sequences (e.g., TEs)
      # -sv_graph  construct the graph of SVs
      features -path_in '<same as -path_out of pannagram>' \
          -sv_call \
          -sv_sim te.fasta \
          -sv_graph \
          -cores 8
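In a shell, a comment cannot follow a line-continuation backslash: the backslash then escapes the space instead of the newline, and the command ends at the comment. One way to keep a note next to each flag is to build the argument list step by step. The sketch below is an assumption about how you might organise such a call, not a pannagram feature; the function name features_cmd is hypothetical, and it only prints the command (a dry run) using the flags listed above.

```shell
# Assemble the arguments for "features" one flag at a time, so that each
# flag can carry its own comment. $1 is the directory that was passed to
# pannagram as -path_out. The body runs in a subshell (note the parentheses).
features_cmd() (
    project="$1"
    set -- -path_in "$project"
    set -- "$@" -blocks     # synteny block information for visualisation
    set -- "$@" -seq        # consensus sequence of the pangenome
    set -- "$@" -snp        # SNP calling
    set -- "$@" -cores 8
    echo features "$@"      # dry run: drop "echo" to execute the command
)
```

Calling features_cmd with your project directory prints the full command line, which you can inspect before running it.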
- ...in a set of sequences. This approach is designed to search for similarities against another set of sequences.

      simsearch \
          -in_seq genes.fasta \
          -on_seq genome.fasta \
          -sim 90 \
          -out "<out path>"

- ...in the genome. This approach involves searching against entire genomes or individual chromosomes:

      simsearch \
          -in_seq genes.fasta \
          -on_genome genome.fasta \
          -out "<out path>"

  The result is a GFF file with hits matching the similarity threshold.

- ...on all genomes in a directory. Here we search in all genomes in the given directory:

      simsearch \
          -in_seq genes.fasta \
          -on_path "<path to genomes>" \
          -out "<out path>"
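Since the search result is a GFF file, standard text tools are enough for a first look at it. The sketch below counts hits per target sequence. It assumes the usual tab-separated, nine-column GFF layout with the sequence ID in the first column; the function name gff_hits_per_seq and the file name hits.gff are hypothetical, and the exact name of the file written by simsearch is not specified here.

```shell
# Count GFF records per sequence ID (column 1), ignoring comment lines.
gff_hits_per_seq() {
    awk -F '\t' '
        !/^#/ && NF >= 9 { n[$1]++ }                       # one record per hit
        END { for (s in n) printf "%s\t%d\n", s, n[s] }
    ' "$1" | LC_ALL=C sort                                  # stable ordering
}

# Example (hypothetical file name):
#   gff_hits_per_seq hits.gff
```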
- In your R session, load the library:

      library(pannagram)

- Extract the part of the alignment within a given window (this works only if features was called with the -seq flag):

      path.project <- "<same path as for -path_out of pannagram>"
      aln.seq <- cutAln(path.proj=path.project, i.chr=1, p.beg=1, p.end=20000, acc="<single accession from your genomes>")

- Build alignment window plots:

      p.nucl <- msaplot(aln.seq)
      p.diff <- msadiff(aln.seq)

- And save them as PDF:

      savePDF(p.nucl, path="<specify desired output path>", name="msa_nucl", width=7, height=5)
      savePDF(p.diff, path="<specify desired output path>", name="msa_diff", width=7, height=5)

- You'll get pictures similar to:


For more examples and detailed documentation, visit the Documentation Pages.
Development:
- Anna Igolkina - Lead Developer and Project Initiator
- Alexander Bezlepsky - Assistant
Testing:
- Anna Igolkina: Lead Tester
- Anna Glushkevich: Testing the alignment on A. lyrata genomes
- Elizaveta Grigoreva: Testing the alignment on A. thaliana and A. lyrata genomes
- Jilong Ma: Testing the SV-graph on spider genomes
- Alexander Bezlepsky: Testing Pannagram's functionality on rhizobial genomes
- Gregoire Bohl-Viallefond: Testing the annotation converter on A. thaliana alignment
Resources:
- Logo was generated with the help of DALL-E
- Parallel Processing Tool: O. Tange (2018): GNU Parallel 2018, ISBN 9781387509881, DOI https://doi.org/10.5281/zenodo.1146014.
