Pannagram

Pannagram is a package for constructing pan-genome alignments, analyzing structural variants, and translating annotations between genomes. The project consists of several Bash pipelines for streamlined genomic analysis and an R library with tools for further analysis and visualization.

Documentation can be found at pannagram-page.

Setting Up the Working Environment

After cloning the repo, follow these instructions to set up your pannagram environment.

Prerequisites

It is expected that you have one of the following package managers installed on your machine:

Below, the package manager of your choice is referred to as <manager>.

Linux

<manager> env create -f pannagram.yml
<manager> activate pannagram

macOS (M-series chips)

<manager> env create --platform osx-64 -f pannagram_m4.yml
<manager> activate pannagram

Alternative: Setting Up the Environment Without Explicit Versions

If you want to resolve package dependencies yourself, use pannagram_min.yml, where only direct dependencies are specified, without explicit versions. Packages will then be installed with the latest compatible versions available:

Linux and macOS (Intel)

<manager> env create -f pannagram_min.yml
<manager> activate pannagram

macOS (M-series chips)

<manager> env create --platform osx-64 -f pannagram_min.yml
<manager> activate pannagram
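
The <manager> placeholder in the commands above can also be filled in by a small script. The following is a minimal sketch, not something shipped with pannagram: the helper name pick_manager and the candidate names (micromamba, mamba, conda) are assumptions made for illustration, so replace them with whichever conda-compatible manager you actually use.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of pannagram): print the first candidate
# command that is available on PATH, or return non-zero if none is found.
pick_manager() {
    local candidate
    for candidate in "$@"; do
        if command -v "$candidate" >/dev/null 2>&1; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# The candidate names below are an assumption; adjust them to your setup.
if manager=$(pick_manager micromamba mamba conda); then
    echo "Using package manager: $manager"
    # Uncomment to create the environment with the detected manager:
    # "$manager" env create -f pannagram.yml
else
    echo "No supported package manager found on PATH" >&2
fi
```

The environment creation line is left commented out so that the sketch only reports what it found; environment activation usually still has to be done in your interactive shell.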

Running RStudio with the Environment

Make sure that RStudio-Desktop is installed. Then run the following in the command line:

<manager> activate pannagram
open -a RStudio

One may also create an alias:

alias panR="<manager> activate pannagram && open -a RStudio"

Installing Pannagram in Already Running RStudio Environment

We encourage you to run RStudio from the activated package environment. However, if an RStudio server is already running and you want to install pannagram there as an R package, run the following in the R console:

setwd("<path to pannagram repo>")
source("install_in_rstudio.R")

Included Dependencies

The pannagram package environment includes the following dependencies, which are accessible directly from the command line:

For Windows Users

Windows users may use WSL and follow the steps described for Linux users, but be warned that we have never tested pannagram in such an environment.

Pannagram as a suite of bash pipelines

An extended description of the parameters for all scripts is available by executing the scripts with the -help flag.

1. Pangenome linear alignment with pannagram

  • Preliminary mode gives you a quick look at your genomes when you start your research:

    pannagram -pre \
        -path_in '<directory with your genomes as FASTA files>' \
        -path_out '<directory to put the results in (will be created)>' \
        -ref '<reference genome filename with no FASTA suffix>' \
        -cores 8

    Here, the -ref argument is the name of a file in the -path_in directory with the FASTA suffix omitted. If your reference genome is located elsewhere, provide its path using -path_ref. Each subsequent run with a different -ref will create a separate subdirectory inside -path_in.

  • Reference-based mode runs the full alignment pipeline, aligning all genomes to the given reference genome:

    pannagram \
        -path_in '<directory with your genomes as FASTA files>' \
        -path_out '<directory to put the results in (will be created)>' \
        -ref '<reference genome filename with no FASTA suffix>' \
        -cores 8

    Here pannagram expects your genomes to have the same number of chromosomes. If that is not the case, specify the integer parameter -nchr to restrict the analysis to a specific number of chromosomes. See the description of the -ref and -path_ref parameters above.

  • Reference-free (MSA) mode runs the full alignment pipeline without a reference genome (no -ref is given):

    pannagram \
        -path_in '<directory with your genomes as FASTA files>' \
        -path_out '<directory to put the results in (will be created)>' \
        -cores 8

    Once again, specify the -nchr integer parameter if needed.
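
Before launching any of the modes above, it can help to check what the input directory contains. The sketch below is an illustration only and is not part of pannagram: the helper names ref_name, count_seqs and summarize_genomes are hypothetical, the .fasta suffix is an assumption, and counting FASTA header lines is only a proxy for the number of chromosomes (it assumes one sequence record per chromosome).

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helpers (not part of pannagram).

# Derive a candidate -ref value from a FASTA path: drop the directory
# and the last suffix, e.g. /data/col0.fasta -> col0.
ref_name() {
    local base
    base=$(basename "$1")
    printf '%s\n' "${base%.*}"
}

# Count sequence records in a FASTA file by counting header lines.
count_seqs() {
    grep -c '^>' "$1"
}

# Print "<name><TAB><sequence count>" for every *.fasta file in a directory.
# The .fasta suffix is an assumption; change the glob for .fa or .fna files.
summarize_genomes() {
    local dir f
    dir=$1
    for f in "$dir"/*.fasta; do
        [ -e "$f" ] || continue
        printf '%s\t%s\n' "$(ref_name "$f")" "$(count_seqs "$f")"
    done
}
```

For example, if `summarize_genomes '<directory with your genomes>' | cut -f2 | sort -u` prints more than one value, the genomes differ in their number of sequences, and -nchr may be needed. The first column lists names in the form expected by -ref.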

To ensure that other pannagram scripts and functions run correctly, do not alter the contents of the output subdirectories!

2. Feature extraction with features

After running the pannagram pipeline, you can extract more features from your data. Note that some flags are independent of each other, while others need to be passed together:

  • Extract information from the pangenome alignment:

    # -blocks : find synteny block information for visualisation
    # -seq    : create the consensus sequence of the pangenome
    # -snp    : call SNPs
    features -path_in '<same as -path_out of pannagram>' \
        -blocks \
        -seq \
        -snp \
        -cores 8
  • Structural variant calling. Once the pangenome linear alignment is built, SVs can be called using the following command:

    # -sv_call  : create output .gff and .fasta files with SVs
    # -sv_sim   : compare SVs with a set of sequences (e.g., TEs), here te.fasta
    # -sv_graph : construct the graph of SVs
    features -path_in '<same as -path_out of pannagram>' \
        -sv_call \
        -sv_sim te.fasta \
        -sv_graph \
        -cores 8

3. Search for similar sequences with simsearch

  • ...in a set of sequences. This approach searches for similarities against another set of sequences:

    simsearch \
        -in_seq genes.fasta \
        -on_seq genome.fasta \
        -sim 90 \
        -out "<out path>"
  • ...in the genome. This approach searches against entire genomes or individual chromosomes:

    simsearch \
        -in_seq genes.fasta \
        -on_genome genome.fasta \
        -out "<out path>"

    The result is a GFF file with hits matching the similarity threshold.

  • ...on all genomes in a directory. Here we search in all genomes in the given directory:

    simsearch \
        -in_seq genes.fasta \
        -on_path "<path to genomes>" \
        -out "<out path>"
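
Since the hits are reported as a GFF file, they can be tallied with standard command-line tools. The sketch below is not part of pannagram: the helper name gff_hits_per_seq is hypothetical, and it assumes the usual tab-separated, nine-column GFF layout with the sequence ID in the first column. The exact name and location of the GFF file inside the output path are not specified here, so pass the path to the file you actually obtained.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of pannagram): count GFF records per
# sequence ID (column 1), skipping comment lines and malformed lines.
gff_hits_per_seq() {
    awk -F'\t' '!/^#/ && NF >= 9 { n[$1]++ }
                END { for (s in n) printf "%s\t%d\n", s, n[s] }' "$1" | sort
}
```

Calling `gff_hits_per_seq '<path to a GFF file>'` prints one line per sequence ID with the number of hits found on it, sorted by sequence ID.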

Pannagram as an R package

  1. In your R session, load the library:

    library(pannagram)
  2. Extract the part of the alignment within a given window (this works only if features was run with the -seq flag):

    path.project = "<same path as for -path_out of pannagram>"
    aln.seq <- cutAln(path.proj=path.project,
                      i.chr=1,
                      p.beg=1,
                      p.end=20000,
                      acc="<single accession from your genomes>")
  3. Build alignment window plots:

    p.nucl <- msaplot(aln.seq)
    p.diff <- msadiff(aln.seq)
  4. Save them as PDF files:

    savePDF(p.nucl,
            path="<specify desired output path>",
            name="msa_nucl",
            width=7,
            height=5)
    savePDF(p.diff,
            path="<specify desired output path>",
            name="msa_diff",
            width=7,
            height=5)
  5. You'll get pictures similar to:

    For more examples and detailed documentation, visit the Documentation Pages.

Acknowledgements

Development:

  • Anna Igolkina - Lead Developer and Project Initiator
  • Alexander Bezlepsky - Assistant

Testing:

  • Anna Igolkina: Lead Tester
  • Anna Glushkevich: Testing the alignment on A. lyrata genomes
  • Elizaveta Grigoreva: Testing the alignment on A. thaliana and A. lyrata genomes
  • Jilong Ma: Testing the SV-graph on spider genomes
  • Alexander Bezlepsky: Testing Pannagram's functionality on Rhizobial genomes
  • Gregoire Bohl-Viallefond: Testing the annotation converter on A. thaliana alignment

Resources:
