${\color{black}PATT:\ {\color{red}P}roteome\ {\color{red}A}nnotation\ {\color{red}T}ransfer\ {\color{red}T}ool}$

Proteome Annotation Transfer Tool (PATT) transfers annotations from a reference genome to an unannotated query genome. It is built on the Snakemake workflow management system, which lets it run its steps in parallel and scale to large genomic data sets. For each protein of a closely related reference, PATT searches the genome to be annotated for the best ortholog, builds the best gene model for it, and returns its coding and peptide sequences together with its coordinates in .gff and .gbk annotation files. PATT is designed to simplify the annotation of new genomes.

Dependencies:

Snakemake (https://snakemake.readthedocs.io/en/stable/index.html)

Exonerate (https://www.ebi.ac.uk/about/vertebrate-genomics/software/exonerate) (v.2.4.0)

Blat (https://github.com/djhshih/blat)

Perl (https://www.perl.org/get.html) (v5.30.0)

AWK

Java

Parallel (https://manpages.ubuntu.com/manpages/impish/man1/parallel.1.html)

Perl Modules

Getopt::Long

Getopt::Std

Parallel::ForkManager
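Before running PATT it can help to confirm that the tools listed above are reachable. The following is a minimal sketch, not part of PATT: the function names `missing_tools` and `missing_perl_modules` are hypothetical, and the executable names (for example `blat` or `parallel`) are assumptions that may differ on your system.

```shell
# Print every command from the argument list that cannot be found on PATH.
# Executable names passed below are assumptions; adjust them as needed.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Print every Perl module from the argument list that fails to load.
missing_perl_modules() {
    for mod in "$@"; do
        perl -M"$mod" -e 1 >/dev/null 2>&1 || printf '%s\n' "$mod"
    done
}

missing_tools snakemake exonerate blat perl awk java parallel
missing_perl_modules Getopt::Long Getopt::Std Parallel::ForkManager
```

No output means everything was found; any printed name is a dependency that still needs to be installed or added to your PATH.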

Installation:

Option 2

Make sure all dependencies are installed. You also need to download the scripts in the "bin" folder and add them to your PATH.

To avoid errors with Java, set the CLASSPATH environment variable to the absolute path of "readseq.jar", which is in the bin folder:

export CLASSPATH="/full/path/to/bin/readseq.jar"
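A quick way to verify this setting is sketched below. It assumes CLASSPATH holds a single entry, as in the export line above; the helper name `classpath_ok` is hypothetical and not part of PATT.

```shell
# Succeed only if the argument is an absolute path to an existing readseq.jar.
# Assumes a single CLASSPATH entry (no ':'-separated list).
classpath_ok() {
    case "$1" in
        /*/readseq.jar) [ -f "$1" ] ;;
        *) return 1 ;;
    esac
}

if classpath_ok "${CLASSPATH:-}"; then
    echo "CLASSPATH points to readseq.jar: $CLASSPATH"
else
    echo "CLASSPATH is not set to an existing absolute path of readseq.jar" >&2
fi
```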

For details on Snakemake itself, see its documentation site.

Quick usage: (Install Option 2)

If your input files are named genome.fasta and protein.faa, run:

snakemake --cores -s /path/of/Snakefile

If the genome or protein FASTA files have other names, run:

snakemake --cores <core_numbers> --config PROTREF="current_protein_fasta_filename" GENOME="current_genome_fasta_filename" -s path/of/Snakefile_PATT

More options

snakemake --cores <core_numbers> --rerun-incomplete --config PROTREF="protein.faa" GENOME="genome.fasta" PREFIX="prefix_outputfilename" NEWPREFIX="prefix_newgenenames_" -s path/of/Snakefile_PATT

Optional configuration variables:

GENOME= "genome.fasta" # Fasta file of genome that we want to annotate. Default: "genome.fasta"

PROTREF= "protein.faa" # Fasta file of the reference proteins that we want to transfer or annotate in our genome. Default: "protein.faa"

PREFIX= "prefix" # Output file prefix. Default: "mySpecies"

NEWPREFIX= "prefix_" # Prefix for the names of the proteins/transcripts in our genome. We suggest ending it in "_" for readability. Default: "{PREFIX}_"

OLDPREFIX= "prefix" # Prefix of the protein names in the .faa file to be transferred. Perl regular expressions are accepted, e.g. "^\S+gene[^_|\s]+". PATT generates new names for the transferred proteins, keeping all (default) or part of the original protein identifier. Default: "=gene". For example, if the proteins to be transferred have names of the form "tsol_\d+", you can set OLDPREFIX= "=genetsol_" and the new names will be "mySpecies_\d+".
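To make the effect of the prefix variables concrete, the sketch below reproduces only the outcome described above (identifiers of the form "tsol_\d+" becoming "mySpecies_\d+"). It is an illustration with sed, not PATT's internal renaming code; the helper `rename_prefix` and the sample identifiers are hypothetical.

```shell
# Illustration only: replace a leading old prefix with a new one.
# $1 = old prefix (extended regex, must not contain '/'), $2 = new prefix.
# Identifiers are read from standard input, one per line.
rename_prefix() {
    sed -E "s/^$1/$2/"
}

# Sample identifiers are made up for this example.
printf '%s\n' tsol_000100 tsol_000200 | rename_prefix 'tsol_' 'mySpecies_'
```

Identifiers that do not start with the old prefix pass through unchanged.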

Output files

PATT produces four output files:

File ".gff"

Annotation file in GFF format of the transferred proteins.

File ".gbk"

Annotation file in GenBank format of the transferred proteins.

File ".ffn"

Fasta file of all coding sequences (CDS).

File ".faa"

Fasta file of the peptide sequences.
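After a run, a small check like the one below can confirm that all four files were written. It is a sketch under the assumption that the files are named `<PREFIX>.gff`, `<PREFIX>.gbk`, `<PREFIX>.ffn` and `<PREFIX>.faa`, which this README does not state explicitly; the helper `missing_outputs` is hypothetical.

```shell
# Print the name of each expected output file that is absent or empty.
# $1 = output prefix, $2 = directory to inspect (defaults to the current one).
# The "<prefix>.<extension>" naming is an assumption.
missing_outputs() {
    for ext in gff gbk ffn faa; do
        [ -s "${2:-.}/$1.$ext" ] || printf '%s\n' "$1.$ext"
    done
}

# "mySpecies" is the default PREFIX listed above.
missing_outputs mySpecies
```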

Citation

Estrada, K. (2023). PATT (Proteome Annotation Transfer Tool) (Version 1) [Computer software]. https://doi.org/10.5281/zenodo.7958134

Acknowledgments

PATT wouldn't be the same without my fellow researchers at the UUSMB (Unidad Universitaria de Secuenciación Masiva y Bioinformática), Jerome Verleyen and Alejandro Sanchez, who helped me with ideas and challenges during PATT's development.

PATT uses Snakemake for pipeline development, Exonerate to perform alignments, Readseq for handling file formats, Mario Stanke's script "gff2gbSmallDNA.pl", and many lines of code and scripts from my dear friend and god-level programmer, Alejandro Garciarrubio. I am grateful for his help and guidance.

Author

Karel Estrada

karel.estrada@ibt.unam.mx

Twitter: @kjestradag
