BioSynthNexus

Overview

A graphical user interface for efficiently accessing and extracting data from genome neighborhood networks.

General Functions:

Filters and extracts data from a Genome Neighborhood Network (GNN) sqlite file, generated by the Genome Neighborhood Tool (GNT) from Enzyme Function Initiative (EFI) Tools.^1,2

Retrieves UniProt data for the given protein accession ID(s).^3

Running, Installation, Packaging

Running the Program Directly from an Executable (no dependencies needed)

Pre-packaged executables are provided in the latest release.

Download the file for your operating system, extract the executable to your preferred location, and double-click it to start.

Manual Installation / Packaging

An environment can be created with pipenv or conda using the provided Pipfile or requirements.txt, respectively. You must have pipenv or conda (Miniconda, etc.) installed to use these options, which may require additional setup.

Pipenv

Open a terminal in the directory containing the repository
Run `pipenv install`
- All required dependencies should then be installed
The program can then be run with `pipenv run python main.py`

Conda

Open a terminal in the directory containing the repository
Create a conda environment: `conda create -n env_name python=3.10 pip`
Activate the environment: `conda activate env_name`
Install the requirements: `pip install -r requirements.txt`
Run the program: `python main.py`

Packaging into an application (optional)

If you would like to package your modified code into a single executable, note that PyInstaller is included in the dev-packages of the Pipfile.

Open a terminal in the repository directory
Run `pipenv install -d`
- Note: If you are using a conda environment, you have to install PyInstaller manually: `pip install pyinstaller`
Run `pyinstaller --windowed --onefile --add-data=ui_main_window.ui:./ --add-data=custom_ui_theme.xml:./ main.py`
- A folder named dist will contain the packaged application
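With `--onefile`, PyInstaller unpacks the files passed via `--add-data` into a temporary folder at runtime, so modified code that opens `ui_main_window.ui` or `custom_ui_theme.xml` by a bare relative path may not find them in the packaged app. A common pattern is sketched below; `resource_path` is a hypothetical helper name, and we have not checked whether `loadMainUI.py` already does something equivalent.

```python
import os
import sys


def resource_path(relative_name: str) -> str:
    """Return an absolute path to a data file bundled with --add-data.

    In a PyInstaller build, bundled files live under sys._MEIPASS.
    When running from source (e.g. `python main.py` in the repository
    directory), fall back to the current working directory.
    """
    base_dir = getattr(sys, "_MEIPASS", os.path.abspath("."))
    return os.path.join(base_dir, relative_name)
```

For example, modified code would pass `resource_path("ui_main_window.ui")` instead of `"ui_main_window.ui"` to whatever function loads the UI file. The working-directory fallback assumes the program is started from the repository directory, as in the instructions above.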

Input / Output Options:

UniProt Requests:

| Output Type | Input | Description |
| --- | --- | --- |
| FASTA | Accession ID(s) | Gets the FASTA-formatted protein sequence(s) |
| Genome Accession ID in GenBank | Accession ID(s) | Gets the GenBank Genome Accession ID(s) |
| Protein Accession ID in GenBank | Accession ID(s) | Gets the GenBank Protein Accession ID(s) |
| ORF Name in Corresponding Genome | Accession ID(s) | Gets the GenBank Open Reading Frame (ORF) ID(s) |
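The FASTA request can be approximated with UniProt's public REST API. The sketch below assumes the `https://rest.uniprot.org/uniprotkb/{accession}.fasta` entry endpoint; it is not necessarily what `UniProtRequests.py` calls, and `entry_url`, `fetch_text`, and `parse_fasta` are illustrative names.

```python
import urllib.parse
import urllib.request

# Assumed UniProt REST entry endpoint; fmt is e.g. "fasta" or "json".
UNIPROT_ENTRY_URL = "https://rest.uniprot.org/uniprotkb/{accession}.{fmt}"


def entry_url(accession: str, fmt: str = "fasta") -> str:
    """Build the REST URL for a single UniProtKB accession."""
    return UNIPROT_ENTRY_URL.format(accession=urllib.parse.quote(accession.strip()), fmt=fmt)


def fetch_text(url: str, timeout: float = 30.0) -> str:
    """Download a URL and decode it as UTF-8 (requires network access)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def parse_fasta(text: str) -> dict[str, str]:
    """Map each FASTA header (without '>') to its concatenated sequence."""
    records: dict[str, str] = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line
    return records
```

For example, `parse_fasta(fetch_text(entry_url("P69905")))` downloads one entry and returns a dictionary keyed by its FASTA header. Multiple accessions can simply be looped over; a batched query would be faster but is not shown here.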

Genome Neighborhood Network Requests:

| Output Type | Input | Description |
| --- | --- | --- |
| Parent Accession ID | Pfam ID(s) | Gets the parent accession IDs of neighborhoods that contain the given Pfam ID(s) |
| Genome Neighborhood ID | Pfam ID(s) | Gets the Genome Neighborhood ID(s) of neighborhoods that contain the given Pfam ID(s) |
| Genome Neighborhood Pfams | Genome Neighborhood ID | Gets the Pfam of each protein within a single BGC |
| Genome Neighborhood Accessions | Genome Neighborhood ID | Gets the accession ID of each protein within a single BGC |
| Neighboring Gene Accessions by Pfam | Genome Neighborhood ID(s) + single Pfam (Secondary Input) | Gets the accession IDs of the proteins with the selected Pfam within each BGC; displayed as 'Output Accession_(BGC ID)' |
| Genome Neighborhood Pfam Comparison | Pfam ID(s); optional Pfam in Secondary Input | Searches all genome neighborhoods for each given Pfam ID; displayed as 'Genome Neighborhood ID_(number of matching Pfams)'. The optional `Secondary Input` pre-filters the search to neighborhoods that contain a gene from that Pfam ID |

Notes:

The Neighboring Gene Accessions by Pfam output is formatted as Gene Accession_(Genome Neighborhood ID); however, if you use Replace Input with Output, only the output accession is carried over.
If the input field is empty, genome neighborhoods that contain unannotated proteins (Pfam = none) will be considered.
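Because each such output line has the form `Accession_(Genome Neighborhood ID)`, the Replace Input with Output behavior described in the notes amounts to stripping the trailing `_(ID)` suffix. A minimal sketch, assuming exactly that line format; `split_output_line` and `outputs_to_inputs` are hypothetical names, not functions from this repository.

```python
import re
from typing import Optional

# Assumed line shape: "<accession>_(<genome neighborhood ID>)".
# The lazy match keeps underscores inside accessions such as "WP_012345.1".
SUFFIX_PATTERN = re.compile(r"^(?P<accession>.+?)_\((?P<gn_id>[^()]*)\)$")


def split_output_line(line: str) -> tuple[str, Optional[str]]:
    """Split 'ACC_(GN)' into ('ACC', 'GN'); lines without a suffix give (line, None)."""
    match = SUFFIX_PATTERN.match(line.strip())
    if match is None:
        return line.strip(), None
    return match.group("accession"), match.group("gn_id")


def outputs_to_inputs(output_text: str) -> str:
    """Keep only the accession from each non-empty output line."""
    accessions = [
        split_output_line(line)[0] for line in output_text.splitlines() if line.strip()
    ]
    return "\n".join(accessions)
```

Keeping the neighborhood ID as a second return value, rather than discarding it, leaves room for callers that want to regroup accessions by neighborhood.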

GNN Information

General construction

A sequence similarity network (SSN) is first constructed from a query gene sequence using EFI-EST^1,2 and then processed by EFI-GNT^1,2 to generate a genome neighborhood network (GNN). The GNN can be visualized using the genome neighborhood diagram (EFI-GND) tool available on the EFI website. For more information on GNN construction, please visit Enzyme Function Initiative (EFI) Tools.^1,2

Uploading a GNN to BioSynthNexus:

Click the Upload button to select your GNN *.sqlite file
Change the Request Type to Genome Neighborhood Network
Select your desired output (see Genome Neighborhood Network Requests above for guidance)
Fill the left text field, labeled 'Input', with the respective input
Click Search
Output will be displayed in the right text field, labeled 'Output'
The text in the Output box can be used as input by clicking the Replace Input with Output button
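Since the GNN file is a plain SQLite database, it can also be inspected outside the GUI, for example to confirm that the file you are about to upload is the right one. The sketch below opens it read-only with Python's built-in `sqlite3` module and lists its tables; it makes no assumption about the EFI-GNT table names, and `open_gnn_readonly` and `list_tables` are hypothetical helper names.

```python
import sqlite3
from pathlib import Path


def open_gnn_readonly(path: str) -> sqlite3.Connection:
    """Open a GNN .sqlite file read-only (raises OperationalError if it is missing)."""
    # SQLite URI filenames accept mode=ro; as_uri() percent-encodes spaces.
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def list_tables(conn: sqlite3.Connection) -> list[str]:
    """Return the table names stored in the database, sorted alphabetically."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [name for (name,) in rows]
```

Opening in read-only mode guarantees that exploratory queries cannot modify the downloaded file.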

Usage:

The GNN consists of a list of genome neighborhoods, each containing a parent gene (a homolog of the initial query) and its neighboring genes.

Representative biosynthetic gene clusters (BGCs) used for the following demonstrations.

![Figure 1](/images/Figure S1.jpg) Figure 1: Examples and step-by-step instructions are detailed below. In all examples, Gene 1 was used as a query to generate a sequence similarity network (SSN) and a subsequent genome neighborhood network (GNN) via the EFI-EST and EFI-GNT websites^1,2, respectively. A list of genome neighborhoods (GNs) can be visualized on the EFI-GNT website. Genes from the same protein family (Pfam) are shown in the same color, while genes not belonging to PF00001–PF00005 are shown in gray for clarity.

Retrieval of gene information from the UniProt database according to the selected output type.

![Figure 2](/images/Figure S2.jpg) Figure 2: In this example, the sequences are displayed in FASTA format, which can be used for downstream multiple sequence alignment.

Note: Other output types, such as Protein Accession ID in GenBank, Genome Accession ID in GenBank, and ORF Name in Corresponding Genome, can be selected for different purposes. This feature does not require uploading a GNN sqlite file.
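The three GenBank-related output types correspond to cross-references stored in a UniProtKB entry. The sketch below assumes the JSON layout returned by `https://rest.uniprot.org/uniprotkb/{accession}.json`: EMBL/GenBank/DDBJ cross-references under `uniProtKBCrossReferences` with database `EMBL`, whose `id` is the nucleotide accession and whose `ProteinId` property is the protein accession, plus ORF names under `genes`. This is not necessarily how `UniProtRequests.py` works, `genbank_ids` is a hypothetical name, and the key names should be checked against a live response.

```python
def genbank_ids(entry: dict) -> dict[str, list[str]]:
    """Collect nucleotide accessions, protein accessions, and ORF names.

    `entry` is one parsed UniProtKB JSON record (assumed layout, see above).
    """
    genome_ids: list[str] = []
    protein_ids: list[str] = []
    orf_names: list[str] = []
    for xref in entry.get("uniProtKBCrossReferences", []):
        if xref.get("database") != "EMBL":
            continue  # skip Pfam, PDB, and other cross-reference types
        genome_ids.append(xref["id"])
        for prop in xref.get("properties", []):
            # "-" marks a missing protein ID in UniProt cross-references.
            if prop.get("key") == "ProteinId" and prop.get("value") not in (None, "-"):
                protein_ids.append(prop["value"])
    for gene in entry.get("genes", []):
        orf_names.extend(orf["value"] for orf in gene.get("orfNames", []))
    return {"genome": genome_ids, "protein": protein_ids, "orf": orf_names}
```

Note that an EMBL cross-reference `id` points at whichever nucleotide record contains the gene, which is not always a complete genome.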

Retrieval of genome neighborhoods containing neighboring gene(s) from the specified Pfam ID(s).

![Figure 3](/images/Figure S3.jpg) Figure 3: In this example, GN1–GN3 are displayed as results because these genome neighborhoods include a neighboring gene from PF00002 (input) (Figure 1). This strategy was employed in this study. Note: The accession IDs of Gene 1 from GN1–GN3 will be displayed as results if “Parent Accession ID” is selected as the output type.
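The query behind the Genome Neighborhood ID and Parent Accession ID output types can be sketched against a small in-memory SQLite table. The schema `genes(gn_id, accession, pfam, is_parent)` and its rows are made up for illustration; the real EFI-GNT file uses its own tables and columns, which `SqlRequests.py` handles. Following the Figure 3 and Figure 5 captions, only neighboring genes (not the parent) are matched, and the `require_all` option is an assumption rather than a documented feature.

```python
import sqlite3

# Made-up toy rows (gn_id, accession, pfam, is_parent); not real GNN data.
TOY_ROWS = [
    (1, "P1", "PF00001", 1), (1, "N1a", "PF00002", 0), (1, "N1b", "PF00003", 0),
    (1, "N1c", "PF00004", 0), (1, "N1d", "PF00005", 0),
    (2, "P2", "PF00001", 1), (2, "N2a", "PF00002", 0), (2, "N2b", "none", 0),
    (3, "P3", "PF00001", 1), (3, "N3a", "PF00002", 0), (3, "N3b", "PF00004", 0),
    (4, "P4", "PF00001", 1), (4, "N4a", "PF00003", 0),
]


def build_toy_db() -> sqlite3.Connection:
    """Create the hypothetical one-table schema in memory."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE genes (gn_id INTEGER, accession TEXT, pfam TEXT, is_parent INTEGER)")
    conn.executemany("INSERT INTO genes VALUES (?, ?, ?, ?)", TOY_ROWS)
    return conn


def neighborhoods_with_pfams(
    conn: sqlite3.Connection, pfams: list[str], require_all: bool = False
) -> list[int]:
    """IDs of neighborhoods whose neighboring genes include any (or all) of `pfams`."""
    unique = sorted(set(pfams))
    marks = ", ".join(["?"] * len(unique))
    sql = (
        "SELECT gn_id FROM genes "
        f"WHERE is_parent = 0 AND pfam IN ({marks}) "
        "GROUP BY gn_id"
    )
    params: list = [*unique]
    if require_all:
        # Keep only neighborhoods that matched every requested Pfam.
        sql += " HAVING COUNT(DISTINCT pfam) = ?"
        params.append(len(unique))
    sql += " ORDER BY gn_id"
    return [gn_id for (gn_id,) in conn.execute(sql, params)]


def parent_accessions(conn: sqlite3.Connection, gn_ids: list[int]) -> list[str]:
    """Parent (query-homolog) accession of each given neighborhood."""
    marks = ", ".join(["?"] * len(gn_ids))
    sql = f"SELECT accession FROM genes WHERE is_parent = 1 AND gn_id IN ({marks}) ORDER BY gn_id"
    return [acc for (acc,) in conn.execute(sql, gn_ids)]
```

Parameter placeholders (`?`) are used instead of string formatting for the Pfam values, so user input never becomes part of the SQL text.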

Retrieval of accession IDs for neighboring genes within a designated Pfam ID from the given genome neighborhoods.

![Figure 4](/images/Figure S4.jpg) Figure 4: In this example, the accession IDs of Gene 2 (secondary input) from GN1–GN3 (input) are displayed as results, with the genome neighborhood ID of each Gene 2 given in parentheses. Note: After filtering for the Pfam ID of interest as in Figure 3, clicking the “REPLACE INPUT WITH OUTPUT” button and following the instructions above enables efficient retrieval of the accession IDs of the targeted neighboring genes.
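The Neighboring Gene Accessions by Pfam lookup can be sketched the same way. Again the table `genes(gn_id, accession, pfam, is_parent)` and its rows are a made-up stand-in for the real EFI-GNT file, and `neighbor_accessions_by_pfam` is a hypothetical name; the output mimics the 'Accession_(Genome Neighborhood ID)' format described above.

```python
import sqlite3

# Made-up toy rows (gn_id, accession, pfam, is_parent); not real GNN data.
TOY_ROWS = [
    (1, "P1", "PF00001", 1), (1, "N1a", "PF00002", 0), (1, "N1b", "PF00003", 0),
    (2, "P2", "PF00001", 1), (2, "N2a", "PF00002", 0),
    (3, "P3", "PF00001", 1), (3, "N3a", "PF00002", 0),
    (4, "P4", "PF00001", 1), (4, "N4a", "PF00003", 0),
]


def build_toy_db() -> sqlite3.Connection:
    """Create the hypothetical one-table schema in memory."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE genes (gn_id INTEGER, accession TEXT, pfam TEXT, is_parent INTEGER)")
    conn.executemany("INSERT INTO genes VALUES (?, ?, ?, ?)", TOY_ROWS)
    return conn


def neighbor_accessions_by_pfam(
    conn: sqlite3.Connection, gn_ids: list[int], pfam: str
) -> list[str]:
    """Accessions of neighboring genes from `pfam`, formatted as 'ACC_(GN ID)'."""
    marks = ", ".join(["?"] * len(gn_ids))
    rows = conn.execute(
        "SELECT accession, gn_id FROM genes "
        f"WHERE is_parent = 0 AND pfam = ? AND gn_id IN ({marks}) "
        "ORDER BY gn_id, accession",
        [pfam, *gn_ids],
    )
    return [f"{accession}_({gn_id})" for accession, gn_id in rows]
```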

Retrieval of all Pfam IDs for neighboring genes from the given genome neighborhoods.

![Figure 5](/images/Figure S5.jpg) Figure 5: In this example, PF00002–PF00005 are displayed as results because they are the Pfams of the neighboring genes in GN1 (input). Note: The accession IDs of the GN1 neighboring genes will be displayed as results if “Genome Neighborhood Accessions” is selected as the output type. The number of retrieved neighboring genes depends on the neighborhood size specified when generating the GNN sqlite file on the EFI-GNT website.
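The Genome Neighborhood Pfams and Genome Neighborhood Accessions outputs are two views of the same per-neighborhood gene list. A minimal sketch against a made-up in-memory table (`genes(gn_id, accession, pfam, is_parent)`, not the real EFI-GNT layout); as in the Figure 5 example, the parent gene is excluded, and all function names are hypothetical.

```python
import sqlite3

# Made-up toy rows (gn_id, accession, pfam, is_parent); not real GNN data.
TOY_ROWS = [
    (1, "P1", "PF00001", 1), (1, "N1a", "PF00002", 0), (1, "N1b", "PF00003", 0),
    (1, "N1c", "PF00004", 0), (1, "N1d", "PF00005", 0),
    (2, "P2", "PF00001", 1), (2, "N2a", "PF00002", 0), (2, "N2b", "none", 0),
]


def build_toy_db() -> sqlite3.Connection:
    """Create the hypothetical one-table schema in memory."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE genes (gn_id INTEGER, accession TEXT, pfam TEXT, is_parent INTEGER)")
    conn.executemany("INSERT INTO genes VALUES (?, ?, ?, ?)", TOY_ROWS)
    return conn


def neighborhood_genes(conn: sqlite3.Connection, gn_id: int) -> list[tuple[str, str]]:
    """(accession, pfam) for every neighboring gene in one neighborhood."""
    rows = conn.execute(
        "SELECT accession, pfam FROM genes WHERE gn_id = ? AND is_parent = 0 ORDER BY accession",
        (gn_id,),
    )
    return [(accession, pfam) for accession, pfam in rows]


def neighborhood_pfams(conn: sqlite3.Connection, gn_id: int) -> list[str]:
    """One Pfam entry per neighboring gene (duplicates are kept)."""
    return [pfam for _, pfam in neighborhood_genes(conn, gn_id)]


def neighborhood_accessions(conn: sqlite3.Connection, gn_id: int) -> list[str]:
    """One accession per neighboring gene."""
    return [accession for accession, _ in neighborhood_genes(conn, gn_id)]
```

Deriving both outputs from one query keeps the Pfam list and the accession list aligned gene by gene.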

Retrieval of genome neighborhood information from the given Pfam ID(s).

![Figure 6](/images/Figure S6.jpg) Figure 6: In this example, all genome neighborhoods are displayed as results, with the number of matching Pfam IDs in parentheses. Moreover, a CSV file can be retrieved to show which Pfam IDs match in each genome neighborhood. Note: The secondary input is an optional pre-filter for genome neighborhoods. If “PF00002” is applied as the secondary input, only GN1–GN3 will be displayed as results, as they are the only genome neighborhoods containing a neighboring gene from PF00002 (Figure 1).
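The Genome Neighborhood Pfam Comparison, with its optional pre-filter and CSV export, can be sketched as follows. The in-memory table `genes(gn_id, accession, pfam, is_parent)` and its rows are made up and do not reflect the real EFI-GNT layout; the CSV columns and the "yes" marker are illustrative choices, not the app's actual CSV format, and all function names are hypothetical.

```python
import csv
import io
import sqlite3
from typing import Optional

# Made-up toy rows (gn_id, accession, pfam, is_parent); not real GNN data.
TOY_ROWS = [
    (1, "P1", "PF00001", 1), (1, "N1a", "PF00002", 0), (1, "N1b", "PF00003", 0),
    (1, "N1c", "PF00004", 0), (1, "N1d", "PF00005", 0),
    (2, "P2", "PF00001", 1), (2, "N2a", "PF00002", 0), (2, "N2b", "none", 0),
    (3, "P3", "PF00001", 1), (3, "N3a", "PF00002", 0), (3, "N3b", "PF00004", 0),
    (4, "P4", "PF00001", 1), (4, "N4a", "PF00003", 0),
]


def build_toy_db() -> sqlite3.Connection:
    """Create the hypothetical one-table schema in memory."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE genes (gn_id INTEGER, accession TEXT, pfam TEXT, is_parent INTEGER)")
    conn.executemany("INSERT INTO genes VALUES (?, ?, ?, ?)", TOY_ROWS)
    return conn


def compare_pfams(
    conn: sqlite3.Connection, pfams: list[str], prefilter: Optional[str] = None
) -> dict[int, list[str]]:
    """Map each neighborhood ID to the requested Pfams found among its neighboring genes."""
    present: dict[int, set[str]] = {}
    for gn_id, pfam in conn.execute(
        "SELECT gn_id, pfam FROM genes WHERE is_parent = 0 ORDER BY gn_id"
    ):
        present.setdefault(gn_id, set()).add(pfam)
    wanted = set(pfams)
    results: dict[int, list[str]] = {}
    for gn_id, found in present.items():
        # Optional pre-filter: skip neighborhoods without a gene from `prefilter`.
        if prefilter is not None and prefilter not in found:
            continue
        results[gn_id] = sorted(wanted & found)
    return results


def format_counts(results: dict[int, list[str]]) -> list[str]:
    """Render results as 'GN ID_(number of matching Pfams)'."""
    return [f"{gn_id}_({len(matches)})" for gn_id, matches in results.items()]


def to_csv(results: dict[int, list[str]], pfams: list[str]) -> str:
    """One row per neighborhood, one column per requested Pfam ('yes' if present)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["genome_neighborhood_id", *pfams])
    for gn_id, matches in results.items():
        writer.writerow([gn_id, *("yes" if p in matches else "" for p in pfams)])
    return buffer.getvalue()
```

Fetching all (neighborhood, Pfam) pairs in a single query and grouping them in Python avoids issuing one query per neighborhood.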

References

1. Rémi Zallot, Nils Oberg, and John A. Gerlt. The EFI Web Resource for Genomic Enzymology Tools: Leveraging Protein, Genome, and Metagenome Databases to Discover Novel Enzymes and Metabolic Pathways. Biochemistry 2019, 58 (41), 4169–4182.
2. Nils Oberg, Rémi Zallot, and John A. Gerlt. EFI-EST, EFI-GNT, and EFI-CGFP: Enzyme Function Initiative (EFI) Web Resource for Genomic Enzymology Tools. J. Mol. Biol. 2023.
3. The UniProt Consortium. UniProt: the Universal Protein Knowledgebase in 2023. Nucleic Acids Res. 2023, 51 (D1), D523–D531.

Tyler-Hostetler/BioSynthNexus
