A graphical user interface for efficiently accessing and extracting data from genome neighborhood networks.
General Functions:
- Filters and extracts data from a Genome Neighborhood Network (GNN) sqlite file generated by the Genome Neighborhood Tool (GNT) from Enzyme Function Initiative (EFI) Tools [1,2].
- Retrieves UniProt data for given protein accession ID(s) [3].
Pre-packaged executables are provided in the latest release.
Download the file for your operating system, extract the executable to your preferred location, and double-click it to start.
An environment can be created with pipenv or conda using the provided Pipfile or requirements.txt. You must have pipenv or conda/miniconda/etc installed to utilize these options, which may require additional steps.
- Open a terminal in the directory containing the repository
- Run
pipenv install
- All required dependencies should then be installed
- The program can then be run with
pipenv run python main.py
- Open a terminal in the directory containing the repository
- Create a conda environment
conda create -n env_name python=3.10 pip
- Activate environment
conda activate env_name
- Install Requirements
pip install -r requirements.txt
- Run the program
python main.py
If you would like to package your modified code into a single executable, pyinstaller is included in the dev-packages of the Pipfile.
- Open a terminal in the repository directory
- Run
pipenv install -d
- Note: If you are using a conda environment, you have to manually install pyinstaller
pip install pyinstaller
- Run
pyinstaller --windowed --onefile --add-data=ui_main_window.ui:./ --add-data=custom_ui_theme.xml:./ main.py
- A folder named `dist` will contain the packaged application
Output Type | Input | Description |
---|---|---|
FASTA | Accession ID(s) | Gets the FASTA formatted protein sequence(s) |
Genome Accession ID in GenBank | Accession ID(s) | Gets the GenBank Genome Accession ID(s) |
Protein Accession ID in GenBank | Accession ID(s) | Gets the GenBank Protein Accession ID(s) |
ORF Name in Corresponding Genome | Accession ID(s) | Gets the GenBank Open Reading Frame (ORF) ID(s) |
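
As a rough illustration of the FASTA request type, the sketch below builds a UniProt REST URL for an accession ID and parses FASTA text into records. The URL pattern follows UniProt's public REST interface; the function names (`fasta_url`, `parse_fasta`, `fetch_fasta`) are hypothetical, and this is not necessarily how the application itself queries UniProt.

```python
from urllib.request import urlopen

# Public UniProt REST pattern for a single entry in FASTA format.
UNIPROT_FASTA = "https://rest.uniprot.org/uniprotkb/{accession}.fasta"


def fasta_url(accession: str) -> str:
    """Build the FASTA URL for one accession ID (surrounding whitespace trimmed)."""
    return UNIPROT_FASTA.format(accession=accession.strip())


def parse_fasta(text: str) -> list[tuple[str, str]]:
    """Split FASTA text into (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def fetch_fasta(accession: str, timeout: float = 30.0) -> str:
    """Download one FASTA entry; requires network access."""
    with urlopen(fasta_url(accession), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

For several accession IDs, `fetch_fasta` could be called once per ID and the returned text concatenated before being passed to `parse_fasta`.
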
Output Type | Input | Description |
---|---|---|
Parent Accession ID | Pfam ID(s) | Gets Parent Accession IDs that correspond to neighborhoods that contain the given Pfam ID(s) |
Genome Neighborhood ID | Pfam ID(s) | Gets the Genome Neighborhood ID(s) for those that contain the given Pfam ID(s) |
Genome Neighborhood Pfams | Genome Neighborhood ID | Gets the Pfams for each of the proteins within a single BGC |
Genome Neighborhood Accessions | Genome Neighborhood ID | Gets the Accession IDs for each of the proteins within a single BGC |
Neighboring Gene Accessions by Pfam | Genome Neighborhood ID(s) + Single Pfam (Secondary Input) | Gets the Accession IDs for the protein within each BGC with the selected Pfam. Displays as 'Output Accession_(BGC ID)' |
Genome Neighborhood Pfam Comparison | Pfam ID(s). Optional: Pfam in Secondary Input | Searches all Genome Neighborhoods for each given Pfam ID. Displays as 'Genome Neighborhood ID_(number of matching Pfams)'. The Secondary Input is an optional pre-filter that keeps only neighborhoods containing a gene from that Pfam ID |
Notes:
- The `Neighboring Gene Accessions by Pfam` request outputs as `Gene Accession_(Genome Neighborhood ID)`; however, if you use `Replace Input with Output`, only the Output Accession will be displayed.
- If the input field is empty, genome neighborhoods that contain unannotated proteins (Pfam = none) will be considered.
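
The Pfam-based requests in the table above can be pictured as simple SQL queries. The sketch below is only an illustration: it assumes a hypothetical single-table layout (`neighbors` with columns `gn_id`, `accession`, `pfam`) in an in-memory database, whereas the real GNN sqlite file produced by EFI-GNT has its own schema. It also assumes that a neighborhood matches when it contains at least one of the given Pfam IDs.

```python
import sqlite3

# Hypothetical toy data: (genome neighborhood ID, gene accession, Pfam ID).
ROWS = [
    ("GN1", "A0001", "PF00002"),
    ("GN1", "A0002", "PF00003"),
    ("GN2", "A0003", "PF00002"),
    ("GN3", "A0004", "PF00005"),
]


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with the placeholder schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE neighbors (gn_id TEXT, accession TEXT, pfam TEXT)")
    conn.executemany("INSERT INTO neighbors VALUES (?, ?, ?)", ROWS)
    return conn


def neighborhoods_with_pfams(conn, pfams):
    """Genome neighborhood IDs containing at least one of the given Pfam IDs."""
    if not pfams:
        return []
    placeholders = ", ".join("?" for _ in pfams)
    query = (
        "SELECT DISTINCT gn_id FROM neighbors "
        f"WHERE pfam IN ({placeholders}) ORDER BY gn_id"
    )
    return [row[0] for row in conn.execute(query, list(pfams))]


def pfams_in_neighborhood(conn, gn_id):
    """Distinct Pfam IDs of the genes within a single genome neighborhood."""
    query = "SELECT DISTINCT pfam FROM neighbors WHERE gn_id = ? ORDER BY pfam"
    return [row[0] for row in conn.execute(query, (gn_id,))]
```

Only the placeholders are interpolated into the SQL text; the Pfam IDs themselves are passed as bound parameters, so user input is never spliced into the query string.
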
A sequence similarity network is first constructed from a query gene sequence using EFI-EST [1,2] and subsequently processed by EFI-GNT [1,2] to generate a genome neighborhood network (GNN). The GNN can be visualized using the genome neighborhood diagram (EFI-GND) available on the EFI website. For more information on GNN construction, please visit Enzyme Function Initiative (EFI) Tools [1,2].
- Click the `Upload` button to select your GNN `*.sqlite` file
- Change the Request Type to `Genome Neighborhood Network`
- Select your desired output (see Genome Neighborhood Requests above for guidance)
- Fill the left text field, labeled 'Input', with the respective input
- Click `Search`
- Output will be displayed in the right text field, labeled 'Output'
- The text in the Output box can be used as an input by clicking the `Replace Input with Output` button
The GNN consists of a list of genome neighborhoods, each containing a parent gene (a homolog of the initial query) and its neighboring genes.
![Figure 1](/images/Figure S1.jpg) Examples and step-by-step instructions are detailed below. In all examples, Gene 1 was used as a query to generate a sequence similarity network (SSN) and a subsequent genome neighborhood network (GNN) via the EFI-EST and EFI-GNT websites [1,2], respectively. A list of genome neighborhoods (GNs) can be visualized on the EFI-GNT website. Genes from the same protein family (Pfam) are represented in the same color, while genes not belonging to PF00001–PF00005 are shown in gray for clarity.
![Figure 2](/images/Figure S2.jpg) Figure 2: In this example, the sequences are displayed in FASTA format as results, which can be used for subsequent multiple sequence alignment.
Note: Other output types, such as Protein Accession ID in GenBank, Genome Accession ID in GenBank, and ORF Name in Corresponding Genome, can be selected for different purposes. This feature does not require uploading a GNN sqlite file.
![Figure 3](/images/Figure S3.jpg) Figure 3: In this example, GN1–GN3 are displayed as output results because these genome neighborhoods include a neighboring gene from PF00002 (input) (Figure 1). This strategy was employed in this study. Note: The accession IDs of Gene 1 from GN1–GN3 will be displayed as results if “Parent Accession ID” is selected as the output type.
Retrieval of accession IDs for neighboring genes within a designated Pfam ID from the given genome neighborhoods.
![Figure 4](/images/Figure S4.jpg) Figure 4: In this example, the accession IDs of Gene 2 (secondary input) from GN1–GN3 (input) are displayed as results, with the origin of each Gene 2 specified in parentheses indicating the corresponding genome neighborhood ID. Note: After filtering by the Pfam ID of interest as in Figure 3, clicking the “REPLACE INPUT WITH OUTPUT” button and following the instructions above enables efficient retrieval of the accession IDs of the targeted neighboring genes.
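
The `Accession_(Genome Neighborhood ID)` labelling, and the way the parenthesized origin is dropped when the output is reused as input, can be sketched as follows. The exact string format is inferred from the description in this README, and the helper names (`label`, `split_label`) are hypothetical rather than taken from the application's code.

```python
import re
from typing import Optional

# Matches a trailing "_(...)" label such as "_(GN1)".
_SUFFIX = re.compile(r"_\(([^()]*)\)$")


def label(accession: str, gn_id: str) -> str:
    """Format an accession together with its genome neighborhood of origin."""
    return f"{accession}_({gn_id})"


def split_label(text: str) -> tuple[str, Optional[str]]:
    """Return (accession, genome neighborhood ID); the ID is None if absent."""
    match = _SUFFIX.search(text)
    if match is None:
        return text, None
    return text[: match.start()], match.group(1)
```

Applying `split_label(item)[0]` to every output line would leave only the accession IDs, which is the form needed for a follow-up UniProt request.
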
![Figure 5](/images/Figure S5.jpg) Figure 5: In this example, PF00002–PF00005 are displayed as results because these neighboring genes are within GN1 (input). Note: The accession IDs of GN1 neighboring genes will be displayed as results if “Genome Neighborhood Accessions” is selected as the output type. The number of retrieved neighboring genes depends on the neighborhood size specified when generating the GNN sqlite file from the EFI-GNT website.
![Figure 6](/images/Figure S6.jpg) Figure 6: In this example, all genome neighborhoods are displayed as results, with the number of matching Pfam IDs in parentheses. Moreover, a CSV file can be retrieved to show which Pfam IDs match in each genome neighborhood. Note: The secondary input is an optional choice to pre-filter the Pfam IDs. If “PF00002” is applied as the secondary input, only GN1–GN3 will be displayed as results, as they are the only genome neighborhoods containing a neighboring gene from PF00002 (Figure 1).
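
One possible layout for such a CSV is a presence table with one row per genome neighborhood and one column per input Pfam ID. The sketch below is an assumption about that layout, not the application's actual export format; the input mapping and the function name `match_table_csv` are hypothetical.

```python
import csv
import io


def match_table_csv(neighborhoods, pfams, required=None):
    """Return CSV text: one row per neighborhood, 1/0 per input Pfam, plus a count.

    neighborhoods: dict mapping genome neighborhood ID -> set of Pfam IDs.
    required: optional Pfam ID acting as a pre-filter (the secondary input).
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["genome_neighborhood", *pfams, "matches"])
    for gn_id in sorted(neighborhoods):
        present = neighborhoods[gn_id]
        # Skip neighborhoods that lack the pre-filter Pfam, if one was given.
        if required is not None and required not in present:
            continue
        flags = [int(pfam in present) for pfam in pfams]
        writer.writerow([gn_id, *flags, sum(flags)])
    return buffer.getvalue()
```

The final `matches` column corresponds to the number shown in parentheses after each genome neighborhood ID in the Output field.
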
- Rémi Zallot, Nils Oberg, and John A. Gerlt, The EFI Web Resource for Genomic Enzymology Tools: Leveraging Protein, Genome, and Metagenome Databases to Discover Novel Enzymes and Metabolic Pathways. Biochemistry 2019 58 (41), 4169-4182.
- Nils Oberg, Rémi Zallot, and John A. Gerlt, EFI-EST, EFI-GNT, and EFI-CGFP: Enzyme Function Initiative (EFI) Web Resource for Genomic Enzymology Tools. J Mol Biol 2023.
- The UniProt Consortium, UniProt: the Universal Protein Knowledgebase in 2023. Nucleic Acids Research, Volume 51, Issue D1, 6 January 2023, Pages D523–D531.