GitHub - rkalendar/GeneDistance: Identification of genetic similarity or distance between genomic sequences

GeneDistance

Identification of genetic similarity or distance between genomic sequences (approximate matching algorithm)

GeneDistance is an alignment-free method for calculating genetic distances between DNA sequences, as a basis for similarity estimation and phylogenetic reconstruction. The approximate matching algorithm is effective for measuring homology between mixed sequences and sequences of different lengths; individual chromosomes of the same or different species can be analyzed. Its application is not limited to determining the degree of homology between sequences: it can also be used for phylogenetic analysis, species identification, and sequence classification in genomic assemblies.
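To illustrate what "alignment-free" means in practice, here is a minimal Python sketch of one common approach: each sequence is reduced to a profile of overlapping k-mer counts, and two profiles are compared with a cosine distance. This is a generic textbook technique, not the GeneDistance algorithm itself, which this README does not specify. The function names `kmer_profile` and `cosine_distance` are hypothetical.

```python
from collections import Counter
from math import sqrt


def kmer_profile(seq: str, k: int) -> Counter:
    """Count overlapping k-mers, skipping windows that contain non-ACGT symbols."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


def cosine_distance(a: Counter, b: Counter) -> float:
    """Return 1 - cosine similarity of two k-mer count profiles."""
    dot = sum(a[kmer] * b[kmer] for kmer in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        # An empty profile shares nothing with any other profile.
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)
```

Because only k-mer counts are compared, the two sequences may differ in length and in the order of their segments, which is the property that makes such methods usable on whole chromosomes.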

Author

Ruslan Kalendar email: [email protected]

Availability and requirements:

Operating system(s): Platform independent

Programming language: Java 25 or higher

Java Downloads

How do I set or change the Java path system variable

Installing Java using Conda

To install a specific version of OpenJDK with Conda, specify the version number in the installation command and use the conda-forge channel, where the latest version is available.

Add the conda-forge channel (if not already added). It is recommended to add the conda-forge channel to your configuration and set its priority to strict to ensure packages are preferentially installed from this channel:

conda config --add channels conda-forge

conda config --set channel_priority strict

Create a new Conda environment and install the desired OpenJDK version. Creating a dedicated environment helps manage dependencies and avoid conflicts with other projects:

conda create -n java25 openjdk=25

Activate the new environment:

conda activate java25

Check that Java is installed. The output should show the installed Java version:

java -version
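If the version check needs to be scripted, the following Python sketch extracts the major version from the text printed by `java -version` and compares it with the required Java 25. The helper names `java_major_version` and `installed_java_major` are hypothetical. The parser assumes the usual `version "..."` string printed by OpenJDK and Oracle builds, including the legacy `1.x` numbering.

```python
import re
import subprocess


def java_major_version(version_output: str) -> int:
    """Extract the major Java version from `java -version` output."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError("no version string found")
    major = int(match.group(1))
    # Legacy numbering: "1.8.0_281" denotes Java 8.
    if major == 1 and match.group(2) is not None:
        return int(match.group(2))
    return major


def installed_java_major() -> int:
    """Run `java -version`, which prints to stderr, and parse the result."""
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    return java_major_version(result.stderr)
```

A script could then stop early with a clear message when `installed_java_major() < 25`.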

The program generates a file for analysis in MEGA 12 (https://www.megasoftware.net/).

To run the program from the command line (CLI), pass the command-line options separated by spaces. The executable file GeneDistance.jar is in the dist directory and can be copied to any location. Go to the target folder and type the following; either an individual file or a folder can be specified:

java -jar GeneDistance.jar <target_file_path/Folder_path>

Basic usage:

java -jar <GeneDistancePath>\dist\GeneDistance.jar <target_file_path> optional_commands

Examples:

java -jar C:\GeneDistance\dist\GeneDistance.jar C:\GeneDistance\test\t1.txt

java -jar C:\GeneDistance\dist\GeneDistance.jar E:\Genomes\Chloroplast\ -kmer=6

Large genome usage (the JVM must be allowed to use more RAM; for example, -Xms16g -Xmx64g allows up to 64 GB of memory):

java -jar -Xms16g -Xmx64g C:\GeneDistance\dist\GeneDistance.jar E:\Genomes\T2T-CHM13v2.0\ -kmer=6

For chromosomes larger than 500 Mb, more memory is needed, for example 128 GB:

java -jar -Xms32g -Xmx128g C:\GeneDistance\dist\GeneDistance.jar E:\Genomes\Cycas_panzhihuaensis\ -kmer=8
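When the program is run from a pipeline, the command lines above can be assembled programmatically. The Python sketch below builds the argument list from the jar path, the target file or folder, and the optional settings. `build_command` and its parameter names are hypothetical, and `-kmer` is the only program option taken from this README. The JVM heap options are placed before `-jar`, the conventional position; everything after the jar path is passed to the program.

```python
def build_command(jar_path, target, kmer=None, min_heap=None, max_heap=None):
    """Return the argument list for one GeneDistance run."""
    cmd = ["java"]
    if min_heap:
        cmd.append(f"-Xms{min_heap}")  # initial JVM heap size, e.g. "16g"
    if max_heap:
        cmd.append(f"-Xmx{max_heap}")  # maximum JVM heap size, e.g. "64g"
    cmd += ["-jar", jar_path, target]
    if kmer is not None:
        cmd.append(f"-kmer={kmer}")
    return cmd
```

The resulting list can be passed directly to `subprocess.run(cmd, check=True)`; using a list rather than a single string avoids quoting problems with paths that contain spaces.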

Sequence Entry:

Sequence data files are prepared using a text editor and saved as ASCII plain text (text/plain), with a .txt or .fasta extension or with no extension at all (a file extension is not required). The program accepts a single sequence or multiple DNA sequences in FASTA format. The length of the template sequence is not limited.

FASTA format description:

A sequence in FASTA format consists of a header line followed by one or more lines containing the sequence itself. The header line starts with a ">" sign and a sequence identification code, optionally followed by a textual description of the sequence. Since the description is not part of the official format description, software can ignore it when it is present. A file in FASTA format may contain more than one sequence.
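The format described above can be read with a few lines of code. The Python sketch below is a minimal reader, not the parser used by GeneDistance. It splits each header into identifier and optional description and joins the sequence lines of each record. The function name `parse_fasta` is hypothetical, and treating text without any ">" header as a single unnamed sequence is an assumption made here for illustration.

```python
def parse_fasta(text):
    """Return a list of (identifier, description, sequence) tuples."""
    records = []
    ident, desc, chunks = None, "", []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the record collected so far.
            if ident is not None or chunks:
                records.append((ident or "", desc, "".join(chunks)))
            parts = line[1:].split(None, 1)
            ident = parts[0] if parts else ""
            desc = parts[1] if len(parts) > 1 else ""
            chunks = []
        else:
            chunks.append(line)
    if ident is not None or chunks:
        records.append((ident or "", desc, "".join(chunks)))
    return records
```

For example, a file containing two records yields two tuples, and the description field is empty for any header that carries only an identifier.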

