This tool implements a parallelized version of the Smith-Waterman algorithm, which finds optimal local alignments between two sequences. By distributing the work across multiple processes, it aims to speed up alignment on multicore systems.
- Nucleotide or Amino Acid Sequences
- Customizable Scoring
- Parallel Processing: It takes advantage of multi-core CPUs by using a user-defined number of processes.
- Python 3.x
- NumPy
- multiprocessing (part of the Python standard library)
No installation is necessary to run this script if you already have Python and NumPy. The multiprocessing module ships with the Python standard library, so only NumPy needs to be installed with pip:
pip install numpy
To run the tool, simply execute the script from your command line:
python Smith_waterman_parallel.py
Follow the on-screen prompts to input your sequences and parameters:
- First sequence: The first DNA or protein sequence for alignment.
- Second sequence: The second DNA or protein sequence for alignment.
- Match score: The score for matching characters (positive integer).
- Mismatch penalty: The penalty for mismatching characters (negative integer).
- Gap penalty: The penalty for gaps in alignment (negative integer).
- Number of processes: The number of parallel processes to use (1 to maximum number of CPUs).
Input:
Enter the first sequence: AGTACGCA
Enter the second sequence: TATGC
Enter the match score (positive integer): 2
Enter the mismatch penalty (negative integer): -1
Enter the gap penalty (negative integer): -1
Enter the number of processes (1-8): 4
Output:
Alignment 1: AGTACGCA
Alignment 2: -ATG--C-
Score: 5
Time taken: 0.032 seconds
To parallelize the Smith-Waterman algorithm, the tool divides the computation of the scoring matrix cells across multiple processes. Here's a brief description of the parallelization approach:
- Matrix initialization:
  - Initialize a scoring matrix 'H' with dimensions '(len(SeqA) + 1) x (len(SeqB) + 1)', filled with zeros.
- Anti-diagonal processing:
  - The computation proceeds along the anti-diagonals of the scoring matrix. Each cell depends only on its upper, left, and upper-left neighbors, which lie on earlier anti-diagonals, so cells on the same anti-diagonal are independent of each other and can be computed in parallel.
  - For each anti-diagonal, create a list of tasks where each task computes the score of a specific cell '(i, j)' in the scoring matrix.
- Parallel computation:
  - Use Python's multiprocessing.Pool to distribute the tasks across the available processes.
  - Each process computes the scores for its assigned cells independently.
- Traceback:
  - After filling the scoring matrix, the traceback procedure is performed to determine the optimal local alignment.
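The steps above can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the names pairwise_score and compute_cells come from this README, but their bodies, the layout of the args tuple, and the helpers anti_diagonals and fill_matrix are assumptions made for the example. It also assumes a linear gap penalty passed as a negative integer, and it skips the pool entirely when only one process is requested.

```python
from contextlib import nullcontext
from multiprocessing import Pool

import numpy as np


def pairwise_score(n, m, match, mismatch):
    """Score for aligning character n with character m."""
    return match if n == m else mismatch


def compute_cells(args):
    """Score one cell from its three neighbours (assumed tuple layout)."""
    i, j, diag, up, left, a, b, match, mismatch, gap = args
    score = max(0, diag + pairwise_score(a, b, match, mismatch), up + gap, left + gap)
    return i, j, score


def anti_diagonals(rows, cols):
    """Yield the interior cells (i, j) of each anti-diagonal, where i + j is constant."""
    for d in range(2, rows + cols - 1):
        yield [(i, d - i) for i in range(max(1, d - cols + 1), min(rows, d))]


def fill_matrix(seq_a, seq_b, match, mismatch, gap, num_processes):
    """Fill the scoring matrix one anti-diagonal at a time (hypothetical helper)."""
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    H = np.zeros((rows, cols), dtype=int)
    # One pool is created up front and reused for every anti-diagonal.
    # With a single process, no pool is created and the cells are computed in-process.
    with (Pool(num_processes) if num_processes > 1 else nullcontext()) as pool:
        for cells in anti_diagonals(rows, cols):
            tasks = [
                (i, j, int(H[i - 1, j - 1]), int(H[i - 1, j]), int(H[i, j - 1]),
                 seq_a[i - 1], seq_b[j - 1], match, mismatch, gap)
                for i, j in cells
            ]
            if pool is not None:
                results = pool.map(compute_cells, tasks)
            else:
                results = map(compute_cells, tasks)
            # Only the parent process writes to H; workers just return scores.
            for i, j, score in results:
                H[i, j] = score
    return H


if __name__ == "__main__":
    H = fill_matrix("AGTACGCA", "TATGC", match=2, mismatch=-1, gap=-1, num_processes=2)
    print(H)
```

Each task in this sketch is a single cell, so for short sequences the cost of sending tasks to worker processes will likely outweigh the computation itself. Grouping several cells of an anti-diagonal into one task is one possible way to reduce that overhead.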
- pairwise_score(n, m, match, mismatch): Computes the score for aligning two characters.
- compute_cells(args): Computes the score for a specific cell in the scoring matrix.
- traceback(H, SeqA, SeqB, match, mismatch, gap): Traces back through the scoring matrix to find the optimal local alignment.
- smith_waterman_parallel(SeqA, SeqB, match, mismatch, gap, num_processes): Main function that performs the Smith-Waterman alignment using parallel processing.
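To make the traceback step more concrete, here is a hedged sketch of what a function with the traceback signature listed above might do. The function body, the return value (aligned strings plus score), and the serial helper fill_serial used to build a matrix for the demo are all assumptions for illustration; the script's own implementation and output format may differ.

```python
import numpy as np


def pairwise_score(n, m, match, mismatch):
    """Score for aligning character n with character m."""
    return match if n == m else mismatch


def fill_serial(SeqA, SeqB, match, mismatch, gap):
    """Hypothetical serial fill, used here only to produce a matrix to trace back."""
    H = np.zeros((len(SeqA) + 1, len(SeqB) + 1), dtype=int)
    for i in range(1, H.shape[0]):
        for j in range(1, H.shape[1]):
            H[i, j] = max(
                0,
                H[i - 1, j - 1] + pairwise_score(SeqA[i - 1], SeqB[j - 1], match, mismatch),
                H[i - 1, j] + gap,
                H[i, j - 1] + gap,
            )
    return H


def traceback(H, SeqA, SeqB, match, mismatch, gap):
    """Walk back from the highest-scoring cell until a zero cell is reached."""
    # argmax returns the first maximum in row-major order if there are ties.
    i, j = (int(k) for k in np.unravel_index(np.argmax(H), H.shape))
    score = int(H[i, j])
    aligned_a, aligned_b = [], []
    while i > 0 and j > 0 and H[i, j] > 0:
        diag = H[i - 1, j - 1] + pairwise_score(SeqA[i - 1], SeqB[j - 1], match, mismatch)
        if H[i, j] == diag:
            # Match or mismatch: consume one character from each sequence.
            aligned_a.append(SeqA[i - 1])
            aligned_b.append(SeqB[j - 1])
            i, j = i - 1, j - 1
        elif H[i, j] == H[i - 1, j] + gap:
            # Gap in the second sequence.
            aligned_a.append(SeqA[i - 1])
            aligned_b.append("-")
            i -= 1
        else:
            # Gap in the first sequence.
            aligned_a.append("-")
            aligned_b.append(SeqB[j - 1])
            j -= 1
    return "".join(reversed(aligned_a)), "".join(reversed(aligned_b)), score


if __name__ == "__main__":
    H = fill_serial("TTACGTT", "GGACGGG", 2, -1, -1)
    print(traceback(H, "TTACGTT", "GGACGGG", 2, -1, -1))
```

The sketch prefers the diagonal move when several moves explain a cell's score, which is one common tie-breaking choice; a different preference order can yield a different alignment with the same score.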
Contributions to the project are welcome. You can contribute in several ways:
- Reporting issues
- Adding new features
- Improving documentation
- Refactoring code for better efficiency or readability