forked from kcajj/recombinant_population_analysis
-
Notifications
You must be signed in to change notification settings - Fork 0
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
Showing
21 changed files
with
4,514 additions
and
4 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -2,5 +2,4 @@ data/ | |
__pycache__/ | ||
results/ | ||
log/ | ||
.snakemake/ | ||
thesis/ | ||
.snakemake/ |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,7 +1,9 @@ | ||
# recombinant_population_analysis | ||
|
||
This repository represents the largest portion of my [Bachelor thesis](/Thesis_Giacomo_Castagnetti.pdf) in Genomics at University of Bologna. This project started in summer 2023 during the Biozentrum research summer project, at [NeherLab](https://neherlab.org/). I graduated in July 2024, with a final score of 110/110 cum laude, along with a honourable mention. | ||
This repository represents the largest portion of my [Bachelor thesis](/thesis/Thesis_Giacomo_Castagnetti.pdf) in Genomics at University of Bologna. This project started in summer 2023 during the Biozentrum research summer project, at [NeherLab](https://neherlab.org/). I graduated in July 2024, with a final score of 110/110 cum laude, along with a honourable mention. | ||
|
||
This repository processes the data produced by an experiment performed with the Aionostat, a machine that allows automatic experimental evolution of phages. | ||
|
||
The experiment consists in evolving 3 phages with a bacterium that they do not infect well initially. Through an initial [exploratory analysis](https://github.com/kcajj/rec_genome_analysis) that I performed, we know that two phages recombined and one got extinct. The objective of this repository is to create a more precise and high througput procedure to measure recombination in population sequencing data. | ||
The experiment consists in evolving 3 phages with a bacterium that they do not infect well initially. Through an initial [exploratory analysis](https://github.com/kcajj/rec_genome_analysis) that I performed, we know that two phages recombined and one got extinct. The objective of this repository is to create a more precise and high througput procedure to measure recombination in population sequencing data. | ||
|
||
We produced a pipeline that carries out the whole analysis, yielding the plots of the coverage of the genomes of the two phages for each timestep of the experiment. |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
File renamed without changes.
Large diffs are not rendered by default.
Oops, something went wrong.
Large diffs are not rendered by default.
Oops, something went wrong.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,54 @@ | ||
import numpy as np | ||
import matplotlib.pyplot as plt | ||
import matplotlib | ||
import pysam | ||
|
||
def get_hyb_ref_map(bam_file): | ||
map_hyb_ref={} | ||
with pysam.AlignmentFile(bam_file, "rb") as bam: | ||
for read in bam.fetch(): | ||
if not(read.is_secondary): | ||
alignment=read.get_aligned_pairs() | ||
for hyb,ref in alignment: | ||
if hyb==None: | ||
continue | ||
if hyb in map_hyb_ref.keys(): | ||
if map_hyb_ref[hyb]==None: | ||
map_hyb_ref[hyb]=ref | ||
else: | ||
continue | ||
else: | ||
map_hyb_ref[hyb]=ref | ||
return map_hyb_ref | ||
|
||
if __name__ == "__main__": | ||
populations=["P2","P3"] | ||
timesteps=["1","3","5","7"] | ||
|
||
recombination_folder="results/genomewide_recombination" | ||
coverage_folder="results/coverage_arrays" | ||
#refs_msa_path= | ||
#output_path="thesis/plots" | ||
|
||
threshold_frequency=0.01 | ||
|
||
bas51_map=get_hyb_ref_map("thesis/alignments/hybrid_on_bas51.bam") | ||
bas54_map=get_hyb_ref_map("thesis/alignments/hybrid_on_bas54.bam") | ||
|
||
print(" ") | ||
print("bas51") | ||
print(bas51_map[98100]) | ||
|
||
print(" ") | ||
print("hybrid ref") | ||
print(list(bas51_map.keys())[list(bas51_map.values()).index(95262)]) | ||
print(list(bas51_map.keys())[list(bas51_map.values()).index(123100)]) | ||
print(list(bas51_map.keys())[list(bas51_map.values()).index(125434)]) | ||
|
||
|
||
|
||
print(" ") | ||
print("bas54") | ||
print(bas54_map[96213]) | ||
print(bas54_map[124146]) | ||
print(bas54_map[126481]) |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,78 @@ | ||
from Bio import SeqIO,AlignIO | ||
import numpy as np | ||
import matplotlib.pyplot as plt | ||
|
||
def read_msa(path): | ||
alignment = AlignIO.read(open(path), "fasta") | ||
print(alignment) | ||
l=alignment.get_alignment_length() | ||
msa_matrix=np.zeros([3,l],dtype=str) | ||
for i,record in enumerate(alignment): | ||
for pos,nuc in enumerate(record.seq): | ||
msa_matrix[i][pos]=nuc | ||
return msa_matrix | ||
|
||
def get_evidences_distributions(msa_matrix,i_ref1=0,i_ref2=1,i_extra=2): | ||
l=len(msa_matrix[0]) | ||
e_distribution = np.zeros(l, dtype=int) | ||
|
||
for pos,array in enumerate(msa_matrix[0]): | ||
nuc_extra=msa_matrix[i_extra,pos] | ||
nuc_first_ref=msa_matrix[i_ref1,pos] | ||
nuc_second_ref=msa_matrix[i_ref2,pos] | ||
#if nuc_first_ref!='-' and nuc_second_ref!='-': | ||
if nuc_extra!='-' and nuc_first_ref!='-' and nuc_second_ref!='-': | ||
if nuc_extra==nuc_first_ref and nuc_extra==nuc_second_ref: | ||
continue | ||
elif nuc_extra!=nuc_first_ref and nuc_extra!=nuc_second_ref: | ||
e_distribution[pos]=1 | ||
elif nuc_extra==nuc_first_ref and nuc_extra!=nuc_second_ref: | ||
e_distribution[pos]=2 | ||
elif nuc_extra!=nuc_first_ref and nuc_extra==nuc_second_ref: | ||
e_distribution[pos]=3 | ||
#elif nuc_first_ref=="-" and nuc_second_ref=="-": | ||
# if nuc_extra==nuc_first_ref: | ||
# continue | ||
# elif nuc_extra!=nuc_first_ref: | ||
# e_distribution[pos]=1 | ||
|
||
return e_distribution | ||
|
||
def map_refcoord_msacoord(ref_path,refs_msa_path,i_ref_in_msa): | ||
map={} | ||
i=0 | ||
j=0 | ||
ref_seq=SeqIO.read(ref_path, "fasta").seq | ||
alignment=AlignIO.read(open(refs_msa_path), "fasta") | ||
while i<len(ref_seq): | ||
if alignment[i_ref_in_msa,j]!='-': | ||
map[i]=j | ||
i+=1 | ||
j+=1 | ||
return map | ||
|
||
def add_to_msa(msa_path,seq,mapping_start,mapping_end): | ||
alignment = AlignIO.read(open(msa_path), "fasta") | ||
l=alignment.get_alignment_length() | ||
|
||
msa_matrix=np.zeros([3,l],dtype=str) | ||
for i,record in enumerate(alignment): | ||
for pos,nuc in enumerate(record.seq): | ||
msa_matrix[i][pos]=nuc | ||
|
||
cut_msa_matrix=msa_matrix[:,mapping_start:mapping_end] | ||
for pos in range(len(cut_msa_matrix[2])): | ||
cut_msa_matrix[2][pos]=seq[pos] | ||
|
||
return cut_msa_matrix | ||
|
||
def length_msa(msa_path): | ||
alignment = AlignIO.read(open(msa_path), "fasta") | ||
return alignment.get_alignment_length() | ||
|
||
def extract_references_names(msa_path): | ||
alignment = AlignIO.read(open(msa_path), "fasta") | ||
names=[] | ||
for record in alignment: | ||
names.append(record.id) | ||
return names |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,65 @@ | ||
t=0.1 | ||
P2 1 | ||
Values: [] | ||
Indices: [] | ||
P2 3 | ||
Values: [0.43 0.3683] | ||
Indices: [ 95262 123100] | ||
P2 5 | ||
Values: [0.5754 0.5578 0.1535] | ||
Indices: [ 95262 123100 95418] | ||
P2 7 | ||
Values: [0.7862 0.7257 0.5069] | ||
Indices: [ 95262 123100 78670] | ||
|
||
t=0.05 | ||
P2 1 | ||
Values: [0.1935 0.0928] | ||
Indices: [ 95262 123100] | ||
P2 3 | ||
Values: [0.43 0.3683 0.108 0.076 0.0555] | ||
Indices: [ 95262 123100 95418 125434 96040] | ||
P2 5 | ||
Values: [0.5754 0.5578 0.1535 0.1499 0.0741] | ||
Indices: [ 95262 123100 95418 78670 125434] | ||
P2 7 | ||
Values: [0.7862 0.7257 0.5069 0.1652 0.0635 0.0581 0.0577 0.0565] | ||
Indices: [ 95262 123100 78670 79334 79623 78669 123096 95418] | ||
|
||
t=0.01 | ||
P2 1 | ||
Values: [0.1935 0.0928 0.0531 0.036 0.028 0.026 0.0255 0.0251 0.02 0.0193 | ||
0.0188 0.0187 0.0164 0.0139] | ||
Indices: [ 95262 123100 67722 96040 77864 94729 79179 96865 125434 76956 | ||
69844 95152 69631 79178] | ||
P2 3 | ||
Values: [0.43 0.3683 0.108 0.076 0.0555 0.0511 0.0368 0.0308 0.0296 0.0295 | ||
0.0288 0.0269 0.0257 0.0224 0.0207 0.02 0.0178 0.0174 0.0158 0.0147 | ||
0.0144 0.014 0.0139 0.0139 0.0138 0.0134 0.0133 0.013 0.0129 0.0128 | ||
0.0125 0.012 0.0118 0.0118 0.0113 0.0111 0.011 0.0109 0.0107 0.0106 | ||
0.0104] | ||
Indices: [ 95262 123100 95418 125434 96040 96865 77864 96726 123096 95718 | ||
95004 76956 97155 95893 95675 96230 96866 95462 97076 97156 | ||
96565 96655 96566 95585 96727 95676 96953 94554 95745 94727 | ||
96808 97404 96116 95294 123241 95419 97615 95152 95527 96307 | ||
95086] | ||
P2 5 | ||
Values: [0.5754 0.5578 0.1535 0.1499 0.0741 0.0571 0.0489 0.0472 0.0461 0.0442 | ||
0.0406 0.0324 0.0316 0.0274 0.0273 0.0268 0.0267 0.025 0.0235 0.0219 | ||
0.0219 0.0218 0.0209 0.0199 0.0197 0.0184 0.0182 0.0181 0.0181 0.0181 | ||
0.0173 0.0171 0.0163 0.0162 0.0147 0.0144 0.014 0.0132 0.0132 0.0128 | ||
0.0127 0.0123 0.0113 0.0113 0.0109 0.0106 0.0106 0.0102] | ||
Indices: [ 95262 123100 95418 78670 125434 96865 79334 96040 96726 123096 | ||
97155 95675 95893 77864 95462 97156 96565 95086 96727 95585 | ||
79623 95718 95294 79284 78669 96953 95419 76956 97076 95745 | ||
95676 97495 123241 96808 96566 97295 96866 96229 96230 96361 | ||
95817 95527 79774 96655 95774 78961 99036 78762] | ||
P2 7 | ||
Values: [0.7862 0.7257 0.5069 0.1652 0.0635 0.0581 0.0577 0.0565 0.0565 0.0341 | ||
0.0317 0.0303 0.0291 0.0271 0.0249 0.0245 0.024 0.0238 0.0233 0.0204 | ||
0.0195 0.0159 0.0156 0.0152 0.0145 0.0136 0.013 0.0127 0.0119 0.0118 | ||
0.0116 0.0112 0.0111 0.0111 0.0108] | ||
Indices: [ 95262 123100 78670 79334 79623 78669 123096 95418 79284 78961 | ||
79774 78762 96726 125434 96040 97155 79218 95893 123241 96865 | ||
95675 96565 79861 79611 96727 95745 77864 123242 97076 96655 | ||
78664 95462 80064 96229 95585] |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,36 @@ | ||
P3 1 | ||
Values: [] | ||
Indices: [] | ||
P3 3 | ||
Values: [] | ||
Indices: [] | ||
P3 5 | ||
Values: [] | ||
Indices: [] | ||
P3 7 | ||
Values: [] | ||
Indices: [] | ||
P3 1 | ||
Values: [] | ||
Indices: [] | ||
P3 3 | ||
Values: [] | ||
Indices: [] | ||
P3 5 | ||
Values: [] | ||
Indices: [] | ||
P3 7 | ||
Values: [] | ||
Indices: [] | ||
P3 1 | ||
Values: [] | ||
Indices: [] | ||
P3 3 | ||
Values: [] | ||
Indices: [] | ||
P3 5 | ||
Values: [0.0175 0.0156 0.0117] | ||
Indices: [95262 78670 93362] | ||
P3 7 | ||
Values: [0.0247 0.0212] | ||
Indices: [124695 124732] |
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Oops, something went wrong.