Hi,
I think I may already know the answer to my question, but I'd like to check before I do something stupid. I'm wondering whether I need to create a multiple sequence alignment file rather than using an unaligned multifasta. My ultimate aim is to produce a haplotype network.
I have a locus which I've amplified, but which has lots of small indels within an intronic part of the sequence.
I have used the following code to load a multifasta file into R, but I cannot convert it to a matrix because the sequences are of unequal lengths. I can't see any way to get a matrix to load with 'NA' for gaps. I'm also not sure what pegas would do if the matrix padded the end of each sequence with NAs rather than placing gaps internally: would it re-align the sequences, or assume they were already aligned?
library("apex")
library("adegenet")
library("pegas")
library("mmod")
library("poppr")
# To get a SINGLE fasta file in:
myseq<-read.FASTA("ASV_multifasta.fa")
myseq # Provides the summary information of the file
2172 DNA sequences in binary format stored in a list.
Mean sequence length: 329.453
Shortest sequence: 294
Longest sequence: 350
Labels:
ASV3 BO_04_M
ASV3 BO_04_M
ASV3 BO_04_M
ASV3 BO_04_M
ASV3 BO_04_M
ASV3 BO_04_M
...
Base composition:
a c g t
0.282 0.204 0.190 0.324
(Total: 715.57 kb)
# Convert it to a matrix:
myseqmatrix<-as.matrix(myseq)
Then I get an error telling me the conversion won't work because the sequences are different lengths:
Error in as.matrix.DNAbin(myseq) :
DNA sequences in list not of the same length.
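The same error can be reproduced with a minimal example. The two sequences below are made up for illustration and are not from my data; the names `toy` and `msg` are just placeholders:

```r
library(ape)

# Two made-up sequences of unequal length, stored as a DNAbin list
toy <- as.DNAbin(list(
  seqA = c("a", "c", "g", "t", "a"),
  seqB = c("a", "c", "g")
))

# How many sequences there are of each length
print(table(lengths(toy)))

# as.matrix() refuses a list whose sequences differ in length;
# capture the error message instead of stopping the script
msg <- tryCatch(
  {
    as.matrix(toy)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```

Running `table(lengths(myseq))` on the real data in the same way shows how the 2172 sequences are spread between 294 and 350 bp.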
If I instead make a multifasta file containing the sequences as they appear in a multiple sequence alignment (with gaps inserted so that all sequences are the same length), would that work for pegas and a haplotype network? Or would it change the output? Would the conversion to a matrix even work?
What do people do with uneven sequence lengths?
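For concreteness, here is a minimal sketch of the workflow I have in mind, assuming the sequences have already been aligned so that every row is the same length and indels are written as "-". The toy alignment is made up for illustration. I am not certain how gaps are treated when haplotypes are collapsed and distances computed (that seems to depend on the arguments to `haplotype()` and on the distance used), which is part of what I'm asking:

```r
library(ape)
library(pegas)

# Made-up aligned sequences: equal length, indels written as "-"
aligned <- rbind(
  s1 = strsplit("acgt-acgta", "")[[1]],
  s2 = strsplit("acgt-acgta", "")[[1]],
  s3 = strsplit("acgtaacgta", "")[[1]],
  s4 = strsplit("acgt-acgtc", "")[[1]],
  s5 = strsplit("acgt-ccgtc", "")[[1]]
)

# A character matrix converts directly to a DNAbin matrix
aln <- as.DNAbin(aligned)
print(dim(aln))

# Collapse identical sequences into haplotypes, then build the network
h <- haplotype(aln)
net <- haploNet(h)

# Scale each node by the number of sequences carrying that haplotype
plot(net, size = attr(net, "freq"))
```

With real data, the aligned FASTA written by an external aligner would be read in place of the toy matrix, for example with `read.dna(file, format = "fasta")`.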
Many thanks!