Truncated consensus sequence #104

Open
Benni96 opened this issue Jun 30, 2017 · 3 comments

Comments


Benni96 commented Jun 30, 2017

Hi,
I collapsed amplicon data and got some truncated reads during the collapse step.

The data were paired-end reads that were stitched into single reads. The stitched reads were then collapsed with the UMI inline.

bmftools collapse inline -S -l 10 -s <homing> -f <prefix> -z <stitched reads>

After mapping, I observed some reads that did not span the entire amplicon region. I traced one such read back through the UMI file and the stitched reads file. The original stitched reads file contained roughly 12900 reads with this UMI; 99.9% were full length and only 10 were shorter. However, even the shortest stitched read was still longer than the consensus read in the UMI read file.

UMI: GCATCCACAAAT
Stitched reads with this UMI: 12963 reads
length distribution (count / length):
1 96
1 129
1 130
10 131
161 132
12787 133
2 134
length of the consensus of the UMI family: 69 bp
The homing sequence is 3 nt and the barcode is 10 nt. Therefore, even the shortest read (96 nt) should result in a consensus read longer than 69 bp.
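
As a sanity check on the numbers above, here is a minimal Python sketch that recomputes the totals from the reported length distribution. The variable names are illustrative, and subtracting the homing and barcode lengths from the read is an assumption about how the inline layout consumes bases, not confirmed BMFtools behavior.

```python
# Length distribution reported above for this UMI family: {read length: count}
length_counts = {96: 1, 129: 1, 130: 1, 131: 10, 132: 161, 133: 12787, 134: 2}

HOMING_LEN = 3      # homing sequence length stated in the issue
BARCODE_LEN = 10    # barcode length stated in the issue (-l 10)
CONSENSUS_LEN = 69  # observed consensus length for this UMI family

total_reads = sum(length_counts.values())
modal_length = max(length_counts, key=length_counts.get)
modal_fraction = length_counts[modal_length] / total_reads
shortest = min(length_counts)

# Assumption: barcode and homing sequence are both removed from the read.
shortest_after_removal = shortest - HOMING_LEN - BARCODE_LEN

print("total reads:", total_reads)
print("modal length:", modal_length, "fraction:", round(modal_fraction, 4))
print("shortest read after removal:", shortest_after_removal)
print("longer than consensus:", shortest_after_removal > CONSENSUS_LEN)
```

Under that assumption, even the shortest input read would remain longer than the 69 bp consensus, which is why the truncation looks unexpected.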

Do you have any suggestions? Or was this observed before?

Contributor

dnbaker commented Jul 1, 2017

Were your input reads all of uniform read length? I'm surprised by this behavior; would you be willing to provide some data with which I can reproduce the issue?

Thank you!

Author

Benni96 commented Jul 3, 2017

Hi,
The read length varies a bit (+/- 2 bp), as you can see in my previous post. I have also observed this phenomenon at a low level in other datasets. By accident, I tried the UMI generation with the option "-n 5", and then the consensus reads were correct. However, I have no clue why this option changes the output. Do you?

Contributor

dnbaker commented Jul 5, 2017

BMFtools assumes uniform read length, which is why adapter masking, not trimming, is suggested.
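
To illustrate the difference between masking and trimming, here is a rough Python sketch that pads shorter reads with `N` bases and low-quality scores so that all reads share one length. The helper `mask_to_length` and the toy reads are hypothetical and not part of BMFtools; the `#` quality character corresponds to Phred 2 in Phred+33 encoding.

```python
def mask_to_length(seq, qual, target_len, pad_base="N", pad_qual="#"):
    """Pad a read with masked bases up to target_len; never shorten it."""
    missing = target_len - len(seq)
    if missing <= 0:
        return seq, qual
    return seq + pad_base * missing, qual + pad_qual * missing


# Toy (sequence, quality) pairs of unequal length, for illustration only.
reads = [("ACGTAC", "IIIIII"), ("ACGT", "IIII")]

# Use the longest read as the common length so that nothing is trimmed.
target = max(len(seq) for seq, _ in reads)
masked = [mask_to_length(seq, qual, target) for seq, qual in reads]

for seq, qual in masked:
    print(seq, qual)
```

Masking keeps every read at the same length while marking the padded positions as uninformative, whereas trimming would leave reads of differing lengths.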

Are you using Illumina data?

-n only changes memory requirements.

How many reads passed the homing sequence?
