mapping on subset of a bigger reference gives different results #381

Open
mewu3 opened this issue Jan 3, 2023 · 0 comments
mewu3 commented Jan 3, 2023

Dear all,

I have constructed a smaller reference database (less than 1 GB) from a larger reference database (~60 GB) by extracting small windows of sequence, so the sequences in the small reference are completely identical to the corresponding regions of the large reference. Yet mapping the same query against the small reference gives more hits than mapping against the large reference.

Large reference:

bwa mem -t 32 -L 50,50 -Y -h 10000

query       0       Candida_glabrata        10526496        0       40M     *       0       0       GTTCAATGCCACTCGTGACTACAAATTGCCTGAGGCCCCA        EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE        NM:i:1  MD:Z:24G15      AS:i:35 XS:i:35 XA:Z:Candida_glabrata,+10525296,40M,1;Candida_glabrata,+10526196,40M,1;Candida_glabrata,+10525896,40M,1;Candida_glabrata,+10525596,40M,1;Candida_glabrata,+10526796,40M,2;Candida_glabrata,-16703643,40M,2;Candida_glabrata,-12363822,40M,2;Candida_glabrata,+13345998,40M,3;

Small reference:

bwa mem -t 32 -L 50,50 -Y -h 10000

query       0       Candida_glabrata        73738   0       40M     *       0       0       GTTCAATGCCACTCGTGACTACAAATTGCCTGAGGCCCCA        EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE        NM:i:0  MD:Z:40 AS:i:40 XS:i:40 XA:Z:Candida_glabrata,+73413,40M,0;Aspergillus_lentulus,-6216,40M,0;Candida_glabrata,+53407,40M,0;Candida_glabrata,+74338,40M,0;Candida_glabrata,+73113,40M,0;Candida_glabrata,+80122,40M,0;Candida_glabrata,+55738,40M,0;Candida_glabrata,+79521,40M,0;Candida_glabrata,+55438,40M,0;Candida_glabrata,+79822,40M,0;

Note the Aspergillus_lentulus hit in the middle of the Candida_glabrata hits in the XA tag.

The main problem is that I get more hits on different species with the small reference; I would prefer the results to stay the same as with the large one. I tried lowering -h to much smaller values (100, 50, 10, ...). That works for some queries, but others still get hits to extra species.
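
To make the difference easier to check across many queries, here is a rough Python sketch (not part of bwa) that counts the reference names in a SAM record's primary hit and its XA tag, then lists the names seen only with the small reference. The function names `hit_references` and `extra_references` are made up for illustration, the two records are shortened stand-ins for the output above, and the parsing assumes reference names contain no commas or whitespace.

```python
from collections import Counter


def hit_references(sam_line):
    """Count reference names over the primary hit and the XA alternative hits.

    Assumes one SAM alignment record per line, whitespace-delimited, with
    optional tags starting at the 12th field.
    """
    fields = sam_line.split()
    refs = Counter()
    # Field 3 (RNAME) is the reference of the reported alignment; '*' means unmapped.
    if fields[2] != "*":
        refs[fields[2]] += 1
    for tag in fields[11:]:
        if tag.startswith("XA:Z:"):
            # XA entries look like "name,strand+pos,CIGAR,NM;" and end with ';'.
            for entry in tag[len("XA:Z:"):].split(";"):
                if entry:
                    refs[entry.split(",")[0]] += 1
    return refs


def extra_references(large_line, small_line):
    """Reference names hit with the small reference but not with the large one."""
    return sorted(set(hit_references(small_line)) - set(hit_references(large_line)))


# Shortened, illustrative records (sequence and most XA entries omitted).
large = ("query\t0\tCandida_glabrata\t10526496\t0\t40M\t*\t0\t0\tACGT\tEEEE\t"
         "NM:i:1\tXA:Z:Candida_glabrata,+10525296,40M,1;Candida_glabrata,-16703643,40M,2;")
small = ("query\t0\tCandida_glabrata\t73738\t0\t40M\t*\t0\t0\tACGT\tEEEE\t"
         "NM:i:0\tXA:Z:Candida_glabrata,+73413,40M,0;Aspergillus_lentulus,-6216,40M,0;")

print(extra_references(large, small))
```

Running this over both SAM files, query by query, would show which queries pick up extra species and which do not.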

Could you please give me some advice?

Thanks,
MJ
