mapping on subset of a bigger reference gives different results #381

mewu3 · 2023-01-03T07:12:44Z

Dear all,

I have constructed a smaller reference database (less than 1 GB) from a larger reference database (~60 GB) by extracting small windows of sequence, so the sequences in the small reference are completely identical to those in the large reference. Mapping the same query against the small reference gives more mappings than against the large reference.

bwa mem -t 32 -L 50,50 -Y -h 10000

query       0       Candida_glabrata        10526496        0       40M     *       0       0       GTTCAATGCCACTCGTGACTACAAATTGCCTGAGGCCCCA        EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE        NM:i:1  MD:Z:24G15      AS:i:35 XS:i:35 XA:Z:Candida_glabrata,+10525296,40M,1;Candida_glabrata,+10526196,40M,1;Candida_glabrata,+10525896,40M,1;Candida_glabrata,+10525596,40M,1;Candida_glabrata,+10526796,40M,2;Candida_glabrata,-16703643,40M,2;Candida_glabrata,-12363822,40M,2;Candida_glabrata,+13345998,40M,3;

bwa mem -t 32 -L 50,50 -Y -h 10000

query       0       Candida_glabrata        73738   0       40M     *       0       0       GTTCAATGCCACTCGTGACTACAAATTGCCTGAGGCCCCA        EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE        NM:i:0  MD:Z:40 AS:i:40 XS:i:40 XA:Z:Candida_glabrata,+73413,40M,0;Aspergillus_lentulus,-6216,40M,0;Candida_glabrata,+53407,40M,0;Candida_glabrata,+74338,40M,0;Candida_glabrata,+73113,40M,0;Candida_glabrata,+80122,40M,0;Candida_glabrata,+55738,40M,0;Candida_glabrata,+79521,40M,0;Candida_glabrata,+55438,40M,0;Candida_glabrata,+79822,40M,0;

In the second record (small reference), an Aspergillus_lentulus hit appears among the Candida_glabrata hits in the XA tag.

The main problem is that I get more hits on different species with the small reference; I would prefer the results to remain the same as with the large one. I tried lowering -h to much smaller values (100, 50, 10, ...). This works for some queries, but others still map to extra species.
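
To make the difference between the two runs easier to see, the reference names in each record can be tallied with a small script. This is only a sketch: the function names `parse_hits` and `hits_per_reference` are made up for illustration, and it assumes whitespace-separated SAM fields and XA entries laid out as `rname,±pos,CIGAR,NM;`, as in the records above.

```python
from collections import Counter


def parse_hits(sam_line):
    """Return (rname, strand, pos, cigar, nm) for the primary hit and each XA alternative.

    Assumes one alignment record per line with whitespace-separated fields.
    """
    fields = sam_line.split()
    flag, rname, pos, cigar = int(fields[1]), fields[2], int(fields[3]), fields[5]

    # Optional tags look like TAG:TYPE:VALUE; keep only the value.
    tags = {}
    for field in fields[11:]:
        parts = field.split(":", 2)
        if len(parts) == 3:
            tags[parts[0]] = parts[2]

    strand = "-" if flag & 16 else "+"  # FLAG bit 0x10 marks the reverse strand
    nm = int(tags["NM"]) if "NM" in tags else None
    hits = [(rname, strand, pos, cigar, nm)]

    # XA holds the alternative hits, each terminated by ';'.
    for entry in tags.get("XA", "").split(";"):
        if not entry:
            continue
        name, signed_pos, xa_cigar, xa_nm = entry.rsplit(",", 3)
        hits.append((name, signed_pos[0], int(signed_pos[1:]), xa_cigar, int(xa_nm)))
    return hits


def hits_per_reference(sam_line):
    """Count how many hits (primary plus XA) fall on each reference name."""
    return Counter(hit[0] for hit in parse_hits(sam_line))
```

Running `hits_per_reference` on the record from each reference and comparing the two counters shows which species appear only with the small reference.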

Could you please give me some advice?

Thanks,
MJ

lh3 added the Data needed label Mar 22, 2025