-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Creating select_upper and selectz_upper in EliasFano.hpp #19
Comments
Er... or maybe the +1 is unnecessary. I have to think about this—there's clearly an off-by-one in one of the two locations. |
I have an example of incorrect behavior in the current code, where I
believe that +1 is needed in both locations.
…On Tue, Nov 29, 2022 at 6:16 AM Sebastiano Vigna ***@***.***> wrote:
Er... or maybe the +1 is unnecessary. I have to think about this—there's
clearly an off-by-one in one of the two locations.
—
Reply to this email directly, view it on GitHub
<#19 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABUA6S5TTNNRZYAZVHZCDC3WKYF2VANCNFSM6AAAAAASOQKAYY>
.
You are receiving this because you authored the thread.Message ID:
***@***.***>
|
Also, line 185 has a debugging statement that has the +1, but you
understand the code much better than I do.
…On Tue, Nov 29, 2022 at 6:20 AM Thomas Wu ***@***.***> wrote:
I have an example of incorrect behavior in the current code, where I
believe that +1 is needed in both locations.
On Tue, Nov 29, 2022 at 6:16 AM Sebastiano Vigna ***@***.***>
wrote:
> Er... or maybe the +1 is unnecessary. I have to think about this—there's
> clearly an off-by-one in one of the two locations.
>
> —
> Reply to this email directly, view it on GitHub
> <#19 (comment)>, or
> unsubscribe
> <https://github.com/notifications/unsubscribe-auth/ABUA6S5TTNNRZYAZVHZCDC3WKYF2VANCNFSM6AAAAAASOQKAYY>
> .
> You are receiving this because you authored the thread.Message ID:
> ***@***.***>
>
|
Can you show me the example? |
Here is an example. I am attaching a file containing the position of 851
ones, ranging from 0 to 4456275, where nbits is 4458158. These positions
are intended to be stored in an Elias-Fano data structure, and L is
computed to be 12. Situation A (current code): If we provide (num_ones +
(nbits >> L)) (which equals 1939) in creating SelectSimpleZeroHalf, we get
the following inventory:
Inventory 0 2
Inventory 1 125539110932840448
Inventory 2 268530804322075212
Inventory 3 370144812833702929
Inventory 4 494558681006278018
Inventory 5 1845
Inventory 6 0
Inventory 7 0
Inventory 8 0
Inventory 9 0
Inventory 10 1939
Situation B: if we provide (num_ones + (nbits >> L) + 1) (which
equals 1940) in creating SelectSimpleZeroHalf, we get the following
inventory instead:
Inventory 0 2
Inventory 1 125539110932840448
Inventory 2 268530804322075212
Inventory 3 370144812833702929
Inventory 4 494558681006278018
Inventory 5 1845
Inventory 6 6160384
Inventory 7 0
Inventory 8 0
Inventory 9 0
Inventory 10 1940
Now, suppose we call rank on the Elias-Fano data structure for k = 4456468.
which is beyond the last one (4456275). This makes a call to
selectz_upper.selectZero(1088), because 4456468 >> 12 is 1088. This yields
inventory index 1, inventory rank 1845, subrank 64. Under situation A, we
get a start of 1845 and a residual of 0 to return a pos of 1845. This
gives a rank of (pos - k_shiftr_L) = 1845 - 1088 = 757, which appears to be
incorrect. Under situation B, we get a start of 1939 and a residual of 0
to return a pos of 1939. This gives a rank of (pos - k_shiftr_L) = 1939 -
1088 = 851, which corresponds to the last one in the dataset, as we would
expect.
I don't follow all of the code for the inventory, but I hope this
information helps in your thinking about the issue. Thank you very much
for providing the code, which I am using recently in a bioinformatics
program called GSNAP for aligning reads to a genomic sequence.
Tom
On Tue, Nov 29, 2022 at 6:30 AM Sebastiano Vigna ***@***.***> wrote:
Can you show me the example?
—
Reply to this email directly, view it on GitHub
<#19 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABUA6S2J4HBL62J2PHXDDULWKYHO5ANCNFSM6AAAAAASOQKAYY>
.
You are receiving this because you authored the thread.Message ID:
***@***.***>
0
3300
6802
10122
13043
16545
19843
23186
26688
29773
32869
36168
39288
42408
45528
48719
52221
55539
58687
61986
65285
68576
71875
75174
78673
81993
85313
88256
91199
94142
97085
100025
102968
105911
108854
111797
114740
117683
120626
123569
126512
129455
132398
135341
138284
141227
144170
147488
150450
153391
156333
159271
162212
165155
168097
171600
174672
177615
180687
183631
186930
190433
193724
197227
200314
203635
206732
210053
213174
216514
219635
222935
226438
229757
233076
236225
239525
242825
246327
249830
253333
256841
260141
263643
267143
270646
273590
276534
279478
282422
285366
288310
291254
294198
297142
300086
303030
305974
308918
311862
314806
317750
320694
323638
326582
329526
332475
335433
338376
341320
344264
347765
351268
354406
357350
360294
363238
366182
369685
372629
375576
378649
381722
385021
388524
391898
395979
399302
403288
406611
409651
412691
416014
419337
422297
425620
428943
432266
436117
439971
443822
447673
451524
455376
458088
460799
463511
466223
468935
471647
474359
477071
479783
482495
485207
487919
490631
493343
496055
498767
501479
504191
506903
509615
512593
515305
518017
520729
523441
526153
528865
531577
534289
537001
539713
542425
545137
547849
550561
553270
555982
558695
561418
564135
566847
569559
572271
574983
577695
580407
583120
586103
589426
592137
594849
597827
601140
604463
608543
612623
615945
618984
622023
625345
628667
631989
635311
638270
641592
644914
648236
651558
655408
659261
662583
665294
668005
670911
673622
676333
679044
681755
684466
687788
690499
693210
695921
698632
701343
704052
706764
709470
712185
714896
718778
721488
724800
728112
731434
734756
737454
741534
744245
748563
751637
755955
759308
762662
766016
769934
773288
776642
779996
782971
785946
788921
792341
795386
799494
802849
807167
811350
815533
818887
823070
826424
829778
833132
836486
839518
843836
847190
850544
854862
859178
862532
865886
869241
872595
875949
879303
882657
886971
890325
894243
898161
901515
904869
908223
911577
914931
918867
922785
927103
931421
935739
940022
942997
945972
948946
951921
954896
957871
960846
963821
966796
969771
973705
976680
979655
982630
985605
988580
991555
994530
997505
1000480
1003450
1006425
1009470
1012445
1015420
1018395
1021363
1024338
1027313
1030288
1033263
1036238
1039213
1042188
1045163
1048138
1051113
1054088
1057063
1060029
1063004
1065979
1068954
1071928
1074904
1077869
1080843
1083818
1086793
1089768
1092744
1095719
1098697
1102050
1105478
1108833
1112188
1115163
1119481
1122835
1126189
1130316
1133219
1136573
1139927
1143281
1147559
1150591
1153622
1156649
1159680
1163054
1166407
1170725
1173757
1177111
1180465
1183503
1187420
1190458
1193485
1196839
1201153
1205070
1208424
1212551
1216468
1219802
1222777
1225752
1228727
1231702
1234677
1237652
1240627
1243602
1246577
1249552
1252527
1255502
1258477
1261452
1264427
1267402
1270377
1273352
1276327
1279302
1282277
1285252
1288607
1291582
1294557
1297532
1300507
1303482
1306457
1309432
1312407
1315382
1318357
1321332
1324307
1327281
1330249
1333229
1336205
1339179
1342155
1345130
1348485
1351554
1354909
1358264
1361239
1364214
1367189
1370544
1373899
1377254
1380834
1384188
1387163
1391290
1395417
1399544
1402519
1405494
1408849
1412204
1415558
1419686
1424004
1426979
1431990
1437003
1442013
1447026
1453766
1460506
1467246
1473985
1480725
1487465
1494205
1500945
1504598
1508251
1511904
1516690
1521476
1526262
1531048
1540823
1550598
1560373
1570148
1579905
1589384
1598861
1608339
1613604
1619339
1628816
1634550
1640284
1646018
1655775
1661481
1671238
1681013
1686747
1692481
1698215
1707990
1713724
1718925
1724564
1730204
1735403
1745178
1754953
1764728
1774503
1784278
1793916
1803691
1813466
1823241
1833016
1838655
1844302
1851089
1862615
1874133
1885643
1897165
1908699
1920229
1931755
1943281
1954807
1966333
1977855
1989376
2000886
2012396
2023917
2035439
2046969
2057998
2069027
2080024
2091052
2102081
2113110
2124139
2135168
2146692
2158214
2169740
2181266
2192792
2203874
2214947
2226029
2237558
2249092
2260622
2272148
2283674
2295196
2306728
2318264
2329790
2341316
2352838
2364368
2375894
2387404
2398930
2410452
2421988
2433514
2445040
2456566
2468093
2479623
2491084
2502551
2514012
2525473
2536934
2548401
2559862
2571329
2582796
2594262
2605484
2616951
2628412
2639873
2657707
2675537
2693059
2699543
2706028
2712513
2719005
2725488
2731155
2736824
2743316
2748871
2754398
2759924
2765459
2770997
2777028
2783557
2790086
2795796
2801373
2807227
2813081
2818933
2824126
2829319
2836421
2843523
2850626
2857728
2864180
2871164
2877775
2884877
2891979
2899459
2906401
2913761
2921241
2926909
2932611
2938315
2944019
2949687
2955390
2961093
2966761
2972464
2978168
2983871
2989574
2995277
3000980
3006685
3017765
3029336
3040392
3051448
3062922
3073776
3084623
3095399
3109307
3123176
3136995
3149900
3162805
3175711
3188613
3201414
3214859
3227733
3240402
3253001
3256823
3260608
3263982
3267608
3271126
3274752
3278269
3281895
3285521
3289147
3292773
3296595
3300380
3304118
3307744
3311097
3314763
3318429
3322055
3325680
3329036
3332554
3336062
3339688
3343314
3346940
3350566
3354192
3357818
3361444
3365071
3368697
3372323
3375948
3379574
3383200
3386826
3390452
3394078
3397704
3400985
3404807
3408629
3411953
3415610
3419236
3422862
3426544
3430052
3433678
3437304
3440902
3444528
3448154
3451780
3455406
3459032
3462658
3465974
3469524
3473074
3476617
3480160
3483703
3487246
3490789
3494339
3497883
3501426
3504970
3508514
3511704
3514891
3518081
3521271
3524819
3528367
3531919
3534969
3538019
3541360
3544908
3548451
3551999
3555545
3558683
3561821
3564875
3567627
3570765
3573903
3577041
3580179
3583317
3586455
3589593
3592702
3595733
3598871
3601926
3604980
3608034
3610863
3613727
3616781
3620277
3623766
3626908
3630418
3638379
3646340
3654302
3657846
3661390
3664932
3668476
3672020
3675562
3679083
3682596
3686137
3689689
3693246
3697016
3700786
3704556
3708254
3719800
3731346
3744073
3756801
3768347
3779894
3792201
3804929
3817232
3829538
3841085
3852632
3864179
3875726
3887473
3899243
3911007
3922767
3934519
3946275
3958041
3969805
3982121
3993854
4005625
4017943
4029708
4041465
4053244
4064999
4076761
4088527
4100290
4112045
4123810
4135562
4147333
4159093
4170840
4182606
4194352
4206109
4217874
4229628
4241384
4253151
4264849
4276617
4288389
4300156
4311928
4323680
4335435
4347196
4358949
4370713
4371587
4372307
4373181
4382451
4391721
4400991
4410261
4419531
4430138
4440747
4451356
4451902
4452489
4454392
4456275
|
Yes, you're right. The point is that if the upper bound u is not a multiple of 2𝓁, then there might be elements whose value of the upper bits is u >> 𝓁, which means you need that position to be part of the upper bits. If you can give me a real name I'll thank you in the CHANGES section. |
Thanks for confirming the change to your code. My name is Thomas Wu, with
my signature shown below. Thanks, Tom
Thomas D. Wu, M.D., Ph.D.
Distinguished Scientist, Bioinformatics & Computational Biology
Genentech <https://www.gene.com/>, A Member of the Roche Group
***@***.***
Join Genentech on LinkedIn <https://www.linkedin.com/company/genentech> |
Twitter
<https://twitter.com/genentech?ref_src=twsrc%5Egoogle%7Ctwcamp%5Eserp%7Ctwgr%5Eauthor>
| Facebook <https://www.facebook.com/Genentech/> | Instagram
<https://www.instagram.com/genentech/?hl=en> | YouTube
<https://www.youtube.com/genentech>
…On Sun, Jun 11, 2023 at 7:43 AM Sebastiano Vigna ***@***.***> wrote:
Yes, you're right. The point is that if the upper bound u is not a
multiple of 2𝓁, then there might be elements whose value of the upper
bits is u >> 𝓁, which means you need that position to be part of the upper
bits.
If you can give me a real name I'll thank you in the CHANGES section.
—
Reply to this email directly, view it on GitHub
<#19 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABUA6S6UFNXXJKHBPCHOFKTXKXKQHANCNFSM6AAAAAASOQKAYY>
.
You are receiving this because you authored the thread.Message ID:
***@***.***>
|
In EliasFano.hpp, line 119, upper_bits.size has the term (num_ones + (num_bits >> l) + 1), which is converted to the number of words by adding 63 and dividing by 64. To be consistent, shouldn't lines 137 and 138 call SimpleSelectHalf and SimpleSelectZeroHalf by also adding 1 to (num_ones + (num_bits >> l)? That way, in SimpleSelectHalf, line 83, and SimpleSelectZeroHalf, line 83, the value of num_words will be equal to upper_bits.size.
The text was updated successfully, but these errors were encountered: