Commit 36d383d
PyArrow: Avoid buffer overflow by avoiding a sort (#1555)
Second attempt of #1539
This was already discussed earlier in #208 (comment).
This PR changes from doing a sort followed by a single pass over the
table to an approach where we determine the unique partition tuples and
filter on them individually.
Fixes #1491. The sort caused buffers to be joined to the point where
they would overflow in Arrow. I think this is an issue on the Arrow
side, and it should automatically break the data up into smaller
buffers. The `combine_chunks` method does this correctly.
Now:
```
Run 0 took: 0.42877754200890195
Run 1 took: 0.2507691659993725
Run 2 took: 0.24833179199777078
Run 3 took: 0.24401691700040828
Run 4 took: 0.2419595829996979
Average runtime of 0.28 seconds
```
Before:
```
Run 0 took: 1.0768639159941813
Run 1 took: 0.8784021250030492
Run 2 took: 0.8486490420036716
Run 3 took: 0.8614017910003895
Run 4 took: 0.8497851670108503
Average runtime of 0.9 seconds
```
So it comes with a nice speedup as well :)
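For reference, timings in the format shown above could be collected with a small harness like the one below. This is only a sketch using the standard library: the helper name `report_runs` and the placeholder workload are assumptions, not the benchmark added in this commit.

```python
import statistics
import timeit


def report_runs(fn, runs=5):
    """Hypothetical helper: time `fn` once per run and print a per-run and average report."""
    timings = []
    for run in range(runs):
        # number=1 measures a single call, so every run is reported separately.
        elapsed = timeit.timeit(fn, number=1)
        print(f"Run {run} took: {elapsed}")
        timings.append(elapsed)
    print(f"Average runtime of {round(statistics.mean(timings), 2)} seconds")
    return timings


# Placeholder workload; replace with the partitioning call being measured.
timings = report_runs(lambda: sorted(range(100_000), reverse=True), runs=5)
```

The first run is often slower than the rest because of warm-up effects, which is visible in both sets of numbers above.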
---------
Co-authored-by: Kevin Liu <[email protected]>
1 parent 872a445 commit 36d383d
File tree
7 files changed, +805 -743 lines changed
- pyiceberg
  - io
  - table
- tests
  - benchmark
  - integration
  - table