Commit a67cd19
# Rationale for this change
When a Parquet file has a very large amount of Binary or UTF8 data in
one row group, returning that data as a single RecordBatch can fail
because of index overflows
(#7973).
In `pyarrow` this is usually solved by representing the data as a
`pyarrow.Table` whose columns are `ChunkedArray`s, each of which is
essentially a list of Arrow arrays. Equivalently, a `pyarrow.Table`
can be viewed as a list of `RecordBatch`es.
I'd like to build a function in PyO3 that returns a `pyarrow.Table`,
very similar to [pyarrow's read_row_group
method](https://arrow.apache.org/docs/python/generated/pyarrow.parquet.ParquetFile.html#pyarrow.parquet.ParquetFile.read_row_group).
With that, we could have feature parity with `pyarrow` in circumstances
of potential index overflows without resorting to type changes (such as
reading the data as `LargeString` or `StringView` columns).
Currently, as far as I can see, `arrow-pyarrow` offers no way to export
a `pyarrow.Table` directly. In particular, a convenience conversion from
`Vec<RecordBatch>` seems to be missing. This PR implements a
convenience wrapper that allows exporting a `pyarrow.Table` directly.
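
To make the overflow concrete: with 32-bit offsets, the cumulative byte length of all values in one array cannot exceed `i32::MAX`, so data beyond that has to be split across several arrays (chunks). The following is a minimal std-only sketch of such a split. It is not code from this PR; the names `plan_chunks` and `plan_i32_chunks` are hypothetical, and a real reader would plan chunks while decoding rather than from a precomputed list of lengths.

```rust
use std::ops::Range;

/// Largest cumulative byte length addressable with 32-bit offsets.
const MAX_I32_OFFSET: usize = i32::MAX as usize;

/// Split rows into contiguous chunks so that the total byte length of the
/// values in each chunk stays within `limit`.
/// Returns `None` if a single value is larger than `limit` on its own.
/// (Hypothetical helper for illustration only.)
fn plan_chunks(value_lens: &[usize], limit: usize) -> Option<Vec<Range<usize>>> {
    let mut chunks = Vec::new();
    let mut start = 0; // first row of the chunk currently being filled
    let mut bytes = 0usize; // bytes accumulated in the current chunk (<= limit)

    for (row, &len) in value_lens.iter().enumerate() {
        if len > limit {
            return None; // this value can never fit behind 32-bit offsets
        }
        // Written as a subtraction so the check itself cannot overflow.
        if len > limit - bytes {
            chunks.push(start..row);
            start = row;
            bytes = 0;
        }
        bytes += len;
    }
    if start < value_lens.len() {
        chunks.push(start..value_lens.len());
    }
    Some(chunks)
}

/// Same split, using the 32-bit offset limit.
fn plan_i32_chunks(value_lens: &[usize]) -> Option<Vec<Range<usize>>> {
    plan_chunks(value_lens, MAX_I32_OFFSET)
}
```

Each returned range corresponds to one batch (or one chunk of a `ChunkedArray`), which is the shape a `pyarrow.Table` can hold without changing the column type.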
# What changes are included in this PR?
A new struct `Table` in the crate `arrow-pyarrow` is added which can be
constructed from `Vec<RecordBatch>` or from `ArrowArrayStreamReader`.
It implements `FromPyArrow` and `IntoPyArrow`.
`FromPyArrow` will support anything that implements the
ArrowStreamReader protocol, is a RecordBatchReader, or has a
`to_reader()` method that returns one. `pyarrow.Table` does both of these
things.
`IntoPyArrow` will result in a `pyarrow.Table` on the Python side,
constructed through `pyarrow.Table.from_batches(...)`.
# Are these changes tested?
Yes, in `arrow-pyarrow-integration-tests`.
# Are there any user-facing changes?
A new `Table` convenience wrapper is added.
1 parent ce4edd5 commit a67cd19
File tree
3 files changed, +223 -20 lines changed
- arrow-pyarrow-integration-testing
  - src
  - tests
- arrow-pyarrow/src