This repository has been archived by the owner on Jul 25, 2022. It is now read-only.

Should PyDataFrame.collect() return a Table? #23

Open
wjones127 opened this issue Feb 20, 2022 · 3 comments

Comments

@wjones127

Right now it returns List[pa.RecordBatch], but it might be more natural to return a pa.Table. For one thing, PyArrow gives a pa.Table a better repr than a plain list of record batches.

@matthewmturner
Contributor

Aside from repr, do you see any other advantages?

@houqp
Member

houqp commented Feb 20, 2022

The current return type keeps the signature in sync with what we have in the Rust core. Perhaps it would be better to add a new method to return a pa.Table instead.

@wjones127
Author

> Aside from repr, do you see any other advantages?

Mostly I was just surprised, coming from PyArrow, but it sounds like Rust usually just represents results as a sequence of record batches.

> Perhaps it would be better to add a new method to return a pa.Table instead.

Yeah, perhaps that's a better path. A to_table() method is common in PyArrow. If we eventually get the Arrow C stream interface implemented in arrow-rs, we could also provide a to_reader().
