This repository has been archived by the owner on Jul 25, 2022. It is now read-only.

Support reading from PyArrow datasets #10

Open
wjones127 opened this issue Jan 9, 2022 · 1 comment · May be fixed by #59

Comments

@wjones127

Given the success of the Datasets + DuckDB integration, a similar integration might be worthwhile in this module.

The Datasets API accepts a filter expression and a column subset, and provides an iterator of Arrow record batches. I think that could be wrapped in a TableProvider, though I'm unclear how predicate pushdown is implemented in DataFusion.

@houqp
Member

houqp commented Jan 9, 2022

Predicate pushdown is supported through an argument to the `scan` method. The doc you linked is out of date; you should see that argument in the latest version: https://docs.rs/datafusion/latest/datafusion/datasource/datasource/trait.TableProvider.html#tymethod.scan.
