Add view support to the Rest Catalog #818
Comments
Thank you for raising this @ndrluis 💯 I will add this to the 0.8.0 milestone for now.
Would love to take a first stab at this @kevinjqliu, could you assign this to me? edit: here's a PR for
I am really curious about how @shiv-io did you already have some thoughts there?
Following what @danielcweeks said in this email, I believe we could discuss and experiment with SQLGlot to create support for other dialects. However, to support loading views, we would likely need to rely on a query engine. I'm not sure whether there is a query engine in the Python ecosystem that would make sense to support, but I feel that we could use Apache DataFusion, either through the iceberg-rust implementation or through its Python bindings.
That's an interesting question @corleyma. The way I see it, PyIceberg is a language library that tries to remain open to any Python-based query engine that wants to use its functions to process Iceberg tables. So I think the first step in introducing view support in PyIceberg would be to fetch the view representations from the REST Catalog endpoint and serve them to any query engine that wants to integrate with it (like Daft). I agree with @ndrluis, though, that it would be cool to leverage projects like DataFusion to improve the way we load, slice and dice tables in PyIceberg.
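To make the "fetch the view representations" step concrete, here is a minimal sketch of pulling the SQL representations for a view's current version out of a load-view response. It assumes the response has the shape described for the load-view result in the REST catalog OpenAPI spec referenced in this issue (a `metadata` object with `current-version-id` and a `versions` list whose entries carry `representations`). The function name `current_sql_representations` and the sample payload are hypothetical, not part of PyIceberg.

```python
from typing import Any


def current_sql_representations(load_view_result: dict[str, Any]) -> list[dict[str, Any]]:
    """Return the SQL representations of the view's current version.

    Hypothetical helper: assumes a load-view response shaped like the
    REST catalog spec's view metadata.
    """
    metadata = load_view_result["metadata"]
    current_id = metadata["current-version-id"]
    for version in metadata["versions"]:
        if version["version-id"] == current_id:
            # Keep only SQL representations; ignore any other representation type.
            return [rep for rep in version["representations"] if rep.get("type") == "sql"]
    raise ValueError(f"current-version-id {current_id} not found in versions")


# Hand-written payload for illustration only; not captured from a real catalog.
example_response = {
    "metadata-location": "s3://bucket/warehouse/views/daily_events/metadata/00001.json",
    "metadata": {
        "view-uuid": "00000000-0000-0000-0000-000000000000",
        "format-version": 1,
        "location": "s3://bucket/warehouse/views/daily_events",
        "current-version-id": 2,
        "versions": [
            {
                "version-id": 1,
                "representations": [
                    {"type": "sql", "sql": "SELECT * FROM events", "dialect": "spark"},
                ],
            },
            {
                "version-id": 2,
                "representations": [
                    {"type": "sql", "sql": "SELECT id, ts FROM events", "dialect": "spark"},
                    {"type": "sql", "sql": "SELECT id, ts FROM events", "dialect": "trino"},
                ],
            },
        ],
    },
}

reps = current_sql_representations(example_response)
for rep in reps:
    print(f"[{rep['dialect']}] {rep['sql']}")
```

A query engine integration could then pick the representation whose dialect it understands and plan the query itself, which keeps PyIceberg out of the business of executing SQL.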
I agree with @sungwy that the primary goal of pyiceberg should be to make it possible for query engines to interface with Iceberg tables and views. Nonetheless, it would be ideal to have some out-of-the-box way to get a scan of a view (something PyArrow Dataset-like would be best, but returning a Table/RecordBatchReader like the current table scan functionality is a fine endpoint). This would provide an easy path for integrating with other things (like polars) that currently support pyiceberg tables, and it would benefit use of pyiceberg for more operational concerns, e.g. being able to easily preview view contents. I think DataFusion (either via Python bindings or via iceberg-rust) would be a great way to accomplish this goal. Since (I think?) pyiceberg is much further along in implementing the Iceberg SDK than iceberg-rust, it would be interesting if pyiceberg could use DataFusion directly, but I suspect you need some custom Rust code no matter what?
I'm fairly new to the Iceberg ecosystem -- thanks for the insightful discussion, looks like I have some reading to do before I can weigh in.
@shiv-io It should still be possible to do look at how
+1, I think it's a good idea to separate accessing Iceberg views from using them. The ability to read an Iceberg view is great for general view operations. Even printing out the view definition would be a great feature to have. Connecting the view with an external engine can be a separate story.
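As a rough illustration of the "print out the view definition" idea, the sketch below formats a view's SQL representations as text, optionally filtered by dialect. It needs no query engine. The function `describe_view`, the view name, and the sample representations are all hypothetical; the only assumption carried over from the spec is that each representation has `sql` and `dialect` fields.

```python
from typing import Mapping, Optional, Sequence


def describe_view(
    name: str,
    representations: Sequence[Mapping[str, str]],
    dialect: Optional[str] = None,
) -> str:
    """Render a view's SQL representations as plain text (hypothetical helper)."""
    # With no dialect requested, show every representation.
    chosen = [
        rep
        for rep in representations
        if dialect is None or rep["dialect"].lower() == dialect.lower()
    ]
    if not chosen:
        available = ", ".join(sorted({rep["dialect"] for rep in representations})) or "none"
        raise LookupError(f"no {dialect!r} representation for {name}; available: {available}")
    lines = [f"view: {name}"]
    for rep in chosen:
        lines.append(f"dialect: {rep['dialect']}")
        lines.append(rep["sql"])
    return "\n".join(lines)


# Illustrative data only.
sample_representations = [
    {"type": "sql", "sql": "SELECT id, ts FROM events", "dialect": "spark"},
    {"type": "sql", "sql": "SELECT id, ts FROM events LIMIT 10", "dialect": "trino"},
]

print(describe_view("analytics.daily_events", sample_representations, dialect="trino"))
```

Raising on a missing dialect, rather than silently falling back to another one, keeps the caller aware that the stored SQL may not be valid for their engine.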
Feature Request / Improvement
Reference: https://github.com/apache/iceberg/blob/main/open-api/rest-catalog-open-api.yaml