Expose PyIceberg table as PyArrow Dataset

### Feature Request / Improvement

Migrated from https://github.com/apache/iceberg/issues/7598:

Hi, I've been looking at seeing what we can do to make PyArrow Datasets extensible for various table formats and making them consumable to various compute engines (including DuckDB, Polars, DataFusion, Dask). I've written up my observations here: https://docs.google.com/document/d/1r56nt5Un2E7yPrZO9YPknBN4EDtptpx-tqOZReHvq1U/edit?usp=sharing

## What this means for PyIceberg's API

Currently, integration with engines like DuckDB means filters and projections have to be specified up front, rather than pushed down from the query:

```python
con = table.scan(
    row_filter=GreaterThanOrEqual("trip_distance", 10.0),
    selected_fields=("VendorID", "tpep_pickup_datetime", "tpep_dropoff_datetime"),
).to_duckdb(table_name="distant_taxi_trips")
```

Ideally, we can export the table as a dataset, register it in DuckDB (or some other engine), and then filters and projections can be pushed down as the engine sees fit. Then the following would perform equivalent to the above, but would be more user friendly:

```python
dataset = table.to_pyarrow_dataset()
con.register(dataset, "distant_taxi_trips")
conn.sql(""""SELECT VendorID, tpep_pickup_datetime, tpep_dropoff_datetime
    FROM distant_taxi_trips
    WHERE trip_distance > 10.0""")
```


### Query engine

Other

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Expose PyIceberg table as PyArrow Dataset #30

Feature Request / Improvement

What this means for PyIceberg's API

Query engine

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Expose PyIceberg table as PyArrow Dataset #30

Description

Feature Request / Improvement

What this means for PyIceberg's API

Query engine

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions