Add documentation for support for skipping Parquet row groups

**Is your feature request related to a problem or challenge? Please describe what you are trying to do.**

We sometimes get questions about support for skipping Parquet row groups based on statistics. It seems that we do not have good documentation around this really cool feature, so we should write something up. We can base it on this response copied from the slack channel.

```
DataFusion has support for skipping entire row groups using predicates and min and max statistics.

It does not (yet) push the predicates down into the actual scan (e.g. to avoid materializing data that wouldn’t pass the predicate) — instead any row groups that are not pruned are decompressed into RecordBatches and then filtered.
Also, DataFusion will do “projection pushdown” — aka it will read only those columns needed to answer the query.
```

**Describe the solution you'd like**
Promote this cool feature in the documentation somewhere (user guide? README?)

**Describe alternatives you've considered**
None

**Additional context**
None


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add documentation for support for skipping Parquet row groups #825

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Add documentation for support for skipping Parquet row groups #825

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions