[DISCUSS] Set DataFusion settings for maximum "out of the box" performance #6287

### Is your feature request related to a problem or challenge?
I want people's first impression of DataFusion to be "that is very fast" without having to tune parameters.

DataFusion has many configuration options that control various performance optimizations.

Some of these options trade pure efficiency for faster query execution: the query finishes sooner, but resource consumption grows more than linearly.
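To make the tradeoff concrete, here is a small worked example. The timings are hypothetical numbers chosen for illustration, not DataFusion measurements, and `parallel_tradeoff` is a helper invented for this sketch.

```python
def parallel_tradeoff(serial_seconds, cores, parallel_seconds):
    """Compare wall-clock speedup against total CPU time consumed."""
    speedup = serial_seconds / parallel_seconds
    # Assumption: every core stays busy for the whole parallel run.
    cpu_seconds = cores * parallel_seconds
    efficiency = speedup / cores
    return speedup, cpu_seconds, efficiency


# Hypothetical: 10 s on one core versus 2 s on eight cores.
speedup, cpu_seconds, efficiency = parallel_tradeoff(10.0, 8, 2.0)
print(f"speedup={speedup:.1f}x cpu_seconds={cpu_seconds:.1f} efficiency={efficiency:.1%}")
# prints: speedup=5.0x cpu_seconds=16.0 efficiency=62.5%
```

In this made-up case the user waits a fifth as long, but the query burns 16 CPU-seconds instead of 10. That is the kind of cost a more aggressive default would accept.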

We have benchmarks, such as a [tpch](https://github.com/apache/arrow-datafusion/blob/main/benchmarks/src/bin/tpch.rs) runner, that typically run on a single core. These are great as performance unit tests in well-controlled environments (they avoid task overhead and other nondeterminism introduced by multi-core execution). However, they don't mimic what users typically run.

Up to now we have taken a conservative approach and only enabled optimizations by default if they make everything faster. I would like to change our philosophy and optimize for "out of the box" performance.

You can see examples of other systems turning these knobs up for performance:

IOx: https://github.com/influxdata/influxdb_iox/blob/ad28ebb7650d1cd21b995ee5b514d8da3580f22b/datafusion_util/src/config.rs#L20-L23



Here is a recent example from https://hussainsultan.com/posts/unbundled-datafusion/

```python
from datafusion import RuntimeConfig, SessionConfig, SessionContext

runtime = RuntimeConfig().with_disk_manager_os().with_fair_spill_pool(100000000)
config = (
    SessionConfig()
    .with_create_default_catalog_and_schema(True)
    .with_target_partitions(8)
    .with_information_schema(True)
    .with_repartition_joins(True)
    .with_repartition_aggregations(True)
    .with_repartition_windows(True)
    .with_parquet_pruning(True)
    .set("datafusion.execution.parquet.pushdown_filters", "true")
)

ctx = SessionContext(config, runtime)
ctx.register_parquet("orders", "../../../fanniemae-benchmark/sf10/raw/orders.parquet")
```






### Describe the solution you'd like

I would like to change the `ConfigOption` defaults to optimize performance in the common case rather than to avoid performance regressions in all cases.


Specifically that means:
1. Always repartition when possible (to increase parallelism by default)
2. Push down all Parquet filters (e.g. https://github.com/apache/arrow-datafusion/issues/4085 and https://github.com/apache/arrow-datafusion/issues/3463)
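As a rough sketch of what these two items amount to today, the snippet below lists the settings a user would have to flip by hand and renders them as SQL `SET` statements. The `pushdown_filters` key is taken from the example above. The three `datafusion.optimizer.repartition_*` keys are my assumption of the matching option names and should be checked against the configuration documentation for your DataFusion version. `PROPOSED_DEFAULTS` and `as_set_statements` are hypothetical names used only here.

```python
# Settings this proposal would turn on by default.
# Assumption: the key names match your DataFusion version's config docs.
PROPOSED_DEFAULTS = {
    "datafusion.optimizer.repartition_joins": "true",
    "datafusion.optimizer.repartition_aggregations": "true",
    "datafusion.optimizer.repartition_windows": "true",
    "datafusion.execution.parquet.pushdown_filters": "true",
}


def as_set_statements(options):
    """Render each option as a SQL SET statement, one per key."""
    return [f"SET {key} = {value};" for key, value in options.items()]


for statement in as_set_statements(PROPOSED_DEFAULTS):
    print(statement)
```

The same key/value pairs could instead be passed to `SessionConfig().set(key, value)`, as the `pushdown_filters` line in the example above does.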





### Describe alternatives you've considered

We could leave the defaults alone and require users to change them themselves.

### Additional context

_No response_
