Refactor test suite in EnforceDistribution, to use standard test config.#15010
Refactor test suite in EnforceDistribution, to use standard test config.#15010alamb merged 2 commits intoapache:mainfrom
Conversation
| // By default, will not repartition / resort data if it is already sorted. | ||
| config.optimizer.prefer_existing_sort = false; | ||
|
|
||
| // By default, will attempt to convert Union to Interleave. | ||
| config.optimizer.prefer_existing_union = false; | ||
|
|
||
| // By default, will not repartition file scans. | ||
| config.optimizer.repartition_file_scans = false; | ||
| config.optimizer.repartition_file_min_size = 1024; | ||
|
|
||
| // By default, set query execution concurrency to 10. | ||
| config.execution.target_partitions = 10; |
There was a problem hiding this comment.
These match the defaults on main:
| // Use a small batch size, to trigger RoundRobin in tests | ||
| config.execution.batch_size = 1; |
There was a problem hiding this comment.
This matches main:
datafusion/datafusion/core/tests/physical_optimizer/enforce_distribution.rs
Lines 412 to 413 in c61e7e5
…est config builder
9af23e0 to
82b0342
Compare
| /// * `FIRST_ENFORCE_DIST` - | ||
| /// true: (EnforceDistribution, EnforceDistribution, EnforceSorting) | ||
| /// false: else runs (EnforceSorting, EnforceDistribution, EnforceDistribution) | ||
| /// * `PREFER_EXISTING_SORT` (optional) - if true, will not repartition / resort data if it is already sorted | ||
| /// * `TARGET_PARTITIONS` (optional) - number of partitions to repartition to | ||
| /// * `REPARTITION_FILE_SCANS` (optional) - if true, will repartition file scans | ||
| /// * `REPARTITION_FILE_MIN_SIZE` (optional) - minimum file size to repartition | ||
| /// * `PREFER_EXISTING_UNION` (optional) - if true, will not attempt to convert Union to Interleave |
There was a problem hiding this comment.
These are now replaced by the required DoFirst (for one-of 2 possible optimizer run patterns),
then all the optional settings.
There was a problem hiding this comment.
I very much like how it is much more explicit
alamb
left a comment
There was a problem hiding this comment.
Thank you very much @wiedld -- I think this is a major step forward in test readability as all the relevant options are now explicit in the tests, rather than being true / false parameters where interpreting them requires looking at a macro definition.
cc @berkaysynnada and @mustafasrepo / @akurmustafa
| /// * `FIRST_ENFORCE_DIST` - | ||
| /// true: (EnforceDistribution, EnforceDistribution, EnforceSorting) | ||
| /// false: else runs (EnforceSorting, EnforceDistribution, EnforceDistribution) | ||
| /// * `PREFER_EXISTING_SORT` (optional) - if true, will not repartition / resort data if it is already sorted | ||
| /// * `TARGET_PARTITIONS` (optional) - number of partitions to repartition to | ||
| /// * `REPARTITION_FILE_SCANS` (optional) - if true, will repartition file scans | ||
| /// * `REPARTITION_FILE_MIN_SIZE` (optional) - minimum file size to repartition | ||
| /// * `PREFER_EXISTING_UNION` (optional) - if true, will not attempt to convert Union to Interleave |
There was a problem hiding this comment.
I very much like how it is much more explicit
| assert_optimized!( | ||
| expected, | ||
| top_join.clone(), | ||
| &TestConfig::new(DoFirst::Distribution) | ||
| ); |
There was a problem hiding this comment.
Taking this a step more, what do you think of using a function on TestConfig so this looks like:
| assert_optimized!( | |
| expected, | |
| top_join.clone(), | |
| &TestConfig::new(DoFirst::Distribution) | |
| ); | |
| TestConfig::new(DoFirst::Distribution) | |
| .assert_optimized) | |
| &expected, | |
| &top_join) | |
| ); |
There was a problem hiding this comment.
I like this idea. I was already planning for the next followup PR to change the macro to a functional call.
There was a problem hiding this comment.
Here is the draft of the followup PR: influxdata#61
Just to get an idea of where this is heading.
|
I'll review this tomorrow as well |
|
BTW @blaginin is proposing to add If we merge that some of these tests may become much eaiser to maintain / change 🤔 |
|
Let's plan to merge this PR tomorrow / after @berkaysynnada has had a chance to review |
berkaysynnada
left a comment
There was a problem hiding this comment.
I really like the idea. Interpreting these tests becomes much clearer. Thank you @wiedld. I just have two questions, but perhaps I'm missing something as I'm a bit tired today 😅
| true, | ||
| repartition_size, | ||
| false | ||
| &test_config.clone().with_query_execution_partitions(3) |
There was a problem hiding this comment.
Are we missing prefer_existing_sort and repartition_file_scans here?
There was a problem hiding this comment.
I put the common config a few lines up here: https://github.com/apache/datafusion/pull/15010/files/5de55b3a8bf1858206dcecc0d3b1907228c970a4#r1984443646. Then only modified the unique config (with_query_execution_partitions) per test case.
| true, | ||
| repartition_size, | ||
| false | ||
| &test_config.with_query_execution_partitions(8) |
| let test_config = TestConfig::new(DoFirst::Distribution) | ||
| .with_prefer_existing_sort() | ||
| .with_prefer_repartition_file_scans(1); |
There was a problem hiding this comment.
Here's the common config.
That's my everyday. 😆 |
|
Thank you @berkaysynnada and @wiedld |
…est config builder (apache#15010)
Which issue does this PR close?
Part of #15003
Rationale for this change
Make it easier to determine the differences in test configuration, for each test case.
Also clearly defines what are the defaults used for the test suite.
What changes are included in this PR?
Define
TestConfig, which requires one-ofDoFirstoptimizer run configs. Most test cases uses this test config with the default settings.Provide a set of interfaces to define which configurations get tweaked.
Are these changes tested?
Yes
Are there any user-facing changes?
No