Asynchronous distributed planning

There are several future efforts that will require distributed planning to be able to hold asynchronous operations involving network calls:
- https://github.com/datafusion-contrib/datafusion-distributed/issues/374
- https://github.com/datafusion-contrib/datafusion-distributed/issues/376

Right now, distributed planning is implemented as `PhysicalOptimizerRule`, however, it's hard to model all the things we need to do in this project as optimizer rules, as we'll need to execute asynchronous operations during the modification of the single-node plan.

From a more philosophical, I don't think it's very worrying that distributed planning is not implemented as optimizer rules, as we are effectively not optimizing a query, so as long as the changes do not bleed to users and can be kept internally in this project it should be fine.

### Implementation details

I don't think there's any other option besides lazily distributing a query during execution, which is similar to what we do already for worker assignation in `DistributedExec`, so we can keep the same design, extending it to also perform distributed planning:
1. The `DistributedPhsysicalOptimizerRule` will just place a `DistributedExec` on top of the plan, but nothing else
2. The existing content of `DistributedPhsysicalOptimizerRule` can be just moved out to its own function, and made asynchronous (`async distributed_plan(plan: Arc<dyn ExecutionPlan>) -> Result<Arc<dyn ExecutionPlan>>`)
3. The `async distribute_plan()` function can be called either:
    - Lazily right before `DistributedExec::prepare_plan()`
    - By the users themselves, just in case they are interested in inspecting the distributed plan before execution

I'd expect no impact of this change neither in functionality or performance, just the ability to make the `annotate_plan` and `distribute_plan` functions asynchronous.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Asynchronous distributed planning #382

Implementation details

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Asynchronous distributed planning #382

Description

Implementation details

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions