Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Part of #2079
Following on from #2292 and #2291, it should be possible to pull the multi-file handling out of each individual file format operator and delegate it to the physical plan. As described in #2079, this will greatly simplify the operator implementations, whilst also hiding fewer details from the physical plan.
Describe the solution you'd like
Currently, a FileScanConfig results in ListingTable::scan generating a physical plan that looks something like:
ParquetExec
I propose instead generating something like:
UnionExec
  ProjectionExec: ... // Partition 1
    UnionExec
      ParquetExec: ... // Partition 1 File 1
      ParquetExec: ... // Partition 1 File 2
  ProjectionExec: ... // Partition 2
    UnionExec
      ParquetExec: ... // Partition 2 File 1
      ParquetExec: ... // Partition 2 File 2
      ParquetExec: ... // Partition 2 File 3
Whilst this plan is more complex, it reduces the complexity of the file format operators, and should hopefully lead to fewer bugs such as #2170 or #2000.
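To make the proposed shape concrete, here is a minimal Rust sketch, assuming a toy plan model rather than DataFusion's real operators: `PlanNode`, `build_proposed_plan` and `render` are hypothetical names, and `file_groups` is a simplified stand-in for the per-partition file lists that a FileScanConfig carries. Each partition becomes a `ProjectionExec` over a `UnionExec` of single-file `ParquetExec` leaves, with the `...` node details left elided as in the plan above.

```rust
use std::fmt::Write;

/// Hypothetical, simplified plan node. These are NOT DataFusion's real
/// ExecutionPlan types; they only model the tree shape being proposed.
#[derive(Debug)]
enum PlanNode {
    Union(Vec<PlanNode>),
    Projection { partition: usize, input: Box<PlanNode> },
    ParquetFile { partition: usize, file: usize, path: String },
}

/// Build the proposed tree: one Projection per partition, each wrapping a
/// Union with exactly one single-file scan per file in that partition.
fn build_proposed_plan(file_groups: &[Vec<String>]) -> PlanNode {
    let partitions: Vec<PlanNode> = file_groups
        .iter()
        .enumerate()
        .map(|(p, files)| {
            let scans: Vec<PlanNode> = files
                .iter()
                .enumerate()
                .map(|(f, path)| PlanNode::ParquetFile {
                    partition: p + 1,
                    file: f + 1,
                    path: path.clone(),
                })
                .collect();
            PlanNode::Projection {
                partition: p + 1,
                input: Box::new(PlanNode::Union(scans)),
            }
        })
        .collect();
    PlanNode::Union(partitions)
}

/// Render the tree with two spaces of indentation per level.
fn render(node: &PlanNode, depth: usize, out: &mut String) {
    let indent = "  ".repeat(depth);
    match node {
        PlanNode::Union(children) => {
            writeln!(out, "{indent}UnionExec").unwrap();
            for child in children {
                render(child, depth + 1, out);
            }
        }
        PlanNode::Projection { partition, input } => {
            writeln!(out, "{indent}ProjectionExec: ... // Partition {partition}").unwrap();
            render(input, depth + 1, out);
        }
        PlanNode::ParquetFile { partition, file, path } => {
            writeln!(out, "{indent}ParquetExec: {path} // Partition {partition} File {file}").unwrap();
        }
    }
}

fn main() {
    // Two partitions with two and three files, matching the plan above.
    let file_groups = vec![
        vec!["a.parquet".to_string(), "b.parquet".to_string()],
        vec![
            "c.parquet".to_string(),
            "d.parquet".to_string(),
            "e.parquet".to_string(),
        ],
    ];
    let plan = build_proposed_plan(&file_groups);
    let mut out = String::new();
    render(&plan, 0, &mut out);
    print!("{out}");
}
```

The point the sketch illustrates is that every leaf scans exactly one file, so the multi-file bookkeeping lives in the shape of the tree rather than inside the scan operator.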
Describe alternatives you've considered
We could simply not do this, and keep the multi-file handling inside each file format operator.