feat: add parquet nested leaf projection #7900
Code Review
This pull request introduces a new read_columns module to handle Parquet column projections, enabling support for nested field paths. It updates the Parquet reader to use leaf-based indexing instead of root-only indexing. A review comment suggests improving the build_parquet_leaves_indices function to correctly merge multiple projection requirements for the same root index, rather than overwriting them.
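The review comment describes a merge-versus-overwrite problem. The sketch below shows one way to handle it. It is not the PR's code: `RootProjection`, `add_requirement`, and `path` are hypothetical names. It assumes a requirement for a root is either "the whole root" or a set of nested paths. Merging keeps the union of paths, and a whole-root request subsumes any set of paths. A bare `HashMap::insert` keyed by root index would silently drop the earlier requirement.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical per-root requirement: read the whole root column, or only
/// the listed nested paths below it (e.g. ["b", "c"] for `root.b.c`).
#[derive(Debug, Clone, PartialEq)]
enum RootProjection {
    Whole,
    Paths(BTreeSet<Vec<String>>),
}

impl RootProjection {
    /// Combine two requirements for the same root without dropping either.
    fn merge(self, other: RootProjection) -> RootProjection {
        match (self, other) {
            // Reading the whole root already covers every nested path.
            (RootProjection::Whole, _) | (_, RootProjection::Whole) => RootProjection::Whole,
            // Otherwise keep the union of both path sets.
            (RootProjection::Paths(mut a), RootProjection::Paths(b)) => {
                a.extend(b);
                RootProjection::Paths(a)
            }
        }
    }
}

/// Record `req` for `root_idx`, merging with any earlier requirement instead
/// of overwriting it as a bare `HashMap::insert` would.
fn add_requirement(map: &mut HashMap<usize, RootProjection>, root_idx: usize, req: RootProjection) {
    let merged = match map.remove(&root_idx) {
        Some(existing) => existing.merge(req),
        None => req,
    };
    map.insert(root_idx, merged);
}

/// Hypothetical helper: build a nested path from string slices.
fn path(parts: &[&str]) -> Vec<String> {
    parts.iter().map(|s| s.to_string()).collect()
}

fn main() {
    let mut map = HashMap::new();
    add_requirement(&mut map, 0, RootProjection::Paths(BTreeSet::from([path(&["a"])])));
    add_requirement(&mut map, 0, RootProjection::Paths(BTreeSet::from([path(&["b", "c"])])));
    add_requirement(&mut map, 1, RootProjection::Paths(BTreeSet::from([path(&["x"])])));
    add_requirement(&mut map, 1, RootProjection::Whole);

    // Root 0 keeps both nested paths; root 1 collapses to the whole column.
    assert_eq!(map[&0], RootProjection::Paths(BTreeSet::from([path(&["a"]), path(&["b", "c"])])));
    assert_eq!(map[&1], RootProjection::Whole);
}
```

Overlapping paths such as `["a"]` and `["a", "b"]` are both kept here. That redundancy is harmless if leaf selection later tests each leaf against any requested prefix.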
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 956a6710d1
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Pull request overview
This PR introduces a parquet-specific “read columns” model that enables building projection masks at the leaf column level, which is necessary for more fine-grained pruning when parquet schemas contain nested types.
Changes:
- Added `ParquetReadColumns`/`ParquetReadColumn` and `build_parquet_leaves_indices` in a new `parquet/read_columns.rs` module.
- Updated the parquet reader to construct `ProjectionMask` via leaf indices (`ProjectionMask::leaves`) instead of root indices.
- Added unit tests validating leaf-index expansion and nested-path filtering.
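The listed changes expand a root selection, optionally narrowed to nested paths, into leaf column indices. The sketch below shows that expansion under stated assumptions. It is not the PR's `build_parquet_leaves_indices`. `Leaf`, `RootRequest`, and `leaf_indices` are hypothetical, and each leaf's root index and path below that root are assumed to be known already. With the `parquet` crate, that information comes from the `SchemaDescriptor` (for example `get_column_root_idx`). The resulting indices are what a caller would pass to `ProjectionMask::leaves`.

```rust
/// Hypothetical flattened view of one parquet leaf column: the index of its
/// top-level (root) field and its path below that root.
struct Leaf {
    root_idx: usize,
    nested_path: Vec<&'static str>,
}

/// Hypothetical projection for one root: `None` reads the whole root,
/// `Some(paths)` reads only leaves at or below one of `paths`.
struct RootRequest {
    root_idx: usize,
    nested_paths: Option<Vec<Vec<&'static str>>>,
}

/// Return the selected leaf indices in ascending order, suitable for a
/// leaf-based projection mask.
fn leaf_indices(leaves: &[Leaf], requests: &[RootRequest]) -> Vec<usize> {
    leaves
        .iter()
        .enumerate()
        .filter(|(_, leaf)| {
            requests.iter().any(|req| {
                req.root_idx == leaf.root_idx
                    && match &req.nested_paths {
                        // Whole root: every leaf under it is selected.
                        None => true,
                        // Nested paths: compare whole path segments, so
                        // ["a"] matches ["a", "b"] but not ["ab"].
                        Some(paths) => paths.iter().any(|p| leaf.nested_path.starts_with(p)),
                    }
            })
        })
        .map(|(i, _)| i)
        .collect()
}

fn main() {
    // Hypothetical schema: ts, attrs { host, region { name, zone } }, value.
    let leaves = vec![
        Leaf { root_idx: 0, nested_path: vec![] },                 // ts
        Leaf { root_idx: 1, nested_path: vec!["host"] },           // attrs.host
        Leaf { root_idx: 1, nested_path: vec!["region", "name"] }, // attrs.region.name
        Leaf { root_idx: 1, nested_path: vec!["region", "zone"] }, // attrs.region.zone
        Leaf { root_idx: 2, nested_path: vec![] },                 // value
    ];
    let requests = vec![
        RootRequest { root_idx: 0, nested_paths: None },
        RootRequest { root_idx: 1, nested_paths: Some(vec![vec!["region"]]) },
    ];
    // `attrs.host` and `value` are pruned; only the needed leaves remain.
    assert_eq!(leaf_indices(&leaves, &requests), vec![0, 2, 3]);
}
```

Matching on path segments rather than string prefixes keeps a request for `a` from accidentally selecting a sibling field named `ab`.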
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `src/mito2/src/sst/parquet/reader.rs` | Switches projection mask construction from root-based to leaf-based using the new read-columns helper. |
| `src/mito2/src/sst/parquet/read_columns.rs` | Implements parquet nested-path projection modeling and expands root selection into leaf column indices (with tests). |
| `src/mito2/src/sst/parquet.rs` | Exposes the new `read_columns` module under the parquet SST module tree. |
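The switch from root-based to leaf-based masks matters because the two index spaces differ. Parquet orders leaf (primitive) columns depth-first through the schema, so a single nested root can own several consecutive leaf indices. The sketch below illustrates this with a hypothetical `Field` tree and `flatten` function. Neither comes from mito2 or the `parquet` crate, and the field names are made up for illustration.

```rust
/// Minimal hypothetical schema tree: a primitive leaf or a group (struct).
enum Field {
    Leaf(&'static str),
    Group(&'static str, Vec<Field>),
}

/// Join a dotted path prefix with a field name.
fn join(prefix: &str, name: &str) -> String {
    if prefix.is_empty() {
        name.to_string()
    } else {
        format!("{}.{}", prefix, name)
    }
}

/// Visit `field` depth-first, pushing (root index, dotted path) for each leaf.
fn walk(field: &Field, root_idx: usize, prefix: &str, out: &mut Vec<(usize, String)>) {
    match field {
        Field::Leaf(name) => out.push((root_idx, join(prefix, name))),
        Field::Group(name, children) => {
            let path = join(prefix, name);
            for child in children {
                walk(child, root_idx, &path, out);
            }
        }
    }
}

/// Flatten top-level fields into leaf columns; the position in the returned
/// Vec is the leaf index, and the tuple records which root owns it.
fn flatten(roots: &[Field]) -> Vec<(usize, String)> {
    let mut out = Vec::new();
    for (root_idx, root) in roots.iter().enumerate() {
        walk(root, root_idx, "", &mut out);
    }
    out
}

fn main() {
    let schema = vec![
        Field::Leaf("ts"),
        Field::Group(
            "attrs",
            vec![
                Field::Leaf("host"),
                Field::Group("region", vec![Field::Leaf("name"), Field::Leaf("zone")]),
            ],
        ),
        Field::Leaf("value"),
    ];
    let leaves = flatten(&schema);

    // Root 1 (`attrs`) owns leaf indices 1, 2 and 3: a root-based mask for it
    // reads all three, while a leaf-based mask can read only leaf 3.
    let attrs_leaves: Vec<usize> = leaves
        .iter()
        .enumerate()
        .filter(|(_, (root, _))| *root == 1)
        .map(|(i, _)| i)
        .collect();
    assert_eq!(attrs_leaves, vec![1, 2, 3]);
    assert_eq!(leaves[3].1, "attrs.region.zone");
}
```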
@codex review
Codex Review: Didn't find any major issues. Delightful!
@evenyag PTAL. Should we merge this after the 1.0 merge window?
I plan to merge it after we release 1.0.
@evenyag Sounds good. We need to create a maintenance branch for 1.0.
Updated from 9d3efa9 to 551ddec.
@evenyag PTAL
I hereby agree to the terms of the GreptimeDB CLA.
Refer to a related PR or issue link (optional)
What's changed and what's your intention?
This PR adds nested parquet leaf projection support in mito2 parquet reads.
Instead of building the projection mask from parquet root indices, it introduces a read-column abstraction and converts projections into parquet leaf indices before calling `ProjectionMask::leaves(...)`.

PR Checklist
Please convert it to a draft if some of the following conditions are not met.