Skip to content

feat: add parquet nested leaf projection#7900

Merged
fengys1996 merged 12 commits intoGreptimeTeam:mainfrom
fengys1996:feat/arquet-nested-leaf-projection
Apr 10, 2026
Merged

feat: add parquet nested leaf projection#7900
fengys1996 merged 12 commits intoGreptimeTeam:mainfrom
fengys1996:feat/arquet-nested-leaf-projection

Conversation

@fengys1996
Copy link
Copy Markdown
Contributor

@fengys1996 fengys1996 commented Apr 1, 2026

I hereby agree to the terms of the GreptimeDB CLA.

Refer to a related PR or issue link (optional)

What's changed and what's your intention?

This PR adds nested parquet leaf projection support in mito2 parquet reads.

Instead of building the projection mask from parquet root indices, it introduces a read-column abstraction and converts projections into parquet leaf indices before calling ProjectionMask::leaves(...).

PR Checklist

Please convert it to a draft if some of the following conditions are not met.

  • I have written the necessary rustdoc comments.
  • I have added the necessary unit tests and integration tests.
  • This PR requires documentation updates.
  • API changes are backward compatible.
  • Schema or data changes are backward compatible.

@fengys1996 fengys1996 requested review from Copilot and removed request for evenyag, v0y4g3r and waynexia April 1, 2026 04:16
@github-actions github-actions bot added size/S docs-not-required This change does not impact docs. labels Apr 1, 2026
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new read_columns module to handle Parquet column projections, enabling support for nested field paths. It updates the Parquet reader to use leaf-based indexing instead of root-only indexing. A review comment suggests improving the build_parquet_leaves_indices function to correctly merge multiple projection requirements for the same root index, rather than overwriting them.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 956a6710d1

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces a parquet-specific “read columns” model that enables building projection masks at the leaf column level, which is necessary for more fine-grained pruning when parquet schemas contain nested types.

Changes:

  • Added ParquetReadColumns / ParquetReadColumn and build_parquet_leaves_indices in a new parquet/read_columns.rs module.
  • Updated the parquet reader to construct ProjectionMask via leaf indices (ProjectionMask::leaves) instead of root indices.
  • Added unit tests validating leaf-index expansion and nested-path filtering.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File Description
src/mito2/src/sst/parquet/reader.rs Switches projection mask construction from root-based to leaf-based using the new read-columns helper.
src/mito2/src/sst/parquet/read_columns.rs Implements parquet nested-path projection modeling and expands root selection into leaf column indices (with tests).
src/mito2/src/sst/parquet.rs Exposes the new read_columns module under the parquet SST module tree.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@fengys1996
Copy link
Copy Markdown
Contributor Author

@codex review

@chatgpt-codex-connector
Copy link
Copy Markdown

Codex Review: Didn't find any major issues. Delightful!

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@fengys1996 fengys1996 requested a review from a team as a code owner April 2, 2026 02:54
@sunng87
Copy link
Copy Markdown
Member

sunng87 commented Apr 2, 2026

@evenyag PTAL.

And shall we merge this after 1.0 merge window?

@evenyag
Copy link
Copy Markdown
Contributor

evenyag commented Apr 2, 2026

@evenyag PTAL.

And shall we merge this after 1.0 merge window?

I plan to merge it after we release 1.0.

@sunng87
Copy link
Copy Markdown
Member

sunng87 commented Apr 2, 2026

@evenyag sounds good. We need to create a maintenance branch for 1.0

@fengys1996 fengys1996 force-pushed the feat/arquet-nested-leaf-projection branch from 9d3efa9 to 551ddec Compare April 7, 2026 09:48
@github-actions github-actions bot added size/M and removed size/S labels Apr 7, 2026
@fengys1996 fengys1996 requested a review from evenyag April 7, 2026 10:07
@killme2008
Copy link
Copy Markdown
Member

@evenyag PTAL

Copy link
Copy Markdown
Contributor

@evenyag evenyag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rest LGTM

@fengys1996 fengys1996 enabled auto-merge April 10, 2026 09:39
@fengys1996 fengys1996 added this pull request to the merge queue Apr 10, 2026
Merged via the queue into GreptimeTeam:main with commit 76cad69 Apr 10, 2026
45 of 46 checks passed
@fengys1996 fengys1996 deleted the feat/arquet-nested-leaf-projection branch April 10, 2026 11:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

docs-not-required This change does not impact docs. size/M

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants