
feat(external): skip Parquet RowGroups via min/max statistics filtering#23754

Open
flypiggyyoyoyo wants to merge 11 commits into matrixorigin:main from flypiggyyoyoyo:issue-23703

@flypiggyyoyoyo (Collaborator)

What type of PR is this?

  • API-change
  • BUG
  • Improvement
  • Documentation
  • Feature
  • Test and CI
  • Code Refactoring

Which issue(s) this PR fixes:

issue #23703

What this PR does / why we need it:

ParquetReader currently decodes every RowGroup regardless of WHERE conditions. This PR adds RowGroup-level predicate pushdown: before reading each RowGroup, we evaluate the filter expression against ColumnChunk min/max statistics using EvaluateFilterByZoneMap. RowGroups that cannot match are skipped entirely, avoiding unnecessary page decoding.
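The core pruning rule can be illustrated with a minimal sketch, assuming a single integer column and a simple `col <op> value` predicate. The names `colStats`, `cmpOp`, `mayMatch` and `selectRowGroups` are hypothetical and are not MatrixOne's `EvaluateFilterByZoneMap` API. A RowGroup is skipped only when its [min, max] range proves that no row can satisfy the predicate:

```go
package main

// colStats holds the min/max statistics of one column chunk.
// Hypothetical type: the PR reads these from Parquet ColumnChunk metadata.
type colStats struct {
	Min, Max int64
	HasStats bool
}

// cmpOp is a hypothetical comparison operator for a predicate "col <op> v".
type cmpOp int

const (
	opEq cmpOp = iota
	opLt
	opLe
	opGt
	opGe
)

// mayMatch reports whether some value in [Min, Max] could satisfy "col <op> v".
// It must never return false for a RowGroup containing a matching row;
// returning true for a RowGroup with no matches only costs extra decoding.
func mayMatch(s colStats, op cmpOp, v int64) bool {
	if !s.HasStats {
		return true // no statistics: nothing can be proven, so the RowGroup is read
	}
	switch op {
	case opEq:
		return s.Min <= v && v <= s.Max
	case opLt:
		return s.Min < v
	case opLe:
		return s.Min <= v
	case opGt:
		return s.Max > v
	case opGe:
		return s.Max >= v
	}
	return true // unknown operator: stay conservative
}

// selectRowGroups returns the indexes of RowGroups that still need decoding.
func selectRowGroups(stats []colStats, op cmpOp, v int64) []int {
	var keep []int
	for i, s := range stats {
		if mayMatch(s, op, v) {
			keep = append(keep, i)
		}
	}
	return keep
}
```

The check is deliberately one-sided: false positives are harmless, but a false negative would silently drop rows, so any RowGroup without usable statistics is always read.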

Key changes:

  • New file parquet_filter.go: bridges Parquet ColumnChunk statistics to MO's ZoneMap evaluation through a parquetColumnMetaFetcher adapter
  • ParquetHandler.prepare(): initializes the RowGroup list and the filter-column mapping, and defers opening pages when filtering is possible
  • ParquetHandler.getData(): when canFilter=true, routes to getDataByRowGroup(), which skips non-matching RowGroups before opening their pages
  • Supported filter types: bool, int8–64, uint8–64, float32/64, date, timestamp/datetime. String types fall back to no filtering (TODO)
  • Unit tests (22) and e2e tests (multi-RowGroup parquet file with various WHERE patterns)

@matrix-meow added the size/XL label (Denotes a PR that changes [1000, 1999] lines) on Feb 18, 2026

Labels

  • do-not-merge/wip
  • kind/enhancement
  • size/XL (Denotes a PR that changes [1000, 1999] lines)
