Restore Expression Peeling Flatten Optimization (PR #10521)? #16899

zhli1142015 · 2026-03-24T11:31:55Z

zhli1142015
Mar 24, 2026
Collaborator

History

Optimize evaluation by reducing peeled vectors size #10521 (2024-07-23): Added a flatten optimization in peelInternal(). When baseSize / 8 > rows.end(), it deep-copies the peeled vectors into compact flat vectors instead of keeping the oversized dictionary wrapping. Reported 10x speedup.
Add document for flatten optimization in PeeledEncoding #10701 (2024-08-13): Added a documentation comment at PeeledEncoding.h:35-38 describing this behavior.
Back out "Optimize evaluation by reducing peeled vectors size" #10742 (2024-08-13, same day): Reverted #10521 because of a CSE (Common Subexpression Elimination) ABA bug: temporary flat copies could be freed and have their addresses reused, causing stale CSE cache hits. The comment added by #10701 was not reverted.
Ensure inputs are alive before re-using the results of a common sub-expression #10837 (2024-08-28): Independently fixed the CSE ABA root cause by adding weak_ptr expiry detection in evaluateSharedSubexpr() (Expr.h:660, Expr.cpp:914).
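
For readers who have not looked at #10521, here is a minimal sketch of the idea, assuming a toy dictionary column built on std::vector. DictColumn, shouldFlatten and flattenIfOversized are hypothetical names for illustration only; this is not the peelInternal() code, and only the baseSize / 8 > rows.end() condition comes from the PR description above.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for a dictionary-encoded column: row i holds
// base[indices[i]]. This is not Velox's DictionaryVector.
struct DictColumn {
  std::vector<std::string> base;
  std::vector<std::size_t> indices;
};

// The condition quoted from #10521. rowsEnd plays the role of rows.end(),
// i.e. one past the last selected row.
bool shouldFlatten(std::size_t baseSize, std::size_t rowsEnd) {
  return baseSize / 8 > rowsEnd;
}

// Returns a compact flat copy of rows [0, rowsEnd) when the base is
// oversized, or std::nullopt to signal "keep the dictionary wrapping".
// Precondition (assumed): rowsEnd <= column.indices.size().
std::optional<std::vector<std::string>> flattenIfOversized(
    const DictColumn& column, std::size_t rowsEnd) {
  if (!shouldFlatten(column.base.size(), rowsEnd)) {
    return std::nullopt;
  }
  std::vector<std::string> flat;
  flat.reserve(rowsEnd);
  for (std::size_t row = 0; row < rowsEnd; ++row) {
    // Deep copy: the result no longer refers to the oversized base.
    flat.push_back(column.base[column.indices[row]]);
  }
  return flat;
}
```

The point of the deep copy is that downstream evaluation then sizes its buffers by rowsEnd rather than by the base size. It is also what produced the short-lived temporaries involved in the CSE bug described above.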

Current State

On main, PeeledEncoding.h:35-38 says:

When the base vector size is larger than 8 times of the selected rows, we do not save the dictionary wrapping. Instead, we store a flattened version...

But the code has VELOX_CHECK(wrapEncoding_ != FLAT) in both translateToInnerRows() and wrap(), which asserts that a FLAT wrap encoding never occurs. The comment describes behavior that no longer exists, so it is orphaned and misleading.

Motivation: Parquet Dictionary Columns

The default Parquet dictionary page size limit is 1 MB (Properties.h:210). For string columns with moderate-length values (e.g., 20–100 bytes), a single dictionary page can hold 10K–50K unique entries. When the reader produces a DictionaryVector, the base vector size equals the number of unique dictionary entries, while batch sizes are typically 1K–10K rows.

Unique entries (base)	Batch rows	base/8 > rows?	Without optimization
40,000 (avg 25B str)	1,000	5,000 > 1,000 ✅	Allocates 40K-element SelectivityVector + result vector
10,000 (avg 100B str)	1,000	1,250 > 1,000 ✅	Same waste pattern
10,000	10,000	1,250 < 10,000 ❌	Normal peeling (no waste)

Without the optimization, every expression evaluation on these columns allocates vectors sized to baseSize (10K–50K elements) when only the batch rows (1K) are needed. That is up to 50x memory over-allocation per evaluation.
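
The numbers above can be reproduced with a few lines of arithmetic. This is a rough sketch: it assumes the 1 MB page limit quoted above and ignores any per-entry encoding overhead, so real pages hold somewhat fewer entries. All names are hypothetical.

```cpp
#include <cstddef>

// 1 MB dictionary page limit, as quoted in the text above.
constexpr std::size_t kDictionaryPageBytes = 1 << 20;

// Rough upper bound on the number of dictionary entries in one page.
// Assumption: no per-entry overhead (length prefixes etc. are ignored).
constexpr std::size_t estimateEntries(std::size_t avgValueBytes) {
  return kDictionaryPageBytes / avgValueBytes;
}

// Same threshold as in #10521: baseSize / 8 > rows.
constexpr bool baseIsOversized(std::size_t baseSize, std::size_t batchRows) {
  return baseSize / 8 > batchRows;
}

// How many times larger a base-sized allocation is than a batch-sized one.
constexpr std::size_t overAllocationFactor(
    std::size_t baseSize, std::size_t batchRows) {
  return baseSize / batchRows;
}

// The three table rows, checked at compile time.
static_assert(baseIsOversized(40000, 1000));    // 5,000 > 1,000
static_assert(baseIsOversized(10000, 1000));    // 1,250 > 1,000
static_assert(!baseIsOversized(10000, 10000));  // 1,250 < 10,000
static_assert(overAllocationFactor(50000, 1000) == 50);
```

With 20-byte and 100-byte values, estimateEntries gives roughly 52K and 10K entries, which is where the 10K–50K range comes from.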

Safety Argument

The only reason for the revert was the CSE ABA bug. PR #10837 fixed this root cause independently:

InputForSharedResults now stores weak_ptr alongside raw pointers (Expr.h:653)
evaluateSharedSubexpr() checks isExpired() before cache reuse (Expr.cpp:913-919)
When a flattened vector is freed and its address reused, weak_ptr::expired() returns true → stale entry evicted → fresh computation
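
To make the ABA argument concrete, here is a minimal sketch of the pattern, assuming a much simplified cache entry. CachedInput and its members are hypothetical and are not the real InputForSharedResults; the sketch only shows why pairing a raw pointer with a weak_ptr lets the cache reject an address that has been reused by a different object.

```cpp
#include <memory>
#include <vector>

// Hypothetical stand-in for an input vector.
using Vector = std::vector<int>;

// Hypothetical cache entry; not the actual InputForSharedResults.
struct CachedInput {
  const Vector* rawAddress = nullptr;  // identity used for the cache lookup
  std::weak_ptr<Vector> liveness;      // expires when the input is freed

  void remember(const std::shared_ptr<Vector>& input) {
    rawAddress = input.get();
    liveness = input;
  }

  // Unsafe variant: address equality only. A new vector that happens to
  // reuse a freed address looks identical to the remembered one (ABA).
  bool matchesByAddressOnly(const std::shared_ptr<Vector>& input) const {
    return rawAddress == input.get();
  }

  // Safe variant: the remembered input must still be alive as well.
  bool matches(const std::shared_ptr<Vector>& input) const {
    return !liveness.expired() && rawAddress == input.get();
  }
};
```

If the remembered input has been destroyed, liveness.expired() returns true and matches() rejects the entry even when a new input sits at the same address, so the caller falls back to a fresh computation.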

Should we restore Expression Peeling Flatten Optimization (PR #10521)?

cc @Yuhta, @bikramSingh91, @mbasmanova
