IN LIST: add UInt16 bitmap filter #23012
## Which issue does this PR close?

- Part of apache#19241.
- Stacked on apache#21927.
- Next in stack: apache#23012.
- Extracted from apache#19390.

## Rationale for this change

`IN LIST` evaluates expressions like `x IN (1, 3, 7)`. The list on the right is fixed, so DataFusion can precompute a small lookup structure once and then reuse it for every input row.

For `UInt8`, there are only 256 possible values: 0 through 255. That means the lookup can be a tiny checklist with one bit per possible value:

- If the list contains `3`, set bit `3`.
- If the list contains `7`, set bit `7`.
- To check whether an input value is present, read that one bit.

So instead of hashing each input value or comparing it against the list, membership becomes one indexed bit test. The bitmap is only 32 bytes, because 256 bits = 32 bytes.

This PR adds the first specialized primitive path in the stack as a concrete `UInt8` filter. The `UInt16` version is added in apache#23012, and the shared bitmap abstraction is introduced only after both concrete implementations are visible in apache#23035.

## What changes are included in this PR?

- Adds `UInt8BitmapFilter`, a 32-byte bitmap built from the non-null constants in the `IN` list.
- Routes `UInt8` constant-list filtering to that bitmap path.
- Keeps the same SQL null behavior as the generic path for both `IN` and `NOT IN`.
- Moves shared dictionary-needle handling into `static_filter.rs`, so specialized filters can reuse it consistently.
- Adds focused tests for `UInt8` null handling and dictionary-encoded needles.

## Are these changes tested?

Yes.

- `cargo fmt --all`
- `cargo test -p datafusion-physical-expr bitmap_filter_u8 --lib`
- `cargo test -p datafusion-physical-expr in_list_int_types --lib`
- `cargo clippy -p datafusion-physical-expr --all-targets --all-features -- -D warnings`

## Are there any user-facing changes?

No. This is an internal performance optimization only.
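The `UInt8` checklist from the rationale above can be sketched in Rust roughly as follows. This is a minimal illustration under assumptions, not the code in the PR: `U8Bitmap`, `from_values`, and `contains` are hypothetical names, and null handling and Arrow array plumbing are left out.

```rust
/// Hypothetical sketch of a 256-bit membership bitmap for `u8` values.
/// Names and layout are illustrative only, not the PR's actual types.
#[derive(Debug, Clone, Copy, Default)]
pub struct U8Bitmap {
    /// 4 x 64 bits = 256 bits = 32 bytes, one bit per possible `u8`.
    words: [u64; 4],
}

impl U8Bitmap {
    /// Builds the bitmap once from the non-null constants of the `IN` list.
    pub fn from_values(values: &[u8]) -> Self {
        let mut words = [0u64; 4];
        for &v in values {
            // The high 2 bits select the word, the low 6 bits select the bit.
            words[(v >> 6) as usize] |= 1u64 << (v & 63);
        }
        Self { words }
    }

    /// Membership is a single indexed bit test.
    #[inline]
    pub fn contains(&self, v: u8) -> bool {
        (self.words[(v >> 6) as usize] >> (v & 63)) & 1 == 1
    }
}
```

Building the bitmap from `[1, 3, 7]` sets bits 1, 3, and 7; each later lookup then reads exactly one bit regardless of how long the list is.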
<!-- codex-benchmark-start -->
## Local benchmark snapshot

Benchmark command:

```bash
cargo bench -p datafusion-physical-expr --profile release-nonlto --bench in_list_strategy -- --save-baseline <name>
```

Method: compare adjacent saved baselines using raw Criterion sample minima (`min(time / iters)`). Lower is better; changes within +/-5% are treated as noise. These numbers were not rerun after splitting the bitmap abstraction into apache#23035.

Compared baselines: [apache#21927](apache#21927) -> [apache#23011](apache#23011)

Relevant scope: UInt8 narrow-integer rows.

Summary: 5 relevant rows, 5 faster, 0 slower, 0 within +/-5%.

| Benchmark | Before | After | Change |
|---|---:|---:|---:|
| `narrow_integer/u8/list=16/match=0%` | 20.39 us | 3.94 us | -80.7% (5.18x faster) |
| `narrow_integer/u8/list=16/match=50%` | 38.38 us | 3.98 us | -89.6% (9.65x faster) |
| `narrow_integer/u8/list=4/match=0%` | 18.18 us | 3.93 us | -78.4% (4.62x faster) |
| `narrow_integer/u8/list=4/match=50%` | 34.63 us | 3.96 us | -88.6% (8.75x faster) |
| `nulls/narrow_integer/u8/list=16/match=50%/nulls=20%` | 37.12 us | 4.16 us | -88.8% (8.93x faster) |
<!-- codex-benchmark-end -->

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Implements an 8 KB heap-allocated bitmap for UInt16. Maintains O(1) performance while handling the larger value space. Triggers for UInt16 arrays.
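A minimal sketch of such an 8 KB heap-allocated bitmap, assuming a plain boxed slice of 64-bit words. The names `U16Bitmap`, `from_values`, `contains`, and `heap_bytes` are hypothetical and chosen for illustration; the actual `UInt16BitmapFilter` may be laid out differently.

```rust
/// 65,536 bits / 64 bits per word = 1,024 words = 8,192 bytes.
const U16_WORDS: usize = 65_536 / 64;

/// Hypothetical sketch of a heap-allocated membership bitmap for `u16`.
/// Illustrative only; not the PR's actual type.
pub struct U16Bitmap {
    words: Box<[u64]>,
}

impl U16Bitmap {
    /// Builds the bitmap once from the non-null constants of the `IN` list.
    pub fn from_values(values: &[u16]) -> Self {
        // Allocate on the heap so the 8 KB table does not live on the stack.
        let mut words = vec![0u64; U16_WORDS].into_boxed_slice();
        for &v in values {
            // The high 10 bits select the word, the low 6 bits select the bit.
            words[(v >> 6) as usize] |= 1u64 << (v & 63);
        }
        Self { words }
    }

    /// Membership is still a single indexed bit test.
    #[inline]
    pub fn contains(&self, v: u16) -> bool {
        (self.words[(v >> 6) as usize] >> (v & 63)) & 1 == 1
    }

    /// Size of the heap allocation in bytes.
    pub fn heap_bytes(&self) -> usize {
        self.words.len() * std::mem::size_of::<u64>()
    }
}
```

The table size is fixed by the value range, not by the list: a three-element list and a thousand-element list both allocate the same 8,192 bytes.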
run benchmark in_list_strategy
alamb
left a comment
Looks good to me -- thank you @geoffreyclaude
/// Bitmap filter for O(1) `UInt16` set membership via single bit test.
///
/// `UInt16` has 65,536 possible values, so the filter stores membership in an
The only question I have is whether it is worth 8 KB for a small `IN` list -- e.g. if there are 3 elements, 8 KB may be a lot of memory overhead, though perhaps the performance is worth it.
Actually, the branchless strategy (see a later PR in the stack) beats bitmaps on lists of up to 8 elements (for both u8 and u16).
I still need to amend it though, as it currently skips 1 and 2 byte types.
Sounds like we have a plan -- let's keep going then!
One PR at a time! This is much cleaner and easier to track than everything in a single mega PR, for sure.
🤖 Benchmark running (GKE): comparing perf/in_list_bitmap_u16_filter (85133a6) to e2c3e18 (merge-base) using in_list_strategy.
🤖 Benchmark completed (GKE).
## Which issue does this PR close?

- `IN` performance with specialized implementations #19390.

## Rationale for this change
#23011 uses a bitmap checklist for `UInt8`, where there are 256 possible values. `UInt16` is the same idea with a larger value range: 0 through 65,535.

That is still small enough to represent directly. A `UInt16` bitmap needs one bit for each possible value: 65,536 bits, or 8 KB.

Then a lookup is still simple: use the input value as the bit position and check whether that bit is set. For example, if the list contains `42`, bit `42` is set, and every input row with value `42` can be recognized with one bit test.

This PR keeps the scope narrow: it adds the unsigned 2-byte bitmap path as a concrete `UInt16` filter. #23035 then unifies the `UInt8` and `UInt16` implementations, and #23013 uses that shared shape for signed same-width reinterpretation.

## What changes are included in this PR?
- Adds `UInt16BitmapFilter`, backed by a heap-allocated 65,536-bit bitmap.
- Routes `UInt16` constant-list filtering to that bitmap path.
- Keeps the same `IN` / `NOT IN` null behavior as the generic path.
- Adds tests for `UInt16` boundary values, nulls, and `NOT IN`.

## Are these changes tested?

Yes.

- `cargo fmt --all`
- `cargo test -p datafusion-physical-expr bitmap_filter_u16 --lib`
- `cargo test -p datafusion-physical-expr in_list_int_types --lib`
- `cargo test -p datafusion-physical-expr test_in_list_from_array_type_combinations --lib`
- `cargo test -p datafusion-physical-expr test_in_list_dictionary_types --lib`
- `cargo clippy -p datafusion-physical-expr --all-targets --all-features -- -D warnings`

## Are there any user-facing changes?
No. This is an internal performance optimization only.
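For readers unfamiliar with the `IN` / `NOT IN` null behavior that the bitmap path has to preserve, here is a small sketch of standard SQL three-valued logic. It assumes the generic path follows the standard rules; `in_list` and `not_in_list` are hypothetical helper names, and a linear scan stands in for the bitmap lookup.

```rust
/// Hypothetical helper: SQL `value IN (list)` with three-valued logic.
/// `None` models SQL NULL in both the input value and the list.
pub fn in_list(value: Option<u16>, list: &[Option<u16>]) -> Option<bool> {
    // A NULL input value always yields NULL.
    let v = value?;
    if list.iter().any(|x| *x == Some(v)) {
        // A definite match is TRUE even if the list also contains NULL.
        Some(true)
    } else if list.iter().any(|x| x.is_none()) {
        // No match, but a NULL in the list makes the answer unknown.
        None
    } else {
        Some(false)
    }
}

/// Hypothetical helper: `NOT IN` negates TRUE/FALSE and keeps NULL as NULL.
pub fn not_in_list(value: Option<u16>, list: &[Option<u16>]) -> Option<bool> {
    in_list(value, list).map(|found| !found)
}
```

Under these rules a bitmap built from the non-null constants answers the TRUE case directly, while a "list contains NULL" flag decides whether a miss is FALSE or NULL.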
## Benchmark note

No local `in_list_strategy` numbers are included for this PR because the benchmark harness does not currently include a direct `UInt16` case. The available `i16` rows measure the signed reinterpretation path added in #23013 after the bitmap unification in #23035, not this PR's unsigned `UInt16` bitmap filter.