use bloom filters to push down hash table lookups in HashJoinExec #18307
Preliminary results: where q.sql is basically https://datafusion.apache.org/blog/2025/09/10/dynamic-filters/#hash-join-dynamic-filters
I do wonder if using the same
I also tried a couple of different sizes of
I think I've figured out how to make the bloom filters very, very cheap to build: re-use the hashes calculated for the hash table so that the only thing we ever insert into the bloom filter are |
I've implemented this. I think it could be improved further using the |
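To illustrate the hash re-use idea, here is a minimal sketch in plain Rust, not the code in this PR. It assumes the join build side already has one 64-bit hash per row, and it derives all probe positions from that single hash with double hashing, so no key is ever re-hashed. `HashBloomFilter`, `build_from_hashes`, and `hash_key` are hypothetical names, and `DefaultHasher` only stands in for whatever hasher the hash table really uses.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical Bloom filter that is fed precomputed 64-bit hashes
/// instead of keys, so building it never hashes a key again.
pub struct HashBloomFilter {
    bits: Vec<u64>,
    num_bits: u64,
    num_probes: u32,
}

impl HashBloomFilter {
    /// `num_bits` is rounded up to a multiple of 64; `num_probes` is at least 1.
    pub fn new(num_bits: usize, num_probes: u32) -> Self {
        let words = (num_bits.max(1) + 63) / 64;
        Self {
            bits: vec![0; words],
            num_bits: (words * 64) as u64,
            num_probes: num_probes.max(1),
        }
    }

    /// Derive the i-th bit index from one hash via double hashing.
    fn bit_index(&self, hash: u64, i: u32) -> usize {
        // Forcing the stride to be odd guarantees it is never zero.
        let stride = hash.rotate_left(32) | 1;
        (hash.wrapping_add((i as u64).wrapping_mul(stride)) % self.num_bits) as usize
    }

    /// Insert an already-computed hash.
    pub fn insert_hash(&mut self, hash: u64) {
        for i in 0..self.num_probes {
            let idx = self.bit_index(hash, i);
            self.bits[idx / 64] |= 1u64 << (idx % 64);
        }
    }

    /// Returns false only if the hash was definitely never inserted.
    pub fn might_contain_hash(&self, hash: u64) -> bool {
        (0..self.num_probes).all(|i| {
            let idx = self.bit_index(hash, i);
            self.bits[idx / 64] & (1u64 << (idx % 64)) != 0
        })
    }
}

/// Build a filter from the same hash buffer that fed the hash table.
pub fn build_from_hashes(hashes: &[u64], num_bits: usize, num_probes: u32) -> HashBloomFilter {
    let mut filter = HashBloomFilter::new(num_bits, num_probes);
    for &hash in hashes {
        filter.insert_hash(hash);
    }
    filter
}

/// Stand-in for the join's real hashing step (an assumption of this sketch).
pub fn hash_key<T: Hash>(key: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    hasher.finish()
}
```

The probe side has to hash its keys with the same function and seed as the build side, otherwise the filter would report false negatives. In this sketch both sides would go through `hash_key`.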
This is a WIP PR to explore a change and run benchmarks; it is not intended to be in a reviewable state.
Most of this code was AI-generated and needs careful human review before it is mergeable.