
Add prefix filter support #186

Open
zaidoon1 wants to merge 2 commits into fjall-rs:main from zaidoon1:zaidoon/prefix-filter


@zaidoon1
Contributor

@zaidoon1 zaidoon1 commented Nov 3, 2025

Rebased #151 on top of main and adapted various things to the new API.

@marvin-j97
Contributor

Unfortunately this has been hit by another wave of conflicts, but I just released 3.0.0, so there will be a bit of a freeze of activity from this point on.

@zaidoon1
Contributor Author

zaidoon1 commented Jan 3, 2026

No worries! Let me know when you want me to pick this back up, or when you are ready to merge things in again. Also, feel free to ping me on any tickets, features, etc. Happy to help with whatever (lsm-tree or fjall itself).

@marvin-j97
Contributor

At this point, I think 3.0 has stabilized. I'm definitely keen on getting prefix extractors and compaction filters in as the next major features.

@zaidoon1 zaidoon1 force-pushed the zaidoon/prefix-filter branch from fc17be2 to 2df2ae4 on February 10, 2026 20:30
@codecov

codecov bot commented Feb 10, 2026

Codecov Report

❌ Patch coverage is 95.46926% with 28 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/run_reader.rs | 93.02% | 12 Missing ⚠️ |
| src/table/mod.rs | 92.62% | 9 Missing ⚠️ |
| src/table/writer/filter/partitioned.rs | 73.68% | 5 Missing ⚠️ |
| src/blob_tree/mod.rs | 96.96% | 1 Missing ⚠️ |
| src/range.rs | 95.00% | 1 Missing ⚠️ |
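Working backwards from the report, and assuming the patch percentage is simply covered lines divided by coverable patch lines, the numbers are self-consistent: the patch has about 618 coverable lines, and `src/run_reader.rs` alone accounts for 12 of the 28 misses.

```math
\begin{aligned}
\text{patch:}\quad & \frac{28}{1 - 0.9546926} \approx 618, & \frac{618 - 28}{618} &= \frac{590}{618} \approx 95.469\% \\
\texttt{run\_reader.rs:}\quad & \frac{12}{1 - 0.9302} \approx 172, & \frac{172 - 12}{172} &= \frac{160}{172} \approx 93.02\%
\end{aligned}
```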


@zaidoon1 zaidoon1 force-pushed the zaidoon/prefix-filter branch from 2df2ae4 to 58ad234 on February 10, 2026 20:33
@marvin-j97
Contributor

I will soon take a more in-depth look at this PR, but in the meantime: the run_reader logic is mostly not covered by tests, so I think there are still edge cases missing from the tests. Other files are less affected or even improve in coverage, so that's good.

@zaidoon1
Contributor Author

Sounds good, I'll add more tests to cover the run_reader logic.

@zaidoon1 zaidoon1 force-pushed the zaidoon/prefix-filter branch 3 times, most recently from af5ac86 to c1ea699 on February 11, 2026 17:46
@zaidoon1
Contributor Author

Note that there are some false positives, like https://app.codecov.io/gh/fjall-rs/lsm-tree/pull/186#644ae531cb268487817af88f68673c70-R56, where the doc comments show up as "untested".

@zaidoon1 zaidoon1 force-pushed the zaidoon/prefix-filter branch from c1ea699 to eaec760 on February 11, 2026 18:33
@marvin-j97
Contributor

marvin-j97 commented Feb 12, 2026

> note there is some false positives like: https://app.codecov.io/gh/fjall-rs/lsm-tree/pull/186#644ae531cb268487817af88f68673c70-R56 where the doc comments are showing up as "untested"

That makes sense, because the tests never actually assert that the extractors work correctly.

Adding something like

```rust
assert_eq!(..., SegmentedPrefixExtractor.name());
assert!(..., SegmentedPrefixExtractor.extract(...));
```

should fix it.
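For illustration, a self-contained sketch of that kind of test. The `PrefixExtractor` trait and `FixedPrefixExtractor` below are hypothetical stand-ins, not the PR's actual types; the real trait and method signatures may differ. It asserts both the persisted name and the extraction result, including an out-of-domain key:

```rust
// Hypothetical trait and extractor for illustration only; the PR's real
// trait, method signatures, and extractor types may differ.
trait PrefixExtractor {
    /// Stable identifier that would be persisted in table metadata.
    fn name(&self) -> &str;
    /// Returns the key's prefix, or `None` if the key is out of domain.
    fn extract<'a>(&self, key: &'a [u8]) -> Option<&'a [u8]>;
}

/// Hypothetical extractor: the first `N` bytes of a key.
struct FixedPrefixExtractor<const N: usize>;

impl<const N: usize> PrefixExtractor for FixedPrefixExtractor<N> {
    fn name(&self) -> &str {
        "fixed_prefix"
    }

    fn extract<'a>(&self, key: &'a [u8]) -> Option<&'a [u8]> {
        // Keys shorter than N bytes have no prefix.
        key.get(..N)
    }
}

fn main() {
    let ex = FixedPrefixExtractor::<3>;
    // Assert the name that would be written into table metadata...
    assert_eq!(ex.name(), "fixed_prefix");
    // ...and the extraction result for in-domain and out-of-domain keys.
    assert_eq!(ex.extract(b"abcdef"), Some(&b"abc"[..]));
    assert_eq!(ex.extract(b"ab"), None);
    println!("extractor assertions passed");
}
```

Covering the `None` case matters as much as the happy path, since out-of-domain keys must never be excluded by a prefix filter.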

@zaidoon1
Contributor Author

Sounds good. I'll add that, but I'll wait for your other feedback on this PR and address it all in one go.

Contributor


This file is probably at the point where it could be split into multiple smaller files. But I can do that later.

Add prefix-aware filter support to the LSM-tree. When a prefix extractor
is configured, extracted prefixes are added to Bloom filters and the
extractor name is stored in table metadata. Point reads use a prefix
pre-check (maybe_contains_prefix) before falling back to the full-key
filter. Range scans skip tables whose prefix filter definitively excludes
the query range, both upfront and lazily during iteration.
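The read-path logic described above can be sketched with a toy table. Everything here is hypothetical: exact `HashSet`s stand in for Bloom filters (so there are no false positives), and the range rule (skip only when both bounds share one extracted prefix) is an assumed simplification, not necessarily the PR's exact criterion. It also models the stored extractor name, since a prefix filter built by a different extractor must be ignored:

```rust
use std::collections::HashSet;

/// Hypothetical extractor config: the first `len` bytes of a key.
struct Extractor {
    name: &'static str,
    len: usize,
}

impl Extractor {
    fn extract<'a>(&self, key: &'a [u8]) -> Option<&'a [u8]> {
        key.get(..self.len) // None for out-of-domain (too short) keys
    }
}

/// Toy table. Exact `HashSet`s stand in for the Bloom filters, so this sketch
/// has no false positives; a real filter may return "maybe" spuriously.
struct Table {
    extractor_name: Option<&'static str>, // persisted in table metadata
    prefix_filter: HashSet<Vec<u8>>,
    key_filter: HashSet<Vec<u8>>,
}

impl Table {
    fn build(keys: &[&str], ex: Option<&Extractor>) -> Table {
        let mut prefix_filter = HashSet::new();
        let mut key_filter = HashSet::new();
        for key in keys.iter().map(|k| k.as_bytes()) {
            key_filter.insert(key.to_vec());
            if let Some(p) = ex.and_then(|e| e.extract(key)) {
                prefix_filter.insert(p.to_vec());
            }
        }
        Table { extractor_name: ex.map(|e| e.name), prefix_filter, key_filter }
    }

    /// The prefix filter is only meaningful if the same extractor built it.
    fn prefix_filter_usable(&self, ex: &Extractor) -> bool {
        self.extractor_name == Some(ex.name)
    }

    /// Point read: prefix pre-check, then fall back to the full-key filter.
    fn may_contain_key(&self, key: &[u8], ex: &Extractor) -> bool {
        if self.prefix_filter_usable(ex) {
            if let Some(p) = ex.extract(key) {
                if !self.prefix_filter.contains(p) {
                    return false; // definitely absent: skip this table
                }
            }
        }
        self.key_filter.contains(key)
    }

    /// Range scan: skip only if both bounds share one prefix the filter excludes.
    fn may_contain_range(&self, start: &[u8], end: &[u8], ex: &Extractor) -> bool {
        if !self.prefix_filter_usable(ex) {
            return true;
        }
        match (ex.extract(start), ex.extract(end)) {
            (Some(a), Some(b)) if a == b => self.prefix_filter.contains(a),
            _ => true, // spans several prefixes or has an out-of-domain bound
        }
    }
}

fn main() {
    let ex = Extractor { name: "fixed3", len: 3 };
    let table = Table::build(&["abc1", "abc2", "xy"], Some(&ex));
    println!("{}", table.may_contain_key(b"zzz9", &ex)); // false: prefix "zzz" absent
    println!("{}", table.may_contain_key(b"xy", &ex)); // true: out of domain, full-key filter
    println!("{}", table.may_contain_range(b"zzz0", b"zzz9", &ex)); // false: table skipped
}
```

The key safety property is that every "skip" decision requires a definite negative; any uncertainty (extractor mismatch, out-of-domain key, multi-prefix range) falls through to a normal read.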

Includes a fix for PartitionedFilterWriter::finish, which panicked when
no filter partitions were created (e.g. all keys shorter than the
required prefix length). A guard on empty tli_handles now returns early
instead of attempting to encode an empty top-level index.
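The shape of that guard, as a minimal sketch. `PartitionedFilterWriterSketch`, `PartitionHandle`, and the byte layout below are hypothetical simplifications, not the crate's real writer or on-disk encoding; the point is only that `finish` checks for zero partitions before touching the top-level index:

```rust
/// Hypothetical handle to one filter partition: its last key and file offset.
struct PartitionHandle {
    end_key: Vec<u8>,
    offset: u64,
}

/// Hypothetical, simplified writer; not the crate's real type or encoding.
struct PartitionedFilterWriterSketch {
    tli_handles: Vec<PartitionHandle>,
}

impl PartitionedFilterWriterSketch {
    /// Encodes the top-level index, or returns `None` if no partition was written.
    fn finish(self) -> Option<Vec<u8>> {
        // Guard: e.g. every key was shorter than the extractor's prefix length,
        // so no partition exists and there is no top-level index to encode.
        if self.tli_handles.is_empty() {
            return None;
        }
        let mut out = Vec::new();
        out.extend_from_slice(&(self.tli_handles.len() as u32).to_le_bytes());
        for h in &self.tli_handles {
            out.extend_from_slice(&(h.end_key.len() as u32).to_le_bytes());
            out.extend_from_slice(&h.end_key);
            out.extend_from_slice(&h.offset.to_le_bytes());
        }
        Some(out)
    }
}

fn main() {
    let empty = PartitionedFilterWriterSketch { tli_handles: Vec::new() };
    assert!(empty.finish().is_none());

    let one = PartitionedFilterWriterSketch {
        tli_handles: vec![PartitionHandle { end_key: b"abc".to_vec(), offset: 42 }],
    };
    // 4 (count) + 4 (key length) + 3 (key) + 8 (offset) = 19 bytes
    assert_eq!(one.finish().map(|b| b.len()), Some(19));
}
```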

Oracle-based differential fuzzer: runs the same AFL-derived operation
sequence against two trees (one with a prefix extractor, one without) and
asserts that all reads return identical results. Any mismatch means a
filter was wrongly applied, i.e. silent data loss; AFL saves the input as
a crash for replay.
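The differential-oracle idea in miniature: replay one operation sequence against a plain `BTreeMap` (the oracle) and a filtered map, and report the first read that diverges. This is a toy stand-in, not the PR's fuzz target; the op set, the `Filtered` type, and the `record_prefixes` switch (which simulates a filter-building bug) are all hypothetical:

```rust
use std::collections::{BTreeMap, HashSet};

/// Operations decoded from fuzzer input (a reduced, hypothetical op set).
enum Op {
    Insert(Vec<u8>, Vec<u8>),
    Remove(Vec<u8>),
    Get(Vec<u8>),
}

/// System under test: a map whose point reads consult a prefix "filter" first.
/// The exact `HashSet` stands in for a Bloom filter. Removals never prune it,
/// which is safe: a stale prefix only causes an unnecessary lookup.
struct Filtered {
    map: BTreeMap<Vec<u8>, Vec<u8>>,
    prefixes: HashSet<Vec<u8>>,
    prefix_len: usize,
    record_prefixes: bool, // `false` simulates a filter-building bug
}

impl Filtered {
    fn get(&self, key: &[u8]) -> Option<Vec<u8>> {
        if let Some(p) = key.get(..self.prefix_len) {
            if !self.prefixes.contains(p) {
                return None; // filter says "definitely absent"
            }
        }
        self.map.get(key).cloned()
    }
}

/// Replays `ops` against the oracle and the filtered map; returns the index of
/// the first read whose results differ.
fn first_mismatch(ops: &[Op], prefix_len: usize, record_prefixes: bool) -> Option<usize> {
    let mut oracle: BTreeMap<Vec<u8>, Vec<u8>> = BTreeMap::new();
    let mut sut = Filtered { map: BTreeMap::new(), prefixes: HashSet::new(), prefix_len, record_prefixes };
    for (i, op) in ops.iter().enumerate() {
        match op {
            Op::Insert(k, v) => {
                oracle.insert(k.clone(), v.clone());
                sut.map.insert(k.clone(), v.clone());
                if sut.record_prefixes {
                    if let Some(p) = k.get(..prefix_len) {
                        sut.prefixes.insert(p.to_vec());
                    }
                }
            }
            Op::Remove(k) => {
                oracle.remove(k);
                sut.map.remove(k);
            }
            Op::Get(k) => {
                if oracle.get(k).cloned() != sut.get(k) {
                    return Some(i);
                }
            }
        }
    }
    None
}

fn main() {
    let ops = vec![
        Op::Insert(b"abc1".to_vec(), b"v".to_vec()),
        Op::Get(b"abc1".to_vec()),
        Op::Get(b"ab".to_vec()),
    ];
    // In a fuzz target this would be `assert!(...)`, so AFL records a crash.
    println!("{:?}", first_mismatch(&ops, 3, true)); // None
    println!("{:?}", first_mismatch(&ops, 3, false)); // Some(1)
}
```

Because the oracle has no filter at all, any divergence can only come from the filter path, which is what makes the comparison a precise detector of wrongly applied filters.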

Covers all identified correctness dimensions:
- 9 extractor variants × 3 bits-per-key (bpk) settings × 3 filter partitioning policies
- MVCC snapshot reads at older seqnos while writes continue
- Weak tombstones and their compaction GC interaction
- Extractor changes on reopen (prefix_filter_allowed compatibility)
- Partitioned filter forced on all levels (the path that had the panic)
- Bidirectional iterator stepping (PrefixPingPong)
- Unbounded iteration (FirstKV/LastKV)
- Clustered keys (first byte 0..7, len 1..9) for realistic prefix
  distribution with natural in-domain / out-of-domain key mix
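One way to produce such clustered keys from raw fuzzer bytes, as a hedged sketch. It assumes the ranges above are Rust-style half-open (first byte in `0..7`, length in `1..9`); the decoder itself is hypothetical, not the PR's generator. With a 3-byte fixed-prefix extractor, keys shorter than 3 bytes are naturally out of domain:

```rust
/// Decodes one clustered key from raw fuzzer bytes. Assumes the ranges in the
/// commit message are Rust-style half-open: first byte in 0..7, length in 1..9.
/// Returns the key and the number of input bytes consumed.
fn clustered_key(input: &[u8]) -> Option<(Vec<u8>, usize)> {
    let (&len_byte, rest) = input.split_first()?;
    let len = 1 + (len_byte as usize % 8); // 1..9, i.e. 1 to 8 bytes
    let body = rest.get(..len)?;
    let mut key = body.to_vec();
    key[0] %= 7; // cluster the first byte into 0..7
    Some((key, 1 + len))
}

fn main() {
    // With a hypothetical 3-byte fixed-prefix extractor, short keys are out of domain.
    let input = [0u8, 200, 2, 9, 1, 2];
    let mut pos = 0;
    while let Some((key, used)) = clustered_key(&input[pos..]) {
        let in_domain = key.len() >= 3;
        println!("{:?} in_domain={}", key, in_domain);
        pos += used;
    }
}
```

Clamping the first byte keeps many keys sharing a prefix, so filter hits and misses both occur often instead of nearly every prefix being unique.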
@zaidoon1 zaidoon1 force-pushed the zaidoon/prefix-filter branch from 7f9fffc to ee30eba on February 14, 2026 14:58


2 participants