Unfortunately, this has been hit by another wave of conflicts, but I just released 3.0.0, so activity will be somewhat frozen from this point on.
No worries! Let me know when you want me to resume work on this, or when you are ready to merge things in again. Also, feel free to ping me on any tickets, features, etc.; happy to help with whatever (lsm-tree or fjall itself).
At this point, 3.0 has stabilized, I think. I'm definitely keen on getting prefix extractors and compaction filters in as the next major features.
Branch updated from fc17be2 to 2df2ae4.
Codecov Report: ❌ Patch coverage is
Branch updated from 2df2ae4 to 58ad234.
I will soon take a more in-depth look at this PR, but in the meantime: the run_reader logic is mostly not covered by tests, so I think there are still edge cases missing from the tests. Other files are not as affected or even improve in coverage, so that's good.
Sounds good, I'll add more tests to cover the run_reader logic.
Branch updated from af5ac86 to c1ea699.
Note that there are some false positives, like https://app.codecov.io/gh/fjall-rs/lsm-tree/pull/186#644ae531cb268487817af88f68673c70-R56, where doc comments show up as "untested".
Branch updated from c1ea699 to eaec760.
That makes sense, because the extractors are never actually asserted to work correctly. Adding something like
    assert_eq!(..., SegmentedPrefixExtractor.name());
    assert!(..., SegmentedPrefixExtractor.extract(...));
should fix it.
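For illustration, a minimal sketch of the kind of test those assertions amount to. It assumes a hypothetical PrefixExtractor trait with name() and extract() methods and a hypothetical fixed-length extractor; the real lsm-tree trait, its signatures, and what SegmentedPrefixExtractor actually extracts may all differ.

```rust
/// Hypothetical trait modeled on the name()/extract() calls in the review;
/// not the actual lsm-tree API.
pub trait PrefixExtractor {
    /// Stable identifier, e.g. the name persisted in table metadata.
    fn name(&self) -> &'static str;
    /// Returns the prefix of `key`, or None if the key is out of domain.
    fn extract<'a>(&self, key: &'a [u8]) -> Option<&'a [u8]>;
}

/// Hypothetical extractor: the first `len` bytes of the key.
pub struct FixedPrefixExtractor {
    pub len: usize,
}

impl PrefixExtractor for FixedPrefixExtractor {
    fn name(&self) -> &'static str {
        "fixed_prefix"
    }

    fn extract<'a>(&self, key: &'a [u8]) -> Option<&'a [u8]> {
        // Keys shorter than `len` have no prefix (out of domain).
        key.get(..self.len)
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn extractor_name_and_extract() {
        let ex = FixedPrefixExtractor { len: 3 };
        assert_eq!(ex.name(), "fixed_prefix");
        // In-domain key: the prefix is returned.
        assert_eq!(ex.extract(b"user:42"), Some(&b"use"[..]));
        // Out-of-domain key: too short, so no prefix.
        assert_eq!(ex.extract(b"ab"), None);
    }
}
```

Checking both an in-domain and an out-of-domain key exercises both outcomes of extract(), not just the happy path.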
Sounds good, I'll add that, but I'll wait for your other feedback on this PR and address it all in one go.
This file is probably at the point where it could be split into multiple smaller files. But I can do that later.
Branch updated from 1567681 to ea18d59.
Add prefix-aware filter support to the LSM-tree. When a prefix extractor is configured, extracted prefixes are added to Bloom filters, and the extractor name is stored in table metadata. Point reads use a prefix pre-check (maybe_contains_prefix) before falling back to the full-key filter. Range scans skip tables whose prefix filter definitively excludes the query range, both upfront and lazily during iteration.

Includes a fix for PartitionedFilterWriter::finish, which panicked when no filter partitions were created (e.g., when all keys were shorter than the required prefix length). The empty tli_handles guard returns early instead of attempting to encode an empty top-level index.
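A toy sketch of the read-path logic described above, assuming a single Bloom filter that holds both full keys and extracted prefixes, a hypothetical fixed 2-byte extractor, and std's DefaultHasher for hashing. This is not the lsm-tree implementation; Table, Bloom, may_contain_key, and may_contain_prefix are names made up for the example.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hypothetical fixed-length extractor: the first 2 bytes, or None if shorter.
const PREFIX_LEN: usize = 2;

fn extract(key: &[u8]) -> Option<&[u8]> {
    key.get(..PREFIX_LEN)
}

// Toy Bloom filter (k hash functions over m bits); not the lsm-tree filter.
struct Bloom {
    bits: Vec<bool>,
    k: u64,
}

impl Bloom {
    fn new(m: usize, k: u64) -> Self {
        Bloom { bits: vec![false; m], k }
    }

    fn index(&self, item: &[u8], i: u64) -> usize {
        // Seed each of the k hashes with its index i.
        let mut h = DefaultHasher::new();
        i.hash(&mut h);
        item.hash(&mut h);
        (h.finish() % self.bits.len() as u64) as usize
    }

    fn insert(&mut self, item: &[u8]) {
        for i in 0..self.k {
            let j = self.index(item, i);
            self.bits[j] = true;
        }
    }

    // May return false positives, never false negatives.
    fn may_contain(&self, item: &[u8]) -> bool {
        (0..self.k).all(|i| self.bits[self.index(item, i)])
    }
}

struct Table {
    filter: Bloom,
}

impl Table {
    // Every key, plus its extracted prefix if it has one, goes into the filter.
    fn build(keys: &[&[u8]]) -> Self {
        let mut filter = Bloom::new(1024, 3);
        for &key in keys {
            filter.insert(key);
            if let Some(p) = extract(key) {
                filter.insert(p);
            }
        }
        Table { filter }
    }

    // Point read: prefix pre-check first, then the full-key check.
    fn may_contain_key(&self, key: &[u8]) -> bool {
        if let Some(p) = extract(key) {
            if !self.filter.may_contain(p) {
                return false; // prefix definitely absent, so the key is too
            }
        }
        self.filter.may_contain(key)
    }

    // Prefix scan: a query at least PREFIX_LEN long shares extract(query)
    // with every matching key, so a negative filter answer skips the table.
    // Shorter queries are out of domain and the table must be scanned.
    fn may_contain_prefix(&self, query: &[u8]) -> bool {
        extract(query).map_or(true, |p| self.filter.may_contain(p))
    }
}
```

The key point is the asymmetry: only a negative filter answer allows skipping a table, while a positive or undecidable answer (such as a query prefix shorter than the extractor length) must fall back to reading it.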
Oracle-based differential fuzzer: runs the same AFL-derived operation sequence against two trees (one with a prefix extractor, one without) and asserts that all reads return identical results. Any mismatch indicates a wrongly applied filter, i.e. silent data loss, and AFL saves it as a crash for replay. Covers all identified correctness dimensions:
- 9 extractor variants × 3 bpk levels × 3 filter partitioning policies
- MVCC snapshot reads at older seqnos while writes continue
- Weak tombstones and their interaction with compaction GC
- Extractor changes on reopen (prefix_filter_allowed compatibility)
- Partitioned filters forced on all levels (the path that had the panic)
- Bidirectional iterator stepping (PrefixPingPong)
- Unbounded iteration (FirstKV/LastKV)
- Clustered keys (first byte 0..7, length 1..9) for a realistic prefix distribution with a natural mix of in-domain and out-of-domain keys
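A scaled-down sketch of the same differential idea, assuming a toy store (memtable plus immutable tables, each with an exact prefix set standing in for a prefix Bloom filter) and a simple LCG in place of AFL-derived input; the oracle is a plain BTreeMap. All names are hypothetical, and this is not the fuzzer's actual code.

```rust
use std::collections::{BTreeMap, HashSet};

// Hypothetical fixed-length extractor; shorter keys are out of domain.
const PREFIX_LEN: usize = 1;
fn extract(key: &[u8]) -> Option<&[u8]> {
    key.get(..PREFIX_LEN)
}

// Sorted entries; a None value is a tombstone.
type Entries = BTreeMap<Vec<u8>, Option<Vec<u8>>>;

// Immutable table; its exact prefix set stands in for a prefix filter.
struct Table {
    data: Entries,
    prefixes: HashSet<Vec<u8>>,
}

impl Table {
    // False only if no key starting with `p` can be in this table.
    fn may_contain_prefix(&self, p: &[u8]) -> bool {
        extract(p).map_or(true, |x| self.prefixes.contains(x))
    }
}

#[derive(Default)]
struct FilteredStore {
    memtable: Entries,
    tables: Vec<Table>, // oldest first
}

impl FilteredStore {
    fn flush(&mut self) {
        let data = std::mem::take(&mut self.memtable);
        // Tombstone keys contribute prefixes too, so a table holding
        // only a delete is never skipped.
        let prefixes: HashSet<Vec<u8>> =
            data.keys().filter_map(|k| extract(k).map(|p| p.to_vec())).collect();
        self.tables.push(Table { data, prefixes });
    }

    fn get(&self, key: &[u8]) -> Option<Vec<u8>> {
        if let Some(v) = self.memtable.get(key) {
            return v.clone();
        }
        // Newest table first, skipping tables the prefix check rules out.
        let mut live = self.tables.iter().rev().filter(|t| t.may_contain_prefix(key));
        live.find_map(|t| t.data.get(key)).cloned().flatten()
    }

    fn scan_prefix(&self, p: &[u8]) -> Vec<(Vec<u8>, Vec<u8>)> {
        let mut merged = Entries::new();
        let tables = self.tables.iter().filter(|t| t.may_contain_prefix(p)).map(|t| &t.data);
        // Oldest to newest, memtable last, so newer versions win.
        for src in tables.chain(std::iter::once(&self.memtable)) {
            for (k, v) in src.range(p.to_vec()..).take_while(|(k, _)| k.starts_with(p)) {
                merged.insert(k.clone(), v.clone());
            }
        }
        merged.into_iter().filter_map(|(k, v)| v.map(|v| (k, v))).collect()
    }
}

// Tiny LCG standing in for AFL-provided input bytes.
struct Lcg(u64);
impl Lcg {
    fn next_usize(&mut self) -> usize {
        self.0 = self.0.wrapping_mul(6364136223846793005).wrapping_add(1442695040888963407);
        (self.0 >> 33) as usize
    }
}

// Replays one pseudo-random op sequence against the store and a BTreeMap oracle.
fn run_differential(seed: u64, steps: usize) {
    let mut oracle: BTreeMap<Vec<u8>, Vec<u8>> = BTreeMap::new();
    let mut store = FilteredStore::default();
    let mut rng = Lcg(seed);
    for _ in 0..steps {
        // Clustered keys: bytes 0..=3, length 0..=3 (empty key is out of domain).
        let len = rng.next_usize() % 4;
        let key: Vec<u8> = (0..len).map(|_| (rng.next_usize() % 4) as u8).collect();
        match rng.next_usize() % 5 {
            0 | 1 => {
                let value = vec![rng.next_usize() as u8];
                oracle.insert(key.clone(), value.clone());
                store.memtable.insert(key, Some(value));
            }
            2 => {
                oracle.remove(&key);
                store.memtable.insert(key, None); // tombstone
            }
            3 => store.flush(),
            _ => {
                assert_eq!(store.get(&key), oracle.get(&key).cloned(), "get({key:?})");
                let expected: Vec<_> = oracle.iter().filter(|(k, _)| k.starts_with(&key))
                    .map(|(k, v)| (k.clone(), v.clone())).collect();
                assert_eq!(store.scan_prefix(&key), expected, "scan_prefix({key:?})");
            }
        }
    }
}
```

Recording the prefixes of tombstone keys in flush() matters here: if a table holding only a delete were skipped, an older value could resurface, which is the kind of silent error the oracle comparison is meant to catch.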
Branch updated from 7f9fffc to ee30eba.
Rebased #151 on top of main and adapted various things to the new API.