feat(example): add a kv example based on fjall kv store#1658
ariesdevil wants to merge 1 commit into databendlabs:main from …
Conversation
drmingdrmer left a comment
@drmingdrmer reviewed 4 files and all commit messages, and made 1 comment.
Reviewable status: 4 of 18 files reviewed, 1 unresolved discussion (waiting on ariesdevil).
examples/raft-kv-fjall/src/log_store.rs line 185 at r1 (raw file):
for k in start_index..10_000 {
    self.keyspace_logs().remove(id_to_bin(k)).map_err(|e| io::Error::other(e.to_string()))?;
}
You must not truncate a log starting from the left boundary: if the server crashes partway through, it will leave a hole in the logs, which is forbidden. This has been documented in the trait method.
Code quote:
for k in start_index..10_000 {
    self.keyspace_logs().remove(id_to_bin(k)).map_err(|e| io::Error::other(e.to_string()))?;
}
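One way to avoid the hole is to delete from the right end of the range downward, so that a crash at any point leaves the surviving entries as a contiguous prefix. A minimal sketch of the idea, using a `std::collections::BTreeMap` as a stand-in for the fjall partition (the function name and shape here are illustrative, not openraft's or fjall's actual API):

```rust
use std::collections::BTreeMap;

/// Delete all entries at `start_index` and above, highest index first.
/// If the process crashes after any single remove, the surviving entries
/// are still a contiguous prefix -- never a prefix with a hole in it.
fn truncate_suffix(log: &mut BTreeMap<u64, Vec<u8>>, start_index: u64) {
    // Collect the keys to delete, then remove them in reverse order.
    let keys: Vec<u64> = log.range(start_index..).map(|(k, _)| *k).collect();
    for k in keys.into_iter().rev() {
        log.remove(&k);
    }
}

fn main() {
    let mut log: BTreeMap<u64, Vec<u8>> = (0..10u64).map(|i| (i, vec![i as u8])).collect();
    truncate_suffix(&mut log, 5);
    // Entries 0..5 survive as a contiguous prefix.
    assert_eq!(log.keys().copied().collect::<Vec<_>>(), vec![0, 1, 2, 3, 4]);
    println!("ok");
}
```

With the original ascending loop, a crash midway leaves entries below `start_index`, a gap, and then some leftover suffix; the reverse order makes every intermediate state a valid log.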
Thanks for the contribution! A fjall-based storage backend is a great idea: pure Rust, LSM-tree, faster compile times compared to RocksDB. That addresses a real pain point.

One suggestion on scope: rather than a full KV application example (with HTTP API, network layer, …), consider a storage-only example. No HTTP server, no CLI binary, no network glue. The storage test suite (…) The reason: full application examples tend to require more ongoing maintenance (dependency updates, API compatibility, network stack changes), and they duplicate what … Happy to help shape it if you want to go that route.
Another consideration worth thinking about: log storage and state machine storage are independent in openraft, and they serve very different roles. Log storage is on the critical path of consensus: every append, vote, and commit goes through it, so it needs low write latency and sequential append performance. LSM-tree engines like fjall are optimized for write throughput and point lookups, but the compaction overhead and write amplification make them a less natural fit for a Raft log than append-oriented storage.

The state machine, on the other hand, is much less latency-sensitive from the consensus perspective. Its only hard requirement for consensus is the ability to produce a snapshot. An LSM-tree engine is actually a good fit here: random reads and writes, key-value lookups, and compaction all align well with typical state machine workloads. There is already a standalone state machine example at … Have you considered implementing fjall as a …?
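The split described above, log durable and state machine rebuildable, can be illustrated with a self-contained sketch. The types below are simplified stand-ins, not openraft's actual `RaftLogStorage`/`RaftStateMachine` traits: the state machine can be thrown away and reconstructed by replaying the durable log.

```rust
use std::collections::BTreeMap;

/// Stand-in for durable log storage: every acknowledged entry must
/// survive a crash. Entries here are simple (key, value) set operations.
struct Log {
    entries: Vec<(String, String)>,
}

/// Stand-in for the state machine: it need not be durable, because it
/// can always be rebuilt by replaying the log.
#[derive(Default, Debug, PartialEq)]
struct StateMachine {
    kv: BTreeMap<String, String>,
    last_applied: usize,
}

impl StateMachine {
    fn apply(&mut self, entry: &(String, String)) {
        self.kv.insert(entry.0.clone(), entry.1.clone());
        self.last_applied += 1;
    }

    /// Rebuild from scratch by replaying the durable log.
    fn replay(log: &Log) -> Self {
        let mut sm = StateMachine::default();
        for e in &log.entries {
            sm.apply(e);
        }
        sm
    }
}

fn main() {
    let log = Log {
        entries: vec![
            ("a".into(), "1".into()),
            ("a".into(), "2".into()),
            ("b".into(), "3".into()),
        ],
    };
    let mut live = StateMachine::default();
    for e in &log.entries {
        live.apply(e);
    }
    // Simulate losing the (non-durable) state machine: replaying the
    // log reconstructs exactly the same state.
    let rebuilt = StateMachine::replay(&log);
    assert_eq!(live, rebuilt);
}
```

This is why the durability requirements differ: losing the state machine costs only a replay, while losing an acknowledged log entry breaks the consensus invariant.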
If the writes are strictly sequential, fjall does not actually compact anything; it just uses trivial moves. RocksDB should be doing the same. Even then, as long as the write workload is not too intensive, compactions don't affect latencies that much (especially when your writes are synchronous). The base write amplification (without compaction) is ~2x, which is really not that much.
Sure, thx.
Fair point on the compaction side: for a strictly sequential append workload, trivial moves do apply and compaction pressure stays low. But there is a deeper structural issue beyond compaction: fjall has its own WAL (it calls it a "journal"). Looking at the source, it sits in …

This creates a durability accounting problem for Raft log storage. Raft requires that once …

For a state machine this does not matter at all, since the state machine can always be rebuilt by replaying the log. Raft never requires the state machine to be durable; only the log must be. That is exactly why fjall looks like a strong fit for …
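The durability accounting problem can be made concrete with a toy model. The engine below buffers writes in a journal that becomes durable only on an explicit `sync`, a stand-in for an LSM journal with its own flush policy, not fjall's actual API. Acknowledging a Raft append before the engine's journal is synced violates the protocol; syncing first makes the ack safe:

```rust
/// Toy engine with a buffered journal: writes live in memory until
/// `sync` is called. This is a simplified model, not fjall's real API.
struct ToyEngine {
    buffered: Vec<u64>,
    durable: Vec<u64>,
    syncs: usize,
}

impl ToyEngine {
    fn new() -> Self {
        Self { buffered: vec![], durable: vec![], syncs: 0 }
    }
    fn append(&mut self, idx: u64) {
        self.buffered.push(idx);
    }
    fn sync(&mut self) {
        self.durable.extend(self.buffered.drain(..));
        self.syncs += 1;
    }
    /// Simulate a crash: everything not yet synced is lost.
    fn crash(&mut self) {
        self.buffered.clear();
    }
}

fn main() {
    // Wrong: acknowledge the leader while the entry is only buffered.
    let mut e = ToyEngine::new();
    e.append(1);
    let _acked_up_to = 1; // acked before the journal was synced
    e.crash();
    assert!(e.durable.is_empty()); // the acked entry is gone

    // Right: sync before acking, so the ack implies durability.
    let mut e = ToyEngine::new();
    e.append(1);
    e.sync();
    e.crash();
    assert_eq!(e.durable, vec![1]);
}
```

The practical upshot: a Raft log implementation on top of such an engine must tie each acknowledgement to an explicit journal flush, which is where the extra per-append sync cost comes from.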
What's the alternative? Writing to a file manually has the same fsync costs. But with an LSM you get automatic spilling to SSTs if the journal gets too large, and the WAL is checksummed by default to prevent recovering corrupted data.
Flushing to SST is unnecessary; this is the point. More or less, there is an extra disk-write burden. I still want such an example to encourage user applications to use the most appropriate pure WAL-like storage rather than LSM-based storage. But you said that the impact is negligible, so it is okay.
Add a fjall-based kv example. fjall is an LSM kv store written in pure Rust. Compared to RocksDB, it speeds up compilation.
fjall lacks `remove_range`, but I think it will be implemented soon.
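Until a native range delete is available, one common workaround is to collect the keys in the range and remove them one by one. A sketch of that idea, with a `BTreeMap` standing in for the fjall partition (the function name is illustrative, not fjall's API):

```rust
use std::collections::BTreeMap;
use std::ops::Range;

/// Emulate a range delete: collect the keys in `range` first,
/// then remove them individually. Returns how many were removed.
fn remove_range(map: &mut BTreeMap<u64, Vec<u8>>, range: Range<u64>) -> usize {
    let keys: Vec<u64> = map.range(range).map(|(k, _)| *k).collect();
    let n = keys.len();
    for k in keys {
        map.remove(&k);
    }
    n
}

fn main() {
    let mut m: BTreeMap<u64, Vec<u8>> = (0..8u64).map(|i| (i, vec![])).collect();
    let removed = remove_range(&mut m, 2..5);
    assert_eq!(removed, 3);
    assert_eq!(m.keys().copied().collect::<Vec<_>>(), vec![0, 1, 5, 6, 7]);
    println!("removed {removed}");
}
```

Collecting the keys before deleting avoids mutating the map while iterating it; a real implementation would also decide on a deletion order with crash recovery in mind, as discussed in the review above.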