Version: 1.0.0
Date: February 9, 2026
Category: 💾 Storage
This document describes the physical storage layout and key organization in ThemisDB's RocksDB-based storage engine.
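ThemisDB uses a hierarchical prefix scheme to logically separate different data types within RocksDB. Because RocksDB stores keys in sorted order, all keys that share a prefix are contiguous, which enables efficient prefix and range scans and supports compaction strategies tuned per data type.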
Primary entity data storage:
entity:<table>:<pk> → <BaseEntity serialized blob>
Example:
entity:users:12345 → {id: 12345, name: "Alice", email: "alice@example.com"}
Standard secondary indexes for efficient lookups:
idx:<table>:<column>:<value>:<pk> → (empty or minimal metadata)
Example:
idx:users:email:alice@example.com:12345 → ""
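The entity and index key layouts above can be sketched with a few string helpers. This is a minimal illustration, not ThemisDB's actual key-building code: the function names are hypothetical, and it assumes that no component contains the `:` separator.

```cpp
#include <string>

// Hypothetical helpers (not part of ThemisDB's API).
// Assumption: table, column, value, and pk never contain ':'.

// entity:<table>:<pk>
std::string entity_key(const std::string& table, const std::string& pk) {
    return "entity:" + table + ":" + pk;
}

// idx:<table>:<column>:<value>:<pk>
std::string index_key(const std::string& table, const std::string& column,
                      const std::string& value, const std::string& pk) {
    return "idx:" + table + ":" + column + ":" + value + ":" + pk;
}

// Prefix shared by every index entry for one column value. The trailing ':'
// keeps "alice" from also matching "alice2".
std::string index_prefix(const std::string& table, const std::string& column,
                         const std::string& value) {
    return "idx:" + table + ":" + column + ":" + value + ":";
}
```

Calling `index_key("users", "email", "alice@example.com", "12345")` produces the example key shown above. A lookup by value would then scan all keys starting with `index_prefix(...)` and read the primary key from the tail of each match.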
Sorted indexes for range queries:
ridx:<table>:<column>:<value>:<pk> → (empty or minimal metadata)
Example:
ridx:orders:created_at:20260209:1001 → ""
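Range indexes depend on byte-wise key order matching the logical order of the indexed values. The source does not say how values are encoded, so the following is only a sketch of one common approach: zero-padding unsigned integers to a fixed width. The helper names are hypothetical.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Encode an unsigned integer as a fixed-width, zero-padded decimal string so
// that lexicographic order equals numeric order. 20 digits fit any uint64_t.
std::string encode_sortable(std::uint64_t value) {
    char buf[21];
    std::snprintf(buf, sizeof(buf), "%020llu",
                  static_cast<unsigned long long>(value));
    return std::string(buf);
}

// ridx:<table>:<column>:<value>:<pk>, with the value in sortable form.
// Assumption: padding is this sketch's choice, not a documented ThemisDB rule.
std::string range_index_key(const std::string& table, const std::string& column,
                            std::uint64_t value, const std::string& pk) {
    return "ridx:" + table + ":" + column + ":" + encode_sortable(value) + ":" + pk;
}
```

Without padding, the string "9" sorts after "10", so a scan over unpadded numbers of different lengths would return rows out of order. Values that already have a fixed width, such as the eight-digit date in the example above, sort correctly as they are.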
Sparse indexes for fields with many null values:
sidx:<table>:<column>:<value>:<pk> → (empty)
Time-to-live indexes for automatic expiration:
ttlidx:<table>:<expiry_timestamp>:<pk> → (empty)
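Because the expiry timestamp comes before the primary key, expired entries form a contiguous range at the start of each table's TTL keys. The sketch below models the sorted keyspace with `std::map` as a stand-in for RocksDB. The helper names and the fixed-width timestamp encoding are assumptions made for illustration.

```cpp
#include <cstdint>
#include <cstdio>
#include <map>
#include <string>
#include <vector>

// Stand-in for RocksDB's sorted keyspace (illustration only).
using OrderedStore = std::map<std::string, std::string>;

// Fixed-width timestamp so byte order equals numeric order (an assumption).
std::string pad20(std::uint64_t v) {
    char buf[21];
    std::snprintf(buf, sizeof(buf), "%020llu", static_cast<unsigned long long>(v));
    return std::string(buf);
}

// ttlidx:<table>:<expiry_timestamp>:<pk>
std::string ttl_key(const std::string& table, std::uint64_t expiry,
                    const std::string& pk) {
    return "ttlidx:" + table + ":" + pad20(expiry) + ":" + pk;
}

// Collect primary keys in `table` whose expiry is <= now. The scan stops at
// the first entry that expires in the future, since later keys expire later.
std::vector<std::string> expired_pks(const OrderedStore& store,
                                     const std::string& table, std::uint64_t now) {
    const std::string prefix = "ttlidx:" + table + ":";
    const std::string limit = pad20(now);
    std::vector<std::string> pks;
    for (auto it = store.lower_bound(prefix); it != store.end(); ++it) {
        const std::string& key = it->first;
        if (key.compare(0, prefix.size(), prefix) != 0) break;  // left this table
        if (key.compare(prefix.size(), 20, limit) > 0) break;   // not yet expired
        pks.push_back(key.substr(prefix.size() + 21));          // skip timestamp + ':'
    }
    return pks;
}
```

An expiration job built this way touches only the entries that are due. It never scans the full table.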
Inverted indexes for full-text search:
ftidx:<table>:<column>:<term>:<pk> → (relevance score)
Bi-directional edge storage for graph queries:
Outgoing edges:
graph:out:<from_pk>:<edge_id> → <to_pk>
Incoming edges:
graph:in:<to_pk>:<edge_id> → <from_pk>
Example:
graph:out:user123:follows456 → user789
graph:in:user789:follows456 → user123
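Storing each edge under both prefixes makes traversal in either direction a single prefix scan. The following is a minimal sketch that again uses `std::map` as a stand-in for the sorted keyspace. The function names are hypothetical, and a real engine would presumably issue both writes in one atomic write batch.

```cpp
#include <map>
#include <string>
#include <vector>

// Stand-in for RocksDB's sorted keyspace (illustration only).
using OrderedStore = std::map<std::string, std::string>;

// Write both directions of one edge.
void add_edge(OrderedStore& store, const std::string& from,
              const std::string& edge_id, const std::string& to) {
    store["graph:out:" + from + ":" + edge_id] = to;
    store["graph:in:" + to + ":" + edge_id] = from;
}

// Return the values of all keys that start with `prefix`, in key order.
std::vector<std::string> scan_values(const OrderedStore& store,
                                     const std::string& prefix) {
    std::vector<std::string> values;
    for (auto it = store.lower_bound(prefix);
         it != store.end() && it->first.compare(0, prefix.size(), prefix) == 0;
         ++it) {
        values.push_back(it->second);
    }
    return values;
}

// Vertices reachable over outgoing edges of `pk`.
std::vector<std::string> successors(const OrderedStore& store, const std::string& pk) {
    return scan_values(store, "graph:out:" + pk + ":");
}

// Vertices that have an edge pointing at `pk`.
std::vector<std::string> predecessors(const OrderedStore& store, const std::string& pk) {
    return scan_values(store, "graph:in:" + pk + ":");
}
```

The cost of this design is that every edge is written twice. In return, neither direction requires a full scan.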
Metadata for vector embeddings (actual vectors stored in specialized index structures):
vector:<table>:<pk> → {dimension, norm, metadata}
Event log for change data capture:
changefeed:<sequence> → {operation, table, key, old_value, new_value, timestamp}
Optimized storage for time-series metrics:
ts:<metric>:<timestamp>:<tags> → <value(s)>
Example:
ts:cpu_usage:1707465600:host=server1 → 45.2
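With the metric name first and the timestamp second, all samples of one metric within a time window occupy a contiguous key range. The sketch below shows such a window scan over a `std::map` stand-in. The helper names are hypothetical, and it assumes every timestamp has the same number of digits (true of 10-digit epoch seconds like the one in the example), so that string order matches time order.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Stand-in for RocksDB's sorted keyspace (illustration only).
using OrderedStore = std::map<std::string, std::string>;

// ts:<metric>:<timestamp>:<tags>
std::string ts_key(const std::string& metric, std::uint64_t timestamp,
                   const std::string& tags) {
    return "ts:" + metric + ":" + std::to_string(timestamp) + ":" + tags;
}

// Return (key, value) pairs for `metric` with start <= timestamp < end.
// Assumption: start, end, and stored timestamps all have equal digit counts.
std::vector<std::pair<std::string, std::string>> scan_window(
        const OrderedStore& store, const std::string& metric,
        std::uint64_t start, std::uint64_t end) {
    const std::string lo = "ts:" + metric + ":" + std::to_string(start) + ":";
    const std::string hi = "ts:" + metric + ":" + std::to_string(end) + ":";
    std::vector<std::pair<std::string, std::string>> samples;
    for (auto it = store.lower_bound(lo); it != store.end() && it->first < hi; ++it) {
        samples.push_back(*it);
    }
    return samples;
}
```

Filtering by tag would require either post-filtering the scanned samples or a separate index, because tags come after the timestamp in the key.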
ThemisDB uses RocksDB column families to physically separate different data types, enabling independent tuning and compaction strategies.
Default column family:
- Used for entity storage and general-purpose data
- Default configuration suitable for mixed workloads
For large workloads, data can be separated into dedicated column families:
- cf_entities - Primary entity storage
- cf_indexes - Secondary, range, sparse, and full-text indexes
- cf_graph - Graph adjacency lists
- cf_changefeed - Change data capture events
- cf_ts - Time-series data
- cf_vector - Vector index metadata
Benefits:
- Independent LSM compaction per data type
- Optimized block cache allocation
- Separate bloom filter tuning
- Isolated backup/restore operations
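One way to picture how the key prefixes relate to the dedicated column families is a routing function that picks a family name from a key's first component. This is a hypothetical sketch, not ThemisDB's actual routing logic. The column family names come from the list above, and any prefix that the list does not cover (for example `ttlidx`) falls back to RocksDB's default column family, which is named `default`.

```cpp
#include <string>

// Hypothetical routing helper: choose a column family name from the key prefix.
// Prefixes without a dedicated family in the list above go to "default".
std::string column_family_for(const std::string& key) {
    // substr(0, npos) returns the whole key when there is no ':' separator.
    const std::string prefix = key.substr(0, key.find(':'));
    if (prefix == "entity") return "cf_entities";
    if (prefix == "idx" || prefix == "ridx" || prefix == "sidx" || prefix == "ftidx") {
        return "cf_indexes";
    }
    if (prefix == "graph") return "cf_graph";
    if (prefix == "changefeed") return "cf_changefeed";
    if (prefix == "ts") return "cf_ts";
    if (prefix == "vector") return "cf_vector";
    return "default";
}
```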
The WAL ensures durability and crash recovery:
- Sequential append-only log file
- Each write batch is logged atomically
- Automatically synced based on durability settings
WriteOptions write_options;
write_options.sync = true; // fsync for maximum durability
write_options.disableWAL = false; // enable WAL

- Automatic pruning: WAL files are deleted after data is flushed to SST files
- Manual checkpointing: Force WAL rotation and cleanup
- Size limits: max_total_wal_size prevents unbounded growth
RocksDB snapshots provide consistent point-in-time views:
auto snapshot = db->GetSnapshot();
ReadOptions read_opts;
read_opts.snapshot = snapshot;
// Read operations see data as of snapshot creation time
auto result = db->Get(read_opts, key, &value);
db->ReleaseSnapshot(snapshot);

- Each version includes a sequence number
- Readers use snapshots to see consistent views
- Writers create new versions without blocking readers
- Old versions are cleaned up during compaction
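The interplay of sequence numbers and snapshots described above can be illustrated with a toy in-memory model. This is a simplified sketch of the idea, not RocksDB's implementation: the class and its methods are hypothetical, and old versions are never cleaned up here, whereas a real engine drops them during compaction.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Toy multi-version store: every write gets a new sequence number, and a
// snapshot is simply the sequence number that was current when it was taken.
class VersionedStore {
public:
    using Seq = std::uint64_t;

    // Writers add a new version and never overwrite older ones.
    void put(const std::string& key, const std::string& value) {
        versions_[std::make_pair(key, ++last_seq_)] = value;
    }

    // Capture the current sequence number as a snapshot handle.
    Seq snapshot() const { return last_seq_; }

    // Read the newest version of `key` with sequence <= snap.
    // Returns false if the key had no version visible at that snapshot.
    bool get(const std::string& key, Seq snap, std::string* value) const {
        // First entry strictly after (key, snap); the one before it, if it
        // belongs to the same key, is the newest visible version.
        auto it = versions_.upper_bound(std::make_pair(key, snap));
        if (it == versions_.begin()) return false;
        --it;
        if (it->first.first != key) return false;
        *value = it->second;
        return true;
    }

private:
    std::map<std::pair<std::string, Seq>, std::string> versions_;
    Seq last_seq_ = 0;
};
```

A reader holding an older snapshot keeps seeing the value that was current when the snapshot was taken, even after a writer has added a newer version of the same key.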
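Level compaction:
- LSM-tree structure with multiple levels
- Level 0: Newly flushed files whose key ranges may overlap
- Level 1+: Sorted, non-overlapping files
- Default RocksDB style; keeps read and space amplification low at the cost of higher write amplification

Universal compaction:
- Alternative strategy for write-heavy workloads; lowers write amplification at the cost of higher read and space amplification
- All files maintained at same level
- Periodic full compaction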

FIFO compaction:
- For time-series data with automatic expiration
- Old files deleted based on time or size
- Minimal overhead
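The size-based variant of the expiration strategy described in the last three bullets can be sketched as a simple trimming loop: once the total size exceeds a limit, the oldest files are dropped whole, with no merging or rewriting. The struct and function below are hypothetical and only model the idea. They are not RocksDB's implementation.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>

// Minimal model of an SST file for this illustration.
struct SstFile {
    std::uint64_t created_at;  // creation time, e.g. epoch seconds
    std::uint64_t size_bytes;
};

// `files` is ordered oldest first. Drops the oldest files until the total
// size fits within `max_total_bytes`; returns how many files were dropped.
std::size_t fifo_trim(std::deque<SstFile>& files, std::uint64_t max_total_bytes) {
    std::uint64_t total = 0;
    for (const SstFile& f : files) total += f.size_bytes;

    std::size_t dropped = 0;
    while (!files.empty() && total > max_total_bytes) {
        total -= files.front().size_bytes;
        files.pop_front();  // deleting a whole file needs no data rewrite
        ++dropped;
    }
    return dropped;
}
```

Since files are only ever deleted and never rewritten, this approach adds almost no write amplification. This is the sense in which the overhead is minimal.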

Key metrics to monitor:
- Compaction stats: Throughput, amplification
- Block cache hit rate: Read performance indicator
- Write amplification: Write efficiency measure
- Stall events: Resource contention warnings
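Two of these metrics reduce to simple ratios under their usual definitions. Write amplification is the number of bytes written to storage divided by the number of bytes the application wrote. Block cache hit rate is hits divided by total lookups. The helpers below are a hypothetical sketch of that arithmetic. Gathering the raw counters (for example from RocksDB's statistics) is left out.

```cpp
#include <cstdint>

// Write amplification: bytes written to storage (flushes plus compactions)
// per byte written by the application. Returns 0 when nothing was written.
double write_amplification(std::uint64_t storage_bytes_written,
                           std::uint64_t user_bytes_written) {
    if (user_bytes_written == 0) return 0.0;
    return static_cast<double>(storage_bytes_written) /
           static_cast<double>(user_bytes_written);
}

// Block cache hit rate in [0, 1]. Returns 0 when there were no lookups.
double cache_hit_rate(std::uint64_t hits, std::uint64_t misses) {
    const std::uint64_t lookups = hits + misses;
    if (lookups == 0) return 0.0;
    return static_cast<double>(hits) / static_cast<double>(lookups);
}
```

For example, if the application wrote 100 MB and the engine wrote 300 MB to disk in total, the write amplification is 3.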
See also: RocksDB Optimization Guide
Common optimizations:
- Increase block cache size for read-heavy workloads
- Tune bloom filter bits for point lookups
- Adjust compaction threads for I/O saturation
- Configure rate limiting for background operations
- RocksDB Wrapper Documentation
- RocksDB Storage Operations (DE)
- KeySchema Documentation
- Storage Module Overview
Version: 1.0.0 | License: MIT | Support: GitHub Issues