Docs/add deweyid and index types to architecture #818
Merged: JohannesLichtenberger merged 35 commits into main from docs/add-deweyid-and-index-types-to-architecture on Jan 7, 2026
- Add DeweyID Index to the Primary Indexes diagram
- Document DeweyID storage (inline in KeyValueLeafPages) and benefits
- Add comprehensive Secondary Index Types section with:
  - Path Index: PCR → NodeKeys mapping, use cases
  - Name Index: QNm hash → NodeKeys mapping, use cases
  - CAS Index: Value+Path → NodeKeys mapping, range query support
- Include visual examples for each index type
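
To make the "Value+Path → NodeKeys mapping, range query support" line concrete, here is a toy in-memory model of a CAS-style index. It is only a sketch under assumptions: the class `CasIndexSketch`, its methods, and the use of a `long` path class reference and `String` values are hypothetical and do not reflect SirixDB's actual on-disk index structures or API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration only; not SirixDB's implementation.
public final class CasIndexSketch {
    // path class reference -> (value -> node keys); values are kept sorted
    // so that a range of values can be scanned without visiting every entry.
    private final Map<Long, NavigableMap<String, Set<Long>>> index = new HashMap<>();

    // Register that the node with the given key holds `value` on the given path.
    public void put(long pathClassRef, String value, long nodeKey) {
        index.computeIfAbsent(pathClassRef, k -> new TreeMap<>())
             .computeIfAbsent(value, k -> new TreeSet<>())
             .add(nodeKey);
    }

    // Return node keys on the given path whose value lies in [from, to].
    public Set<Long> range(long pathClassRef, String from, String to) {
        Set<Long> result = new TreeSet<>();
        NavigableMap<String, Set<Long>> byValue = index.get(pathClassRef);
        if (byValue != null) {
            for (Set<Long> keys : byValue.subMap(from, true, to, true).values()) {
                result.addAll(keys);
            }
        }
        return result;
    }
}
```

The sorted inner map is what makes the range query cheap in this model; a Path or Name index could be modelled the same way with a plain map from PCR or name hash to node keys.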
…-to-architecture' into docs/add-deweyid-and-index-types-to-architecture
- Add DeweyID Index to the Primary Indexes diagram
- Document DeweyID storage (inline in KeyValueLeafPages) and benefits
- Add comprehensive Secondary Index Types section (Path, Name, CAS)
- Fix 'impossible trilemma' to 'conflicting goals' (more accurate)
- Clarify surgical updates: depends on versioning type + rolling hash
- Update Node Store diagram to show actual SirixDB encoding: parentKey, firstChildKey, lastChildKey, leftSiblingKey, rightSiblingKey
Updated the comparison between Document Store and Node Store to enhance clarity and detail.
HOT stands for Height-Optimized Trie, so HOT_TRIE was redundant.

Also includes architecture documentation improvements:
- Add DeweyIDs and secondary index types documentation
- Fix various accuracy issues in diagrams and examples
- Update default SLIDING_SNAPSHOT window to 4
- Add PostOrderAxis and LevelOrderAxis to spatial axes
- Add between-timestamps transaction example
- Add PageHasher utility class for fast XXH3 hashing (~15 GB/s) with backward compatibility for SHA-256 hashes from legacy databases
- Add SirixCorruptionException for detailed corruption error reporting
- Add verifyChecksumsOnRead configuration option (default: false)
- Update all writers (FileChannel, File, IOUring, MMFile) to use XXH3
- Update all readers to verify checksums when enabled:
  - Non-KVLP pages: verify on compressed bytes before decompression
  - KVLP pages: verify on uncompressed bytes after decompression
- Ensure page fragment hashes are propagated for verification
- Add comprehensive unit tests for PageHasher, SirixCorruptionException, and configuration

Hash algorithm is auto-detected by length (8 bytes = XXH3, 32 bytes = SHA-256) for seamless backward compatibility with existing databases.
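
The length-based auto-detection described in this commit message can be sketched roughly as follows. The class `HashLengthDetection`, the enum `Algo`, and the method `detect` are hypothetical names chosen for illustration; only the 8-byte and 32-byte lengths come from the commit message, and the real PageHasher may be structured differently.

```java
// Hypothetical illustration only; not the actual PageHasher code.
public final class HashLengthDetection {
    public enum Algo { XXH3, SHA_256 }

    // Pick the algorithm from the length of the stored hash:
    // 8 bytes means a 64-bit XXH3 value, 32 bytes means a SHA-256 digest.
    public static Algo detect(byte[] storedHash) {
        switch (storedHash.length) {
            case 8:
                return Algo.XXH3;
            case 32:
                return Algo.SHA_256;
            default:
                // Assumption: any other length is treated as an error here.
                throw new IllegalArgumentException(
                    "Unexpected hash length: " + storedHash.length);
        }
    }
}
```

Because the two digest sizes differ, no extra format flag has to be stored next to the hash for this kind of dispatch to work.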
…shing

High-performance optimizations aligned with financial/HFT system best practices:
- HashAlgorithm enum now uses direct bit manipulation for longToBytes/bytesToLong instead of ByteBuffer allocations (eliminates heap allocation in hot paths)
- Added zero-allocation long-based API (computeHashLong, verifyLong) as primary interface for verification hot paths
- PageHasher now provides both:
  - Default XXH3 convenience methods (compute(byte[]), computeLong(byte[]))
  - Explicit algorithm methods for extensibility
- ResourceConfiguration now includes hashAlgorithm field (defaults to XXH3) for future algorithm extensibility
- All writers/readers updated to use the new API
- Added HASH_LENGTH and DEFAULT_ALGORITHM constants for HFT-style clarity

Zero-copy design preserved: native MemorySegments still use direct address hashing. Verification hot path uses primitive long comparison instead of Arrays.equals().
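
As a rough illustration of converting between a `long` and its 8-byte form with shifts instead of a `ByteBuffer`, consider the sketch below. The class name `LongBytes` is hypothetical, and big-endian byte order is an assumption; the commit message does not state which order the real `longToBytes`/`bytesToLong` use.

```java
// Hypothetical illustration only; byte order (big-endian) is an assumption.
public final class LongBytes {
    // Write the long into a fresh 8-byte array, most significant byte first.
    public static byte[] longToBytes(long value) {
        byte[] out = new byte[8];
        for (int i = 7; i >= 0; i--) {
            out[i] = (byte) value; // keep the lowest 8 bits
            value >>>= 8;          // shift the next byte into place
        }
        return out;
    }

    // Rebuild the long from the first 8 bytes, most significant byte first.
    public static long bytesToLong(byte[] bytes) {
        long value = 0;
        for (int i = 0; i < 8; i++) {
            value = (value << 8) | (bytes[i] & 0xFFL); // mask to avoid sign extension
        }
        return value;
    }
}
```

This sketch still allocates the result array in `longToBytes`; a check that compares two primitive `long` hashes directly with `==` needs neither the array nor `Arrays.equals()`.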
The checksum verification was failing because:
- KVLP pages computed hash on UNCOMPRESSED data
- Non-KVLP pages computed hash on COMPRESSED data
- Verification tried to detect KVLP from first byte of COMPRESSED data, which doesn't work since LZ4 compressed data doesn't preserve the page type

Fix: All page types now consistently hash COMPRESSED data. This:
- Simplifies the verification logic (no KVLP special cases)
- Avoids the impossible task of detecting page type from compressed bytes
- Provides consistent behavior across all storage backends

Removed:
- KVLP-specific hash computation in PageKind.serializePage
- KVLP-specific verification methods in AbstractReader and FileReader
- KVLP detection from compressed data in verifyChecksumIfNeeded
These files referenced non-existent io.sirix.io.RevisionIndex class and used JMH annotations in the wrong source set (main instead of jmh).
No description provided.