
@JohannesLichtenberger

No description provided.

Johannes Lichtenberger and others added 30 commits January 7, 2026 02:57
- Add DeweyID Index to the Primary Indexes diagram
- Document DeweyID storage (inline in KeyValueLeafPages) and benefits
- Add comprehensive Secondary Index Types section with:
  - Path Index: PCR → NodeKeys mapping, use cases
  - Name Index: QNm hash → NodeKeys mapping, use cases
  - CAS Index: Value+Path → NodeKeys mapping, range query support
- Include visual examples for each index type
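The CAS index's Value+Path → NodeKeys mapping and its range-query support can be pictured with a small in-memory sketch. It assumes the path class reference (PCR) and a numeric value are both encoded as `long`s. The names `CasKey` and `CasIndexSketch` are hypothetical, and SirixDB's persistent, versioned index structures are not modeled. Ordering keys by PCR first and value second keeps all entries for one path contiguous, so a value range on that path becomes a single ordered sub-range scan.

```java
import java.util.Comparator;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical composite key: path class reference (PCR) first, then the typed value. */
record CasKey(long pathClassRef, long value) {}

/** Hypothetical in-memory CAS index: (PCR, value) -> node keys, ordered for range scans. */
final class CasIndexSketch {
    // Sort by PCR, then by value, so one path's entries form a contiguous key range.
    private static final Comparator<CasKey> ORDER =
        Comparator.comparingLong(CasKey::pathClassRef).thenComparingLong(CasKey::value);

    private final NavigableMap<CasKey, Set<Long>> index = new TreeMap<>(ORDER);

    void add(long pathClassRef, long value, long nodeKey) {
        index.computeIfAbsent(new CasKey(pathClassRef, value), k -> new TreeSet<>()).add(nodeKey);
    }

    /** Node keys on the given path whose value lies in [low, high], returned sorted by node key. */
    Set<Long> range(long pathClassRef, long low, long high) {
        Set<Long> result = new TreeSet<>();
        index.subMap(new CasKey(pathClassRef, low), true, new CasKey(pathClassRef, high), true)
             .values()
             .forEach(result::addAll);
        return result;
    }
}
```

Path and Name indexes follow the same shape with a simpler key (PCR or QNm hash alone), so they serve exact-match lookups rather than value ranges.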
…-to-architecture' into docs/add-deweyid-and-index-types-to-architecture
- Add DeweyID Index to the Primary Indexes diagram
- Document DeweyID storage (inline in KeyValueLeafPages) and benefits
- Add comprehensive Secondary Index Types section (Path, Name, CAS)
- Fix 'impossible trilemma' to 'conflicting goals' (more accurate)
- Clarify surgical updates: depends on versioning type + rolling hash
- Update Node Store diagram to show actual SirixDB encoding:
  parentKey, firstChildKey, lastChildKey, leftSiblingKey, rightSiblingKey
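The pointer-based Node Store encoding listed above can be illustrated with a minimal sketch. `NodeRec`, `NodeStoreSketch`, and the `NULL_KEY` sentinel of `-1` are hypothetical names chosen for illustration, not SirixDB classes or constants. The sketch shows what the five keys buy: `lastChildKey` makes appending a child O(1), and the sibling keys allow iterating children forward or backward without a scan.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical node record with the five structural keys; NULL_KEY (-1) is an assumed sentinel. */
final class NodeRec {
    static final long NULL_KEY = -1L;

    final long nodeKey;
    long parentKey = NULL_KEY;
    long firstChildKey = NULL_KEY;
    long lastChildKey = NULL_KEY;
    long leftSiblingKey = NULL_KEY;
    long rightSiblingKey = NULL_KEY;

    NodeRec(long nodeKey) {
        this.nodeKey = nodeKey;
    }
}

/** Hypothetical in-memory node store keyed by nodeKey. */
final class NodeStoreSketch {
    private final Map<Long, NodeRec> nodes = new HashMap<>();
    private long nextKey = 0;

    long createRoot() {
        NodeRec root = new NodeRec(nextKey++);
        nodes.put(root.nodeKey, root);
        return root.nodeKey;
    }

    /** Appends a child in O(1): lastChildKey avoids walking the sibling chain. */
    long appendChild(long parentKey) {
        NodeRec parent = nodes.get(parentKey);
        NodeRec child = new NodeRec(nextKey++);
        child.parentKey = parentKey;
        if (parent.lastChildKey == NodeRec.NULL_KEY) {
            parent.firstChildKey = child.nodeKey; // first child of this parent
        } else {
            NodeRec previous = nodes.get(parent.lastChildKey);
            previous.rightSiblingKey = child.nodeKey; // old last child -> new child
            child.leftSiblingKey = previous.nodeKey;
        }
        parent.lastChildKey = child.nodeKey;
        nodes.put(child.nodeKey, child);
        return child.nodeKey;
    }

    /** Children left to right: firstChildKey, then rightSiblingKey links. */
    List<Long> children(long parentKey) {
        List<Long> result = new ArrayList<>();
        for (long k = nodes.get(parentKey).firstChildKey; k != NodeRec.NULL_KEY; k = nodes.get(k).rightSiblingKey) {
            result.add(k);
        }
        return result;
    }

    /** Children right to left: lastChildKey, then leftSiblingKey links. */
    List<Long> childrenReversed(long parentKey) {
        List<Long> result = new ArrayList<>();
        for (long k = nodes.get(parentKey).lastChildKey; k != NodeRec.NULL_KEY; k = nodes.get(k).leftSiblingKey) {
            result.add(k);
        }
        return result;
    }

    long parent(long nodeKey) {
        return nodes.get(nodeKey).parentKey;
    }
}
```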
Expand and clarify the comparison between Document Store and Node Store.
HOT stands for Height-Optimized Trie, so HOT_TRIE was redundant.

Also includes architecture documentation improvements:
- Add DeweyIDs and secondary index types documentation
- Fix various accuracy issues in diagrams and examples
- Update default SLIDING_SNAPSHOT window to 4
- Add PostOrderAxis and LevelOrderAxis to spatial axes
- Add between-timestamps transaction example
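The post-order and level-order axes mentioned above can be sketched over a first-child/right-sibling node view. `TNode`, `AxisSketch`, and the `-1` null sentinel are hypothetical. SirixDB's actual axes are lazy iterators bound to a transaction, whereas this sketch eagerly collects node keys so that the traversal order is easy to check.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;

/** Hypothetical minimal node view: only the two pointers these traversals need. */
record TNode(long firstChildKey, long rightSiblingKey) {}

final class AxisSketch {
    static final long NULL_KEY = -1L; // assumed sentinel for "no node"

    private AxisSketch() {}

    /** Level order (breadth-first): all nodes at depth d before any node at depth d + 1. */
    static List<Long> levelOrder(Map<Long, TNode> tree, long start) {
        List<Long> result = new ArrayList<>();
        Deque<Long> queue = new ArrayDeque<>();
        queue.add(start);
        while (!queue.isEmpty()) {
            long key = queue.poll();
            result.add(key);
            // Enqueue children left to right via firstChildKey / rightSiblingKey.
            for (long c = tree.get(key).firstChildKey(); c != NULL_KEY; c = tree.get(c).rightSiblingKey()) {
                queue.add(c);
            }
        }
        return result;
    }

    /** Post order: every child subtree (left to right) before its parent; iterative, no recursion. */
    static List<Long> postOrder(Map<Long, TNode> tree, long start) {
        Deque<Long> stack = new ArrayDeque<>();
        Deque<Long> result = new ArrayDeque<>();
        stack.push(start);
        while (!stack.isEmpty()) {
            long key = stack.pop();
            // Visiting (node, right subtrees, left subtrees) and prepending yields (left, right, node).
            result.addFirst(key);
            for (long c = tree.get(key).firstChildKey(); c != NULL_KEY; c = tree.get(c).rightSiblingKey()) {
                stack.push(c); // pushed left to right, so the rightmost child is popped first
            }
        }
        return new ArrayList<>(result);
    }
}
```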
Johannes Lichtenberger added 5 commits January 7, 2026 13:12
- Add PageHasher utility class for fast XXH3 hashing (~15 GB/s) with
  backward compatibility for SHA-256 hashes from legacy databases
- Add SirixCorruptionException for detailed corruption error reporting
- Add verifyChecksumsOnRead configuration option (default: false)
- Update all writers (FileChannel, File, IOUring, MMFile) to use XXH3
- Update all readers to verify checksums when enabled:
  - Non-KVLP pages: verify on compressed bytes before decompression
  - KVLP pages: verify on uncompressed bytes after decompression
- Ensure page fragment hashes are propagated for verification
- Add comprehensive unit tests for PageHasher, SirixCorruptionException,
  and configuration

Hash algorithm is auto-detected by length (8 bytes = XXH3, 32 bytes = SHA-256)
for seamless backward compatibility with existing databases.
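Length-based auto-detection of the stored hash (8 bytes for XXH3, 32 bytes for SHA-256) can be sketched as follows. `DetectedHash` and `LegacyAwareVerifier` are hypothetical names, and the stored 8-byte XXH3 value is assumed to be big-endian. Because the JDK ships no XXH3 implementation, the 64-bit hash function is injected as a `ToLongFunction<byte[]>` from whatever library provides it. The legacy path uses the standard `MessageDigest` API, with `MessageDigest.isEqual` comparing digests.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.function.ToLongFunction;

/** Hypothetical: the stored hash length identifies the algorithm (8 = XXH3-64, 32 = SHA-256). */
enum DetectedHash {
    XXH3_64(8),
    SHA_256(32);

    final int length;

    DetectedHash(int length) {
        this.length = length;
    }

    static DetectedHash fromStoredLength(int length) {
        for (DetectedHash h : values()) {
            if (h.length == length) {
                return h;
            }
        }
        throw new IllegalArgumentException("Unknown stored hash length: " + length);
    }
}

/** Hypothetical verifier accepting both new XXH3 hashes and legacy SHA-256 hashes. */
final class LegacyAwareVerifier {
    // Assumption: any XXH3-64 implementation is plugged in here; the JDK has none.
    private final ToLongFunction<byte[]> xxh3;

    LegacyAwareVerifier(ToLongFunction<byte[]> xxh3) {
        this.xxh3 = xxh3;
    }

    boolean verify(byte[] data, byte[] storedHash) {
        return switch (DetectedHash.fromStoredLength(storedHash.length)) {
            // New format: compare as primitive longs (assumed big-endian storage).
            case XXH3_64 -> xxh3.applyAsLong(data) == toLong(storedHash);
            // Legacy format: recompute SHA-256 and compare the digests.
            case SHA_256 -> MessageDigest.isEqual(sha256(data), storedHash);
        };
    }

    static byte[] sha256(byte[] data) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(data);
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform implementation is required to support SHA-256.
            throw new IllegalStateException(e);
        }
    }

    private static long toLong(byte[] b) {
        long v = 0;
        for (int i = 0; i < 8; i++) {
            v = (v << 8) | (b[i] & 0xFFL);
        }
        return v;
    }
}
```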
…shing

High-performance optimizations aligned with financial/HFT system best practices:

- HashAlgorithm enum now uses direct bit manipulation for longToBytes/bytesToLong
  instead of ByteBuffer allocations (eliminates heap allocation in hot paths)
- Added zero-allocation long-based API (computeHashLong, verifyLong) as primary
  interface for verification hot paths
- PageHasher now provides both:
  - Default XXH3 convenience methods (compute(byte[]), computeLong(byte[]))
  - Explicit algorithm methods for extensibility
- ResourceConfiguration now includes hashAlgorithm field (defaults to XXH3)
  for future algorithm extensibility
- All writers/readers updated to use the new API
- Added HASH_LENGTH and DEFAULT_ALGORITHM constants for HFT-style clarity

Zero-copy design preserved: native MemorySegments still use direct address hashing.
Verification hot path uses primitive long comparison instead of Arrays.equals().
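The bit-manipulation replacement for `ByteBuffer`-based `longToBytes`/`bytesToLong` can be sketched as below. `LongBytes` and its methods are hypothetical, and big-endian byte order is an assumption chosen to match `ByteBuffer`'s default. The helpers write into and read from a caller-supplied array, so the hot path allocates nothing, and verification reduces to comparing two `long`s instead of calling `Arrays.equals()`.

```java
/** Hypothetical zero-allocation long <-> byte[] helpers; big-endian order is assumed. */
final class LongBytes {
    private LongBytes() {}

    /** Writes value into dst[offset..offset + 7], most significant byte first. */
    static void putLong(byte[] dst, int offset, long value) {
        for (int i = 7; i >= 0; i--) {
            dst[offset + i] = (byte) value; // lowest remaining byte goes to the highest index
            value >>>= 8;
        }
    }

    /** Reads eight bytes starting at offset as a big-endian long. */
    static long getLong(byte[] src, int offset) {
        long value = 0;
        for (int i = 0; i < 8; i++) {
            value = (value << 8) | (src[offset + i] & 0xFFL); // mask avoids sign extension
        }
        return value;
    }

    /** Verification as a primitive comparison: no temporary array, no Arrays.equals(). */
    static boolean verifyLong(long computed, byte[] stored, int offset) {
        return computed == getLong(stored, offset);
    }
}
```

On Java 9 and later, `MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.BIG_ENDIAN)` is a standard-library alternative that performs the same conversion without a manual loop.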
The checksum verification was failing because:
- KVLP pages computed hash on UNCOMPRESSED data
- Non-KVLP pages computed hash on COMPRESSED data
- Verification tried to detect KVLP from first byte of COMPRESSED data,
  which doesn't work since LZ4 compressed data doesn't preserve the page type

Fix: All page types now consistently hash COMPRESSED data. This:
- Simplifies the verification logic (no KVLP special cases)
- Avoids the impossible task of detecting page type from compressed bytes
- Provides consistent behavior across all storage backends

Removed:
- KVLP-specific hash computation in PageKind.serializePage
- KVLP-specific verification methods in AbstractReader and FileReader
- KVLP detection from compressed data in verifyChecksumIfNeeded
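The fix (hash the compressed bytes on write, then verify the same bytes on read before decompressing) can be sketched end to end. This illustration makes explicit substitutions: `java.util.zip.Deflater`/`Inflater` stand in for LZ4, `CRC32C` stands in for XXH3, and `StoredPage`, `PageCodecSketch`, and `PageCorruptionException` are hypothetical (the last is not `SirixCorruptionException`). The constructor flag mirrors the `verifyChecksumsOnRead` option. Because the checksum covers exactly the bytes on disk, the reader never needs to know the page type before verifying.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.CRC32C;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical exception carrying the expected and actual checksums. */
final class PageCorruptionException extends RuntimeException {
    PageCorruptionException(long expected, long actual) {
        super(String.format("Checksum mismatch: expected %016x, actual %016x", expected, actual));
    }
}

/** Hypothetical stored page: compressed bytes plus the checksum over exactly those bytes. */
record StoredPage(byte[] compressed, long checksum) {}

final class PageCodecSketch {
    private final boolean verifyChecksumsOnRead;

    PageCodecSketch(boolean verifyChecksumsOnRead) {
        this.verifyChecksumsOnRead = verifyChecksumsOnRead;
    }

    // CRC32C is a stand-in for XXH3 here; the JDK ships no XXH3 implementation.
    static long checksum(byte[] bytes) {
        CRC32C crc = new CRC32C();
        crc.update(bytes, 0, bytes.length);
        return crc.getValue();
    }

    /** Write path: compress first, then hash the compressed bytes, identically for every page type. */
    StoredPage write(byte[] serializedPage) {
        Deflater deflater = new Deflater();
        deflater.setInput(serializedPage);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        byte[] compressed = out.toByteArray();
        return new StoredPage(compressed, checksum(compressed));
    }

    /** Read path: optionally verify the compressed bytes, and only then decompress. */
    byte[] read(StoredPage page) {
        if (verifyChecksumsOnRead) {
            long actual = checksum(page.compressed());
            if (actual != page.checksum()) {
                throw new PageCorruptionException(page.checksum(), actual);
            }
        }
        Inflater inflater = new Inflater();
        inflater.setInput(page.compressed());
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalStateException("Truncated or invalid compressed page");
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("Invalid compressed page", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }
}
```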
These files referenced non-existent io.sirix.io.RevisionIndex class
and used JMH annotations in the wrong source set (main instead of jmh).
JohannesLichtenberger merged commit 3e5e111 into main Jan 7, 2026
3 checks passed
JohannesLichtenberger deleted the docs/add-deweyid-and-index-types-to-architecture branch January 7, 2026 12:55