Docs/add deweyid and index types to architecture #818

JohannesLichtenberger · 2026-01-07T12:31:44Z

No description provided.

- Add DeweyID Index to the Primary Indexes diagram
- Document DeweyID storage (inline in KeyValueLeafPages) and benefits
- Add comprehensive Secondary Index Types section with:
  - Path Index: PCR → NodeKeys mapping, use cases
  - Name Index: QNm hash → NodeKeys mapping, use cases
  - CAS Index: Value+Path → NodeKeys mapping, range query support
- Include visual examples for each index type
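
The three secondary index mappings named in this commit can be pictured with a small in-memory model. This is only a sketch under assumptions: the class `SecondaryIndexSketch`, its method names, the use of `String.hashCode()` as the name hash, and numeric-only values are all hypothetical and are not SirixDB's API or storage layout.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical in-memory model of the three index types; not SirixDB code. */
final class SecondaryIndexSketch {
  // Path index: path class reference (PCR) -> node keys located on that path.
  private final Map<Long, TreeSet<Long>> pathIndex = new HashMap<>();
  // Name index: hash of the name -> node keys carrying that name.
  private final Map<Integer, TreeSet<Long>> nameIndex = new HashMap<>();
  // CAS index: per PCR, values are kept sorted so that range scans are possible.
  private final Map<Long, NavigableMap<Double, TreeSet<Long>>> casIndex = new HashMap<>();

  /** Registers one node in all three indexes (a simplification for illustration). */
  void index(long nodeKey, long pcr, String name, double value) {
    pathIndex.computeIfAbsent(pcr, k -> new TreeSet<>()).add(nodeKey);
    nameIndex.computeIfAbsent(name.hashCode(), k -> new TreeSet<>()).add(nodeKey);
    casIndex.computeIfAbsent(pcr, k -> new TreeMap<>())
        .computeIfAbsent(value, k -> new TreeSet<>())
        .add(nodeKey);
  }

  List<Long> byPath(long pcr) {
    return new ArrayList<>(pathIndex.getOrDefault(pcr, new TreeSet<>()));
  }

  List<Long> byName(String name) {
    return new ArrayList<>(nameIndex.getOrDefault(name.hashCode(), new TreeSet<>()));
  }

  /** Range query on the CAS index; both bounds are inclusive and lo must not exceed hi. */
  List<Long> byValueRange(long pcr, double lo, double hi) {
    List<Long> result = new ArrayList<>();
    NavigableMap<Double, TreeSet<Long>> values = casIndex.get(pcr);
    if (values != null) {
      for (TreeSet<Long> keys : values.subMap(lo, true, hi, true).values()) {
        result.addAll(keys);
      }
    }
    return result;
  }
}
```

The sorted map is what makes the range query cheap in this model: a lookup such as `byValueRange(pcr, 0.0, 10.0)` only walks the entries between the two bounds.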

…-to-architecture' into docs/add-deweyid-and-index-types-to-architecture

- Add DeweyID Index to the Primary Indexes diagram
- Document DeweyID storage (inline in KeyValueLeafPages) and benefits
- Add comprehensive Secondary Index Types section (Path, Name, CAS)
- Fix 'impossible trilemma' to 'conflicting goals' (more accurate)
- Clarify surgical updates: depends on versioning type + rolling hash
- Update Node Store diagram to show actual SirixDB encoding: parentKey, firstChildKey, lastChildKey, leftSiblingKey, rightSiblingKey
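
To make the five-pointer node encoding concrete, here is a minimal sketch of how appending a child only rewires neighbouring pointers. Only the field names come from the commit message; the class `NodeEncodingSketch`, the `NULL_KEY` sentinel, and the map-backed storage are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of a node store using parent/child/sibling keys; not SirixDB code. */
final class NodeEncodingSketch {
  static final long NULL_KEY = -1L; // assumed sentinel meaning "no such node"

  static final class Node {
    final long nodeKey;
    long parentKey = NULL_KEY;
    long firstChildKey = NULL_KEY;
    long lastChildKey = NULL_KEY;
    long leftSiblingKey = NULL_KEY;
    long rightSiblingKey = NULL_KEY;

    Node(long nodeKey) {
      this.nodeKey = nodeKey;
    }
  }

  private final Map<Long, Node> nodes = new HashMap<>();

  Node addRoot(long nodeKey) {
    Node root = new Node(nodeKey);
    nodes.put(nodeKey, root);
    return root;
  }

  Node get(long nodeKey) {
    return nodes.get(nodeKey);
  }

  /** Appends a node as the last child; only the parent and the previous last child change. */
  Node appendChild(long parentKey, long nodeKey) {
    Node parent = nodes.get(parentKey);
    Node child = new Node(nodeKey);
    child.parentKey = parentKey;
    if (parent.firstChildKey == NULL_KEY) {
      parent.firstChildKey = nodeKey;
    } else {
      Node previousLast = nodes.get(parent.lastChildKey);
      previousLast.rightSiblingKey = nodeKey;
      child.leftSiblingKey = previousLast.nodeKey;
    }
    parent.lastChildKey = nodeKey;
    nodes.put(nodeKey, child);
    return child;
  }

  /** Walks firstChildKey, then rightSiblingKey until the sentinel is reached. */
  List<Long> childKeys(long parentKey) {
    List<Long> result = new ArrayList<>();
    long key = nodes.get(parentKey).firstChildKey;
    while (key != NULL_KEY) {
      result.add(key);
      key = nodes.get(key).rightSiblingKey;
    }
    return result;
  }
}
```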

…-to-architecture' into docs/add-deweyid-and-index-types-to-architecture

Updated the comparison between Document Store and Node Store to enhance clarity and detail.

…-to-architecture' into docs/add-deweyid-and-index-types-to-architecture

HOT stands for Height-Optimized Trie, so HOT_TRIE was redundant. Also includes architecture documentation improvements:

- Add DeweyIDs and secondary index types documentation
- Fix various accuracy issues in diagrams and examples
- Update default SLIDING_SNAPSHOT window to 4
- Add PostOrderAxis and LevelOrderAxis to spatial axes
- Add between-timestamps transaction example
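
For readers unfamiliar with the two traversal orders behind the axis names, the sketch below contrasts them on a plain adjacency map. It illustrates the orders only: the class `TraversalOrderSketch` and its methods are hypothetical, it does not use the SirixDB axis classes, and whether those axes include the start node is not stated in the commit (this sketch includes it).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of level-order and post-order traversal; not SirixDB code. */
final class TraversalOrderSketch {
  /** Breadth-first: the start node, then each level from left to right. */
  static List<Long> levelOrder(Map<Long, List<Long>> children, long root) {
    List<Long> out = new ArrayList<>();
    Deque<Long> queue = new ArrayDeque<>();
    queue.add(root);
    while (!queue.isEmpty()) {
      long key = queue.remove();
      out.add(key);
      queue.addAll(children.getOrDefault(key, List.of()));
    }
    return out;
  }

  /** Depth-first: every child subtree is emitted before its parent. */
  static List<Long> postOrder(Map<Long, List<Long>> children, long root) {
    List<Long> out = new ArrayList<>();
    visit(children, root, out);
    return out;
  }

  private static void visit(Map<Long, List<Long>> children, long key, List<Long> out) {
    for (long child : children.getOrDefault(key, List.of())) {
      visit(children, child, out);
    }
    out.add(key);
  }
}
```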

…-to-architecture' into docs/add-deweyid-and-index-types-to-architecture

- Add PageHasher utility class for fast XXH3 hashing (~15 GB/s) with backward compatibility for SHA-256 hashes from legacy databases
- Add SirixCorruptionException for detailed corruption error reporting
- Add verifyChecksumsOnRead configuration option (default: false)
- Update all writers (FileChannel, File, IOUring, MMFile) to use XXH3
- Update all readers to verify checksums when enabled:
  - Non-KVLP pages: verify on compressed bytes before decompression
  - KVLP pages: verify on uncompressed bytes after decompression
- Ensure page fragment hashes are propagated for verification
- Add comprehensive unit tests for PageHasher, SirixCorruptionException, and configuration

Hash algorithm is auto-detected by length (8 bytes = XXH3, 32 bytes = SHA-256) for seamless backward compatibility with existing databases.
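
The length-based detection described above could look roughly like the following. This is a sketch, not the PageHasher implementation: the class `HashDetectionSketch` and its methods are hypothetical, the JDK has no XXH3, so the 64-bit hash is passed in as a function, and big-endian encoding of the stored 8-byte hash is an assumption.

```java
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.function.ToLongFunction;

/** Hypothetical sketch of detecting the hash algorithm from the stored hash length. */
final class HashDetectionSketch {
  enum Algorithm { XXH3, SHA_256 }

  /** 8 bytes means a 64-bit XXH3 hash, 32 bytes means a legacy SHA-256 digest. */
  static Algorithm detect(byte[] storedHash) {
    if (storedHash.length == 8) {
      return Algorithm.XXH3;
    }
    if (storedHash.length == 32) {
      return Algorithm.SHA_256;
    }
    throw new IllegalArgumentException("Unsupported hash length: " + storedHash.length);
  }

  /**
   * Verifies data against a stored hash. hash64 stands in for an XXH3 implementation,
   * which must be supplied by the caller (assumption: stored hash is big-endian).
   */
  static boolean verify(byte[] data, byte[] storedHash, ToLongFunction<byte[]> hash64) {
    if (detect(storedHash) == Algorithm.XXH3) {
      return hash64.applyAsLong(data) == ByteBuffer.wrap(storedHash).getLong();
    }
    return MessageDigest.isEqual(sha256(data), storedHash);
  }

  static byte[] sha256(byte[] data) {
    try {
      return MessageDigest.getInstance("SHA-256").digest(data);
    } catch (NoSuchAlgorithmException e) {
      // SHA-256 is required on every Java platform, so this should not happen.
      throw new IllegalStateException(e);
    }
  }
}
```

Because the two hash lengths never collide, no extra version byte is needed to tell old and new databases apart in this scheme.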

…shing

High-performance optimizations aligned with financial/HFT system best practices:

- HashAlgorithm enum now uses direct bit manipulation for longToBytes/bytesToLong instead of ByteBuffer allocations (eliminates heap allocation in hot paths)
- Added zero-allocation long-based API (computeHashLong, verifyLong) as primary interface for verification hot paths
- PageHasher now provides both:
  - Default XXH3 convenience methods (compute(byte[]), computeLong(byte[]))
  - Explicit algorithm methods for extensibility
- ResourceConfiguration now includes hashAlgorithm field (defaults to XXH3) for future algorithm extensibility
- All writers/readers updated to use the new API
- Added HASH_LENGTH and DEFAULT_ALGORITHM constants for HFT-style clarity

Zero-copy design preserved: native MemorySegments still use direct address hashing. Verification hot path uses primitive long comparison instead of Arrays.equals().
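
As an illustration of the bit-manipulation approach, the sketch below converts between a `long` and 8 bytes without allocating a ByteBuffer. The method names match those mentioned in the commit, but the signatures (writing into a caller-supplied array at an offset) and the big-endian byte order are assumptions, and the class `LongBytesSketch` is hypothetical.

```java
/** Hypothetical allocation-free long/byte conversion; not the HashAlgorithm enum itself. */
final class LongBytesSketch {
  /** Writes value into dest[offset .. offset + 7], most significant byte first. */
  static void longToBytes(long value, byte[] dest, int offset) {
    for (int i = 7; i >= 0; i--) {
      dest[offset + i] = (byte) value; // keep the lowest 8 bits
      value >>>= 8;                    // shift the next byte into place
    }
  }

  /** Reads 8 bytes starting at offset, most significant byte first. */
  static long bytesToLong(byte[] src, int offset) {
    long result = 0L;
    for (int i = 0; i < 8; i++) {
      result = (result << 8) | (src[offset + i] & 0xFFL); // mask avoids sign extension
    }
    return result;
  }
}
```

With the hash held as a primitive, verification reduces to `expected == actual`, which is the kind of comparison the commit contrasts with `Arrays.equals()`.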

The checksum verification was failing because:

- KVLP pages computed hash on UNCOMPRESSED data
- Non-KVLP pages computed hash on COMPRESSED data
- Verification tried to detect KVLP from first byte of COMPRESSED data, which doesn't work since LZ4 compressed data doesn't preserve the page type

Fix: All page types now consistently hash COMPRESSED data. This:

- Simplifies the verification logic (no KVLP special cases)
- Avoids the impossible task of detecting page type from compressed bytes
- Provides consistent behavior across all storage backends

Removed:

- KVLP-specific hash computation in PageKind.serializePage
- KVLP-specific verification methods in AbstractReader and FileReader
- KVLP detection from compressed data in verifyChecksumIfNeeded
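
The fixed flow (hash the compressed bytes on write, verify them on read before decompressing) can be sketched with JDK classes only. Note the stand-ins: `Deflater`/`Inflater` replace LZ4 and `CRC32` replaces XXH3, purely so the example runs without extra dependencies. The class `CompressedChecksumSketch` and its methods are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.CRC32;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical write/read path that checksums compressed bytes; not SirixDB code. */
final class CompressedChecksumSketch {
  static final class StoredPage {
    final byte[] compressed;
    final long checksum;

    StoredPage(byte[] compressed, long checksum) {
      this.compressed = compressed;
      this.checksum = checksum;
    }
  }

  /** Stand-in for XXH3: any 64-bit-wide checksum of the given bytes. */
  static long checksum(byte[] bytes) {
    CRC32 crc = new CRC32();
    crc.update(bytes, 0, bytes.length);
    return crc.getValue();
  }

  /** Write path: compress first, then hash exactly what goes to storage. */
  static StoredPage write(byte[] serializedPage) {
    byte[] compressed = compress(serializedPage);
    return new StoredPage(compressed, checksum(compressed));
  }

  /** Read path: verify the stored bytes before decompression, for every page type. */
  static byte[] read(StoredPage page, boolean verifyChecksumsOnRead) {
    if (verifyChecksumsOnRead && checksum(page.compressed) != page.checksum) {
      throw new IllegalStateException("Checksum mismatch: page is corrupt");
    }
    return decompress(page.compressed);
  }

  static byte[] compress(byte[] input) {
    Deflater deflater = new Deflater();
    deflater.setInput(input);
    deflater.finish();
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buffer = new byte[256];
    while (!deflater.finished()) {
      out.write(buffer, 0, deflater.deflate(buffer));
    }
    deflater.end();
    return out.toByteArray();
  }

  static byte[] decompress(byte[] input) {
    Inflater inflater = new Inflater();
    inflater.setInput(input);
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buffer = new byte[256];
    try {
      while (!inflater.finished()) {
        int n = inflater.inflate(buffer);
        if (n == 0 && inflater.needsInput()) {
          throw new DataFormatException("Truncated input");
        }
        out.write(buffer, 0, n);
      }
    } catch (DataFormatException e) {
      throw new IllegalArgumentException("Cannot decompress page", e);
    } finally {
      inflater.end();
    }
    return out.toByteArray();
  }
}
```

Since the check runs on the stored bytes, the reader never needs to know the page type before it has decompressed anything, which is the simplification the commit describes.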

These files referenced non-existent io.sirix.io.RevisionIndex class and used JMH annotations in the wrong source set (main instead of jmh).

Johannes Lichtenberger and others added 30 commits January 7, 2026 02:57

Merge remote-tracking branch 'origin/docs/add-deweyid-and-index-types…

4c3898a

…-to-architecture' into docs/add-deweyid-and-index-types-to-architecture

Update terminology from 'document store' to 'node store'

18bd0ce

Merge remote-tracking branch 'origin/docs/add-deweyid-and-index-types…

585c999

…-to-architecture' into docs/add-deweyid-and-index-types-to-architecture

Merge remote-tracking branch 'origin/docs/add-deweyid-and-index-types…

2859c4d

…-to-architecture' into docs/add-deweyid-and-index-types-to-architecture

Merge remote-tracking branch 'origin/docs/add-deweyid-and-index-types…

1aed63e

…-to-architecture' into docs/add-deweyid-and-index-types-to-architecture

Revise Document Store vs. Node Store section

cbce690

Updated the comparison between Document Store and Node Store to enhance clarity and detail.

Merge remote-tracking branch 'origin/docs/add-deweyid-and-index-types…

e5293d9

…-to-architecture' into docs/add-deweyid-and-index-types-to-architecture

Merge remote-tracking branch 'origin/docs/add-deweyid-and-index-types…

9bccdf8

…-to-architecture' into docs/add-deweyid-and-index-types-to-architecture

Merge remote-tracking branch 'origin/docs/add-deweyid-and-index-types…

73035f3

…-to-architecture' into docs/add-deweyid-and-index-types-to-architecture

Merge remote-tracking branch 'origin/docs/add-deweyid-and-index-types…

9ea8de8

…-to-architecture' into docs/add-deweyid-and-index-types-to-architecture

Merge remote-tracking branch 'origin/docs/add-deweyid-and-index-types…

4d10cb3

…-to-architecture' into docs/add-deweyid-and-index-types-to-architecture

Merge remote-tracking branch 'origin/docs/add-deweyid-and-index-types…

84a25c2

…-to-architecture' into docs/add-deweyid-and-index-types-to-architecture

Merge remote-tracking branch 'origin/docs/add-deweyid-and-index-types…

8e92d7d

…-to-architecture' into docs/add-deweyid-and-index-types-to-architecture

Merge remote-tracking branch 'origin/docs/add-deweyid-and-index-types…

13e672b

…-to-architecture' into docs/add-deweyid-and-index-types-to-architecture

Merge remote-tracking branch 'origin/docs/add-deweyid-and-index-types…

bd9f378

…-to-architecture' into docs/add-deweyid-and-index-types-to-architecture

Merge remote-tracking branch 'origin/docs/add-deweyid-and-index-types…

cdb2e5f

…-to-architecture' into docs/add-deweyid-and-index-types-to-architecture

Merge remote-tracking branch 'origin/docs/add-deweyid-and-index-types…

af44e86

…-to-architecture' into docs/add-deweyid-and-index-types-to-architecture

Merge remote-tracking branch 'origin/docs/add-deweyid-and-index-types…

35be5a1

…-to-architecture' into docs/add-deweyid-and-index-types-to-architecture

Merge remote-tracking branch 'origin/docs/add-deweyid-and-index-types…

43110a7

…-to-architecture' into docs/add-deweyid-and-index-types-to-architecture

Merge remote-tracking branch 'origin/docs/add-deweyid-and-index-types…

d9aee6a

…-to-architecture' into docs/add-deweyid-and-index-types-to-architecture

Merge remote-tracking branch 'origin/docs/add-deweyid-and-index-types…

92550ff

…-to-architecture' into docs/add-deweyid-and-index-types-to-architecture

Merge remote-tracking branch 'origin/docs/add-deweyid-and-index-types…

d6d7399

…-to-architecture' into docs/add-deweyid-and-index-types-to-architecture

Merge remote-tracking branch 'origin/docs/add-deweyid-and-index-types…

90d0b0d

…-to-architecture' into docs/add-deweyid-and-index-types-to-architecture

Merge remote-tracking branch 'origin/docs/add-deweyid-and-index-types…

0cf4f9e

…-to-architecture' into docs/add-deweyid-and-index-types-to-architecture

Merge remote-tracking branch 'origin/docs/add-deweyid-and-index-types…

5565a26

…-to-architecture' into docs/add-deweyid-and-index-types-to-architecture

Merge remote-tracking branch 'origin/docs/add-deweyid-and-index-types…

6ca13b4

…-to-architecture' into docs/add-deweyid-and-index-types-to-architecture

Johannes Lichtenberger added 5 commits January 7, 2026 13:12

fix: Remove unused PageHasher import from PageKind

eac57c2

fix: Remove broken RevisionIndex benchmark files

ed6964f

These files referenced non-existent io.sirix.io.RevisionIndex class and used JMH annotations in the wrong source set (main instead of jmh).

JohannesLichtenberger merged commit 3e5e111 into main Jan 7, 2026
3 checks passed

JohannesLichtenberger deleted the docs/add-deweyid-and-index-types-to-architecture branch January 7, 2026 12:55
