
ThemisDB Architecture Overview

Version: v1.3.0
Date: December 2025
Type: Technical Architecture Documentation
Category: 🧩 Architecture


📋 Executive Summary

ThemisDB is a multi-model database system built on an LSM-tree foundation (RocksDB), providing unified access to relational, graph, vector, time-series, and document data models. The architecture emphasizes horizontal scalability, high performance through GPU acceleration, and enterprise-grade security.

Key Architectural Characteristics

  • Multi-Model Unified Storage: Single LSM-tree storage layer with specialized indexes
  • Horizontal Scalability: VCC-URN based sharding with automatic rebalancing
  • High Performance: GPU acceleration (10 backends), SIMD optimizations, query optimization
  • Enterprise Security: Field-level encryption, HSM/PKI integration, RBAC, Apache Ranger
  • ACID Transactions: MVCC with snapshot isolation, distributed SAGA patterns
  • Flexible Replication: Leader-Follower and Multi-Master (CRDT-based)

🏗️ System Architecture

Layered Architecture

┌─────────────────────────────────────────────────────────────┐
│                     Client Layer                            │
│  REST API │ GraphQL │ gRPC │ Wire Protocol │ Native SDKs   │
└─────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────┐
│                   API & Server Layer                        │
│  HTTP Server │ Authentication │ Rate Limiting │ Load Shed  │
└─────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────┐
│                    Query Layer                              │
│  AQL Parser │ Query Optimizer │ Execution Engine           │
│  Function Libraries │ CTE Cache │ Semantic Cache           │
└─────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────┐
│              Transaction & Concurrency Layer                │
│  MVCC │ Transaction Manager │ SAGA Coordinator             │
│  Deadlock Detection │ WAL Management                       │
└─────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────┐
│                   Index Layer                               │
│  Vector (HNSW) │ Graph │ Secondary │ Spatial │ Fulltext   │
│  GPU Acceleration │ SIMD Optimization                      │
└─────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────┐
│                   Storage Layer                             │
│  RocksDB (LSM-tree) │ Key Schema │ Compression            │
│  WAL │ Snapshot Management │ Compaction                   │
└─────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────┐
│             Cross-Cutting Concerns                          │
│  Security │ Replication │ Sharding │ Monitoring │ CDC      │
└─────────────────────────────────────────────────────────────┘

🔑 Core Components

1. Storage Layer

Primary Technology: RocksDB (LSM-tree based key-value store)

Components:

  • RocksDB Wrapper (src/storage/rocksdb_wrapper.cpp)
    • Configurable LSM-tree parameters
    • Write-ahead logging (WAL)
    • Snapshot isolation support
    • Bloom filters for read optimization
    • Block cache management
    • Compaction strategies (level, universal)

Key Design Decisions:

  • LSM-tree chosen for write-heavy workloads
  • Hierarchical key schema enables multi-model support
  • Compression (ZSTD, LZ4) reduces storage footprint
  • Memory-mapped files for large datasets
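The hierarchical key schema can be illustrated with a minimal sketch (the component names and separator byte here are assumptions, not ThemisDB's actual encoding): a model prefix keeps each data model's keys clustered in the shared LSM keyspace, so a collection scan becomes a simple prefix range.

```cpp
#include <string>

// Hypothetical key layout: <model>\0<collection>\0<id>. The '\0'
// separator sorts before every printable byte, so all keys of one
// collection form a contiguous range in the ordered LSM-tree.
std::string make_key(const std::string& model,       // e.g. "doc", "graph"
                     const std::string& collection,
                     const std::string& id) {
    return model + '\x00' + collection + '\x00' + id;
}

// Scan prefix covering every key in a collection.
std::string collection_prefix(const std::string& model,
                              const std::string& collection) {
    return model + '\x00' + collection + '\x00';
}
```

A range scan starting at `collection_prefix("doc", "users")` then visits exactly the `doc/users` keys; real encodings additionally escape separator bytes that may appear inside ids.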

Performance Characteristics:

  • Write throughput: ~100K ops/sec (single node)
  • Read latency: <1ms (cached), <10ms (disk)
  • Compression ratio: 3-5x typical

2. Index Layer

Purpose: Specialized indexes for different data models and access patterns

Vector Index (HNSW)

Implementation: src/index/vector_index.cpp (57,750 LOC)

Architecture:

┌──────────────────────────────────────┐
│         HNSW Graph Structure         │
│                                      │
│  Layer 2: ○────○────○                │
│            │    │    │                │
│  Layer 1: ○────○────○────○           │
│           │ \  │  / │    │           │
│  Layer 0: ○─○─○─○─○─○─○─○─○          │
│         (Full connectivity)          │
└──────────────────────────────────────┘

Features:

  • Multi-layer hierarchical graph
  • Efficient k-NN search (O(log n) expected)
  • GPU acceleration (10 backends)
  • SIMD distance calculations
  • Incremental index building
  • Persistence to RocksDB

Distance Metrics:

  • Cosine similarity
  • Euclidean (L2)
  • Dot product
  • Manhattan (L1)
  • Hamming
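For reference, scalar versions of two of these metrics (the production path uses SIMD/GPU kernels, but computes the same quantities):

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Scalar reference implementations; illustrative only.
double dot(const std::vector<double>& a, const std::vector<double>& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// Euclidean (L2) distance.
double euclidean(const std::vector<double>& a, const std::vector<double>& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        double d = a[i] - b[i];
        s += d * d;
    }
    return std::sqrt(s);
}

// Cosine similarity: 1 for parallel vectors, 0 for orthogonal ones.
double cosine_similarity(const std::vector<double>& a,
                         const std::vector<double>& b) {
    return dot(a, b) / (std::sqrt(dot(a, a)) * std::sqrt(dot(b, b)));
}
```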

Performance:

  • Index build: ~10K vectors/sec (CPU), ~100K vectors/sec (GPU)
  • Query latency: <1ms for 1M vectors
  • Recall@10: >95% at ef_search=100

Graph Index

Implementation: src/index/graph_index.cpp (65,471 LOC)

Storage Format:

Vertex Store:
  Key: vertex_id
  Value: {properties, edge_list}

Edge Store:
  Key: edge_id
  Value: {source, target, properties}

Index Store (for traversal):
  Key: vertex_id + edge_type
  Value: adjacent_vertices

Algorithms:

  • BFS, DFS for traversal
  • Dijkstra, A* for shortest path
  • PageRank for importance ranking
  • Louvain for community detection
  • Betweenness centrality
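As a sketch of the traversal building block, a BFS over an in-memory adjacency map (in the engine, a vertex's neighbors would come from the vertex_id + edge_type index store shown above):

```cpp
#include <queue>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Minimal BFS: visits each reachable vertex once, in level order.
using Graph = std::unordered_map<int, std::vector<int>>;

std::vector<int> bfs(const Graph& g, int start) {
    std::vector<int> order;
    std::unordered_set<int> seen{start};
    std::queue<int> frontier;
    frontier.push(start);
    while (!frontier.empty()) {
        int v = frontier.front();
        frontier.pop();
        order.push_back(v);
        auto it = g.find(v);
        if (it == g.end()) continue;
        for (int n : it->second)
            if (seen.insert(n).second)  // enqueue only unseen neighbors
                frontier.push(n);
    }
    return order;
}
```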

Optimizations:

  • Compressed adjacency lists
  • Edge property caching
  • Parallel graph algorithms
  • GPU-accelerated PageRank

Secondary Indexes

Implementation: src/index/secondary_index.cpp (114,074 LOC)

Index Types:

  • B-tree: Range queries, ordered scans
  • Hash: Exact match queries
  • Bitmap: Low-cardinality columns
  • Full-text: BM25 ranking with stemming

Features:

  • Composite indexes (multi-column)
  • Partial indexes (WHERE clause)
  • Expression indexes (computed columns)
  • Covering indexes (index-only scans)
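The partial-index idea can be sketched briefly (names and predicate are illustrative, not ThemisDB's API): only rows passing the index's WHERE predicate are entered, so the index stays small and lookups over the hot subset stay fast.

```cpp
#include <map>
#include <string>

// Hypothetical partial index: CREATE INDEX ... ON users(email) WHERE NOT deleted.
struct Row {
    std::string email;
    bool deleted;
};

class PartialEmailIndex {
    std::map<std::string, Row> idx_;  // indexes live rows only
public:
    void upsert(const Row& r) {
        if (!r.deleted)
            idx_[r.email] = r;        // predicate holds: enter the index
        else
            idx_.erase(r.email);      // predicate fails: drop out of it
    }
    bool contains(const std::string& email) const {
        return idx_.count(email) > 0;
    }
};
```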

3. Query Layer

Purpose: Parse, optimize, and execute queries

AQL (Advanced Query Language)

Query Processing Pipeline:

AQL Query String
      ↓
[Lexer/Parser] → AST
      ↓
[Type Checker] → Typed AST
      ↓
[Optimizer] → Optimized Plan
      ↓
[Code Generator] → Execution Plan
      ↓
[Execution Engine] → Results

Components:

  1. Parser (aql_parser.cpp - 43,972 LOC)

    • LL(k) grammar
    • Error recovery
    • AST generation
  2. Translator (aql_translator.cpp - 70,388 LOC)

    • AST transformation
    • Type inference
    • Name resolution
    • Semantic validation
  3. Optimizer (query_optimizer.cpp - 7,234 LOC)

    • Cost-based optimization
    • Rule-based transformations
    • Cardinality estimation
    • Join reordering
    • Predicate pushdown
    • Index selection
  4. Execution Engine (query_engine.cpp - 47,658 LOC)

    • Volcano-style iterator model
    • Pipelined execution
    • Parallel execution (thread pool)
    • Memory management
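The Volcano-style iterator model can be sketched in a few lines: each operator pulls rows from its child via next(), so plans compose and pipeline without materializing intermediate results (operator names and the int-row simplification are illustrative):

```cpp
#include <memory>
#include <optional>
#include <vector>

// Every operator exposes the same pull interface.
struct Operator {
    virtual std::optional<int> next() = 0;  // nullopt = end of stream
    virtual ~Operator() = default;
};

// Leaf operator: yields rows from a base table.
struct Scan : Operator {
    std::vector<int> rows;
    std::size_t pos = 0;
    explicit Scan(std::vector<int> r) : rows(std::move(r)) {}
    std::optional<int> next() override {
        if (pos >= rows.size()) return std::nullopt;
        return rows[pos++];
    }
};

// Filter operator: pulls from its child until a row qualifies.
struct Filter : Operator {
    std::unique_ptr<Operator> child;
    explicit Filter(std::unique_ptr<Operator> c) : child(std::move(c)) {}
    std::optional<int> next() override {
        while (auto row = child->next())
            if (*row % 2 == 0) return row;  // predicate: keep even rows
        return std::nullopt;
    }
};
```

Stacking more operators (joins, aggregates) keeps the same shape; the root's next() drives the whole pipeline one row at a time.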

Optimization Techniques:

  • Predicate pushdown to storage layer
  • Join reordering via dynamic programming
  • Constant folding
  • Common subexpression elimination
  • Index-only scans
  • Aggregate pushdown

Performance:

  • Simple queries: <1ms
  • Complex joins: <100ms (1M rows)
  • Aggregations: 10M rows/sec (single core)

4. Transaction Layer

Purpose: ACID guarantees with MVCC

MVCC Implementation

Version Chain Structure:

Latest Version ← Older ← Oldest
    (v3)         (v2)    (v1)

Each version contains:
- Transaction ID (begin_ts, end_ts)
- Data snapshot
- Pointer to previous version
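A minimal sketch of the visibility rule over that version chain (field names follow the diagram; the chain walk and timestamps are illustrative): a version is visible to a snapshot if it was committed at or before the snapshot's timestamp and not superseded until after it.

```cpp
#include <cstdint>
#include <limits>

constexpr uint64_t kInfinity = std::numeric_limits<uint64_t>::max();

struct Version {
    uint64_t begin_ts;    // commit timestamp of the writing transaction
    uint64_t end_ts;      // begin_ts of the superseding version (or infinity)
    const Version* prev;  // older version in the chain
    int value;
};

bool visible(const Version& v, uint64_t snapshot_ts) {
    return v.begin_ts <= snapshot_ts && snapshot_ts < v.end_ts;
}

// Walk newest → oldest and return the version this snapshot sees.
const Version* mvcc_read(const Version* head, uint64_t snapshot_ts) {
    for (const Version* v = head; v != nullptr; v = v->prev)
        if (visible(*v, snapshot_ts)) return v;
    return nullptr;  // key did not exist at snapshot_ts
}
```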

Isolation Levels:

  • Read Uncommitted
  • Read Committed
  • Repeatable Read
  • Serializable (SSI - Serializable Snapshot Isolation)

Concurrency Control:

  • Optimistic concurrency control
  • Write-write conflict detection
  • Deadlock detection (wait-for graph)
  • Lock-free reads
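Deadlock detection over the wait-for graph reduces to cycle detection: transactions are nodes, "T waits for U" is an edge, and any cycle means a deadlock. A DFS sketch:

```cpp
#include <unordered_map>
#include <unordered_set>
#include <vector>

// txn id → ids of transactions it is waiting on.
using WaitFor = std::unordered_map<int, std::vector<int>>;

bool has_cycle_from(const WaitFor& g, int node,
                    std::unordered_set<int>& path,
                    std::unordered_set<int>& done) {
    if (path.count(node)) return true;   // back edge → cycle
    if (done.count(node)) return false;  // already proven cycle-free
    path.insert(node);
    auto it = g.find(node);
    if (it != g.end())
        for (int next : it->second)
            if (has_cycle_from(g, next, path, done)) return true;
    path.erase(node);
    done.insert(node);
    return false;
}

bool has_deadlock(const WaitFor& g) {
    std::unordered_set<int> path, done;
    for (const auto& [node, edges] : g)
        if (has_cycle_from(g, node, path, done)) return true;
    return false;
}
```

On detection, a victim transaction is typically aborted to break the cycle; victim selection policy is a separate concern.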

Write-Ahead Logging (WAL):

  • ARIES protocol
  • Physiological logging
  • Checkpoint management
  • Log replay for recovery

Performance:

  • Transaction throughput: ~50K txn/sec (mixed workload)
  • Commit latency: <5ms (sync WAL)
  • Recovery time: <1 min for 1GB WAL

5. Sharding & Distribution

Purpose: Horizontal scalability

VCC-URN Sharding

Architecture:

Client Request
      ↓
[Shard Router] → Determines target shard(s)
      ↓
[Shard Manager] → Routes to appropriate shard
      ↓
[Shard Node 1] [Shard Node 2] ... [Shard Node N]

Sharding Strategy:

  • Hash-based sharding (consistent hashing)
  • Range-based sharding
  • VCC-URN hierarchical partitioning
  • Virtual nodes for load distribution
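Consistent hashing with virtual nodes can be sketched as a sorted ring (using std::hash for brevity; a production router would use a stronger, platform-stable hash):

```cpp
#include <functional>
#include <map>
#include <string>

// Each physical shard owns many points on the ring; a key maps to the
// first shard point at or after the key's own hash.
class HashRing {
    std::map<std::size_t, std::string> ring_;  // hash point → shard name
public:
    void add_shard(const std::string& name, int vnodes = 16) {
        for (int i = 0; i < vnodes; ++i)
            ring_[std::hash<std::string>{}(name + "#" + std::to_string(i))] =
                name;
    }
    std::string shard_for(const std::string& key) const {
        auto it = ring_.lower_bound(std::hash<std::string>{}(key));
        if (it == ring_.end()) it = ring_.begin();  // wrap around the ring
        return it->second;
    }
};
```

Virtual nodes smooth the load distribution, and adding or removing a shard only remaps the keys adjacent to its ring points rather than rehashing everything.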

Components:

  1. Shard Router (shard_router.cpp)

    • Request routing
    • Multi-shard aggregation
    • Transaction coordination
  2. Shard Manager (shard_manager.cpp)

    • Shard allocation
    • Metadata management
    • Health monitoring
  3. Auto Rebalancer (shard_rebalancer.cpp)

    • Load detection
    • Migration planning
    • Safety mechanisms
    • Zero-downtime migration

Rebalancing Algorithm:

1. Monitor shard metrics (load, storage, latency)
2. Detect imbalance (weighted scoring)
3. Generate migration plan
4. Execute migration (copy, sync, switch)
5. Verify and cleanup

Gossip Protocol

Purpose: Cluster membership and state sync

Protocol:

Every node periodically:
1. Select random peer
2. Exchange state information
3. Merge states (CRDTs)
4. Detect failures (timeout)

State Synchronized:

  • Cluster membership
  • Shard assignments
  • Configuration
  • Health status

Convergence Time: O(log n) rounds


6. Replication Layer

Purpose: Data redundancy and high availability

Replication Modes

1. Leader-Follower (Async)

Leader ──(WAL Stream)──> Follower 1
       └──(WAL Stream)──> Follower 2

2. Multi-Master (CRDT)

Node 1 ←──(State Sync)──→ Node 2
   ↕                         ↕
Node 3 ←──(State Sync)──→ Node 4

Conflict Resolution:

  • Last-Write-Wins (LWW) with HLC timestamps
  • CRDTs for commutative operations
  • Application-defined merge functions
  • Manual conflict resolution queue
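A sketch of deterministic LWW merging with HLC-style timestamps (the struct layout and tie-break order are assumptions): comparing (physical, logical, node_id) lexicographically guarantees every replica converges on the same winner regardless of delivery order.

```cpp
#include <cstdint>
#include <string>
#include <tuple>

struct HlcValue {
    uint64_t physical;   // wall-clock component of the HLC
    uint32_t logical;    // counter for events in the same physical tick
    uint32_t node_id;    // stable id as the final deterministic tie-breaker
    std::string value;
};

// Last-Write-Wins: the higher (physical, logical, node_id) tuple wins.
const HlcValue& lww_merge(const HlcValue& a, const HlcValue& b) {
    auto key = [](const HlcValue& v) {
        return std::make_tuple(v.physical, v.logical, v.node_id);
    };
    return key(a) >= key(b) ? a : b;
}
```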

Consistency Guarantees:

  • Leader-Follower: Eventual consistency, read-your-writes (from leader)
  • Multi-Master: Eventual consistency, causal consistency

Failover:

  • Automatic leader election (Raft-based)
  • Promotion of follower to leader
  • Split-brain prevention (quorum)

7. Security Layer

Purpose: Data protection and access control

Encryption

Field-Level Encryption:

Plaintext Data
      ↓
[Encryption Key] ← [Key Provider]
      ↓                (Vault/HSM/PKI)
AES-256-GCM
      ↓
Encrypted Data → Storage

Key Providers:

  • HashiCorp Vault (Transit engine)
  • HSM (PKCS#11)
  • PKI (Certificate-based)
  • Local (testing only)

Features:

  • Transparent encryption/decryption
  • Key rotation with lazy re-encryption
  • Per-field keys for granularity
  • Deterministic encryption for indexing

Access Control

RBAC (Role-Based Access Control):

User ──→ Roles ──→ Permissions ──→ Resources
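A minimal sketch of that chain (type and member names are illustrative, not ThemisDB's actual API): authorization succeeds if any of the user's roles grants the requested permission on the resource.

```cpp
#include <set>
#include <string>
#include <unordered_map>
#include <utility>

using Permission = std::pair<std::string, std::string>;  // (action, resource)

struct Rbac {
    std::unordered_map<std::string, std::set<std::string>> user_roles;
    std::unordered_map<std::string, std::set<Permission>> role_perms;

    bool allowed(const std::string& user, const std::string& action,
                 const std::string& resource) const {
        auto ur = user_roles.find(user);
        if (ur == user_roles.end()) return false;
        for (const auto& role : ur->second) {      // any role may grant it
            auto rp = role_perms.find(role);
            if (rp != role_perms.end() &&
                rp->second.count({action, resource}))
                return true;
        }
        return false;  // default deny
    }
};
```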

Apache Ranger Integration:

  • Policy synchronization
  • Fine-grained permissions
  • Audit logging
  • Dynamic policy updates

Authentication:

  • JWT tokens (RS256, ES256)
  • Multi-tenancy via claims
  • Token expiry and refresh

8. Acceleration Layer

Purpose: GPU and CPU acceleration for vector operations

GPU Backend Architecture

┌──────────────────────────────────────┐
│      Backend Registry                │
│  (Selects optimal backend)           │
└──────────────────────────────────────┘
                 ↓
    ┌────────────┴────────────┐
    ↓            ↓            ↓
 [CUDA]      [Vulkan]      [OpenCL]
    ↓            ↓            ↓
 NVIDIA       Any GPU      Any GPU

Backend Selection:

  • Runtime detection of available GPUs
  • Capability matching (FP32, FP16, INT8)
  • Fallback chain: GPU → SIMD CPU → Scalar CPU
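The fallback chain reduces to "first available backend in preference order"; a sketch with probe results stubbed as booleans (names are illustrative):

```cpp
#include <string>
#include <vector>

struct Backend {
    std::string name;
    bool available;  // would come from runtime GPU/capability probing
};

// Walk the preference-ordered chain and take the first working backend;
// the scalar CPU path needs no probe and always succeeds.
std::string select_backend(const std::vector<Backend>& chain) {
    for (const auto& b : chain)
        if (b.available) return b.name;
    return "scalar-cpu";
}
```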

Operations Accelerated:

  • Vector distance calculations
  • Matrix multiplication
  • Index building (HNSW)
  • Aggregations
  • Sorting

Performance Gains:

  • Distance calculations: 10-50x speedup
  • HNSW index build: 10x speedup
  • Batch queries: 20-40x speedup

9. Analytics Layer

Purpose: OLAP and complex event processing

OLAP Engine

Columnar Storage:

Row Store:          Columnar Store:
┌──┬──┬──┬──┐      ┌────────┐
│id│a │b │c │      │1│2│3│4 │ (id)
├──┼──┼──┼──┤      ├────────┤
│1 │x │y │z │  →   │x│p│m│t │ (a)
├──┼──┼──┼──┤      ├────────┤
│2 │p │q │r │      │y│q│n│u │ (b)
└──┴──┴──┴──┘      └────────┘

Query Optimization:

  • Column pruning
  • Predicate pushdown
  • Late materialization
  • SIMD vectorized scans
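Column pruning over the columnar layout can be sketched as follows: a filtered aggregate reads only the two columns it needs, each as a dense array the compiler can vectorize, instead of touching every field of every row.

```cpp
#include <cstddef>
#include <vector>

// Only the columns the query actually reads are materialized.
struct ColumnStore {
    std::vector<int> id;
    std::vector<int> a;
};

// SELECT SUM(a) WHERE id >= threshold, evaluated column-at-a-time.
long sum_a_where_id_at_least(const ColumnStore& t, int threshold) {
    long total = 0;
    for (std::size_t i = 0; i < t.id.size(); ++i)
        if (t.id[i] >= threshold) total += t.a[i];
    return total;
}
```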

Window Functions:

  • Partitioning
  • Ordering
  • Frame specification (ROWS, RANGE)
  • Aggregates (SUM, AVG, MIN, MAX)
  • Ranking (ROW_NUMBER, RANK, DENSE_RANK)
  • Offset (LAG, LEAD)
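As an illustration of per-partition state, ROW_NUMBER() restarts its counter for each partition key; a minimal sketch over rows taken in input order:

```cpp
#include <map>
#include <string>
#include <vector>

// ROW_NUMBER() OVER (PARTITION BY key): each partition keeps its own
// counter, so numbering restarts at 1 whenever a new key appears.
std::vector<int> row_number(const std::vector<std::string>& partition_keys) {
    std::map<std::string, int> counters;
    std::vector<int> out;
    for (const auto& key : partition_keys)
        out.push_back(++counters[key]);
    return out;
}
```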

CEP Engine

Event Pattern Matching:

PATTERN: A → B → C WITHIN 10 seconds
WHERE A.value > 100 AND C.value < 50

Detection:
Events: A₁(t₀) → B₁(t₁) → C₁(t₂)
Match: ✓ (t₂ - t₀ < 10s, conditions met)

Features:

  • Sliding windows
  • Tumbling windows
  • Session windows
  • Pattern matching (regex-like)
  • Event correlation
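The A → B → C pattern above can be sketched as a small state machine over the event stream (a simplified single-match detector; real CEP engines track many partial matches concurrently):

```cpp
#include <string>
#include <vector>

struct Event {
    std::string type;
    double t;  // event timestamp in seconds
};

// Detect A → B → C in order, with the whole sequence inside `window`.
bool match_abc(const std::vector<Event>& stream, double window) {
    const std::string pattern[] = {"A", "B", "C"};
    std::size_t stage = 0;  // next pattern element to find
    double start = 0.0;
    for (const auto& e : stream) {
        if (e.type != pattern[stage]) continue;
        if (stage == 0) {
            start = e.t;                 // anchor the window at A
        } else if (e.t - start >= window) {
            stage = 0;                   // window expired: restart
            continue;
        }
        if (++stage == 3) return true;   // full A → B → C sequence
    }
    return false;
}
```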

10. Content Processing

Purpose: Extract metadata and content from various file formats

Processing Pipeline

File Upload
     ↓
[MIME Detection]
     ↓
[Format Router]
     ↓
┌────┴────┬────────┬────────┬────────┐
│  PDF    │ Office │ Image  │ Video  │
└────┬────┴────────┴────────┴────────┘
     ↓
[Content Extraction]
     ↓
[Metadata Extraction]
     ↓
[Indexing]
     ↓
Storage

Processors:

  • PDF: Text, tables, images
  • Office: Word, Excel, PowerPoint
  • Images: EXIF, OCR, thumbnails
  • Video/Audio: Metadata, thumbnails
  • CAD: 3D models, metadata
  • Geo: Shapefiles, GeoJSON

Plugin System:

  • Dynamic loading
  • Format extension
  • Custom processors

📊 Performance Characteristics

Throughput

Operation          Throughput      Notes
Writes             100K ops/sec    Single node, WAL enabled
Reads              500K ops/sec    Cached data
Vector Search      10K qps         1M vectors, k=10
Graph Traversal    1M edges/sec    BFS
Transactions       50K txn/sec     Mixed workload

Latency

Operation       Latency (p50)   Latency (p99)
Point Read      0.5ms           2ms
Point Write     1ms             5ms
Vector kNN      2ms             10ms
Simple Query    5ms             20ms
Complex Join    50ms            200ms

Scalability

Metric              Single Node   10 Nodes   100 Nodes
Write Throughput    100K/s        900K/s     8M/s
Data Size           1TB           10TB       100TB
Query Latency       10ms          12ms       15ms

🔒 Security Architecture

Defense in Depth

Layer 1: Network (TLS, mTLS)
Layer 2: Authentication (JWT, PKI)
Layer 3: Authorization (RBAC, Ranger)
Layer 4: Encryption (Field-level AES-256)
Layer 5: Audit (Comprehensive logging)

Compliance

  • GDPR: PII detection, pseudonymization, right to erasure
  • HIPAA: Encryption at rest/transit, audit logging
  • SOC 2: Access controls, monitoring, incident response
  • eIDAS: Qualified signatures via PKI

🔍 Monitoring & Observability

Metrics Collection

Prometheus Metrics:

  • Request rates, latencies, errors
  • Resource utilization (CPU, memory, disk)
  • Cache hit rates
  • Transaction throughput
  • Replication lag

OpenTelemetry Tracing:

  • Distributed traces across services
  • Query execution breakdown
  • Index operation timing
  • Network latency

Logging:

  • Structured JSON logging
  • Log levels: DEBUG, INFO, WARN, ERROR
  • Audit logs for security events
  • Query logs for performance analysis

🚀 Deployment Architecture

Single Node

┌─────────────────────────┐
│    ThemisDB Server      │
│  ┌─────────────────┐    │
│  │ All Components  │    │
│  └─────────────────┘    │
│         Storage         │
└─────────────────────────┘

Use Case: Development, small workloads

Clustered

┌──────────┐  ┌──────────┐  ┌──────────┐
│ Shard 1  │  │ Shard 2  │  │ Shard 3  │
│ (Leader) │  │ (Leader) │  │ (Leader) │
└────┬─────┘  └────┬─────┘  └────┬─────┘
     │             │             │
┌────┴─────┐  ┌────┴─────┐  ┌────┴─────┐
│Follower 1│  │Follower 2│  │Follower 3│
└──────────┘  └──────────┘  └──────────┘

Use Case: Production, high availability

Cloud Native

┌──────────────────────────────────┐
│      Load Balancer               │
└────────────┬─────────────────────┘
             ↓
┌────────────────────────────────┐
│   Kubernetes Cluster           │
│  ┌──────┐ ┌──────┐ ┌──────┐   │
│  │ Pod1 │ │ Pod2 │ │ Pod3 │   │
│  └──────┘ └──────┘ └──────┘   │
└────────────────────────────────┘
             ↓
┌────────────────────────────────┐
│   Persistent Storage (EBS/PD)  │
└────────────────────────────────┘

Use Case: Cloud deployments, auto-scaling


📚 Technology Stack

Core Dependencies

Component      Technology      Version
Storage        RocksDB         8.x
GPU Compute    CUDA            12.x
GPU Compute    Vulkan          1.3
Networking     Boost.Asio      1.82
JSON           nlohmann/json   3.11
HTTP           cpp-httplib     0.14
Compression    ZSTD            1.5
Logging        spdlog          1.12

Build System

  • CMake 3.20+
  • vcpkg for dependency management
  • C++20 standard
  • Compiler: GCC 11+, MSVC 2022+, Clang 14+

🎯 Design Principles

  1. Performance First: Optimize hot paths, use zero-copy where possible
  2. Horizontal Scalability: Design for distribution from day one
  3. Operational Simplicity: Minimize manual intervention, automate operations
  4. Security by Default: Encryption, authentication, authorization built-in
  5. Extensibility: Plugin system for custom extensions
  6. Observability: Comprehensive metrics and tracing
  7. Fault Tolerance: Graceful degradation, automatic recovery

Document Version: 1.0.1
Last Updated: December 9, 2025
Maintained By: ThemisDB Architecture Team