A version-aware, incrementally maintained vector database that treats knowledge as a continuously evolving graph rather than a static collection of embeddings.
The aetherv Python package is the first working slice of the engine: a segmented, on-disk vector database with GPU-accelerated similarity search. It is installable from the repo root via pyproject.toml (Hatchling build, Python ≥ 3.11).
| Library | Role |
|---|---|
| fastembed | Default text embedder (TextEmbedding); vectors are L2-normalized after encoding |
| jax | JIT-compiled dot-product search on GPU (or CPU fallback) per segment |
| numpy | Array interchange between embedder, storage, and search |
| polars | Columnar metadata store (id, text, segment, row) backed by Parquet |
| pyarrow | Arrow IPC read/write for fixed-size embedding vectors on disk |
Dev extras: pytest.
aetherv/
├── __init__.py # Public API: Config, VectorDB, SearchResult, SegmentRecord
├── config.py # Runtime paths and segment sizing
├── db.py # VectorDB — insert, query, segment lifecycle
├── embedder.py # Embedder protocol + FastEmbedder
├── segments.py # Segment file path resolution
├── types.py # SearchResult, SegmentRecord dataclasses
├── search/
│ └── gpu.py # SegmentSearcher — JAX @jit score + top-k
└── storage/
├── arrow.py # ArrowSegment — IPC write/read for embedding matrices
├── manifest.py # JSON manifest of segment records
└── metadata.py # Polars Parquet store with O(1) (segment, row) lookup
Opening a VectorDB at root (default vectordb/) produces:
vectordb/
├── metadata.parquet # id, text, segment, row — Polars
├── manifest.json # segment index (id, name, vector_count, created_at)
└── segments/
├── segment_000000.arrow
├── segment_000001.arrow
└── ...
Each .arrow file is an Arrow IPC stream of FixedSizeList<float32> embeddings for one segment (default up to 10,000 vectors per segment, configurable via Config.segment_size).
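The segment sizing above can be illustrated with a small planning helper. This is a sketch only: `segment_name` and `plan_segments` are hypothetical names that do not come from the aetherv package, and the text does not say whether the real engine tops up a partially filled segment (this sketch always starts a new one). Only the six-digit file pattern and the 10,000 default are taken from the description.

```python
from typing import Iterator

DEFAULT_SEGMENT_SIZE = 10_000  # default stated above; Config.segment_size overrides it


def segment_name(segment_id: int) -> str:
    """Return a file name following the segment_000000.arrow pattern."""
    return f"segment_{segment_id:06d}.arrow"


def plan_segments(
    n_vectors: int,
    segment_size: int = DEFAULT_SEGMENT_SIZE,
    first_id: int = 0,
) -> Iterator[tuple[str, int, int]]:
    """Yield (file_name, start_row, stop_row) for each segment a batch would fill."""
    if segment_size <= 0:
        raise ValueError("segment_size must be positive")
    for offset, start in enumerate(range(0, n_vectors, segment_size)):
        stop = min(start + segment_size, n_vectors)
        yield segment_name(first_id + offset), start, stop


# 25,000 vectors at the default size span three segment files.
plan = list(plan_segments(25_000))
```

With the default size, the last segment in `plan` holds the remaining 5,000 vectors.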
Insert — VectorDB.insert(ids, texts) embeds all texts through the configured embedder (default FastEmbedder), batches vectors into segments, writes each batch as an Arrow IPC file, appends rows to the Polars metadata table, and registers a SegmentSearcher for the new segment.
Query — VectorDB.query(text, k) embeds the query, runs parallel top-k search across every loaded segment (ThreadPoolExecutor), merges candidates by score, and resolves (segment_id, row) pairs back to (id, text) via the metadata lookup.
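The merge and lookup steps of the query path might look roughly like the following. The data structures here (`per_segment`, `metadata`) and the helper `merge_top_k` are illustrative assumptions, not the actual aetherv internals; the sketch only shows how per-segment candidates can be merged by score and resolved through a dictionary keyed on (segment_id, row).

```python
import heapq

# Hypothetical per-segment results: segment_id -> [(score, row), ...]
per_segment = {
    0: [(0.91, 4), (0.40, 1)],
    1: [(0.88, 0), (0.75, 7)],
}

# Hypothetical metadata index: (segment_id, row) -> (id, text)
metadata = {
    (0, 4): (11, "Polars stores metadata"),
    (0, 1): (12, "Arrow holds vectors"),
    (1, 0): (13, "JAX accelerates search"),
    (1, 7): (14, "Parquet backs the table"),
}


def merge_top_k(per_segment, metadata, k):
    """Merge per-segment candidates by score and resolve them to (score, id, text)."""
    candidates = (
        (score, seg, row)
        for seg, hits in per_segment.items()
        for score, row in hits
    )
    best = heapq.nlargest(k, candidates)  # highest scores first
    # Dictionary lookup gives constant-time (segment, row) -> (id, text) resolution.
    return [(score, *metadata[(seg, row)]) for score, seg, row in best]


top = merge_top_k(per_segment, metadata, k=3)
```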
Search kernel — Because embeddings are normalized, a dot product equals cosine similarity. SegmentSearcher uploads a segment matrix to the JAX device once, then uses a @jax.jit matvec for scoring and numpy.argpartition for top-k within each segment.
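The scoring and selection logic can be sketched in plain NumPy. This is a CPU stand-in under stated assumptions: the real SegmentSearcher runs the matvec under @jax.jit on the JAX device, and the function names `l2_normalize` and `top_k` are chosen here for illustration.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so a dot product equals cosine similarity."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)


def top_k(segment: np.ndarray, query: np.ndarray, k: int):
    """Return (scores, rows) of the k best rows in one segment, best first."""
    scores = segment @ query                      # matvec: one score per row
    k = min(k, scores.shape[0])
    part = np.argpartition(-scores, k - 1)[:k]    # unordered top-k, linear time
    order = part[np.argsort(-scores[part])]       # sort only the k winners
    return scores[order], order


rng = np.random.default_rng(0)
segment = l2_normalize(rng.normal(size=(1000, 8)).astype(np.float32))
query = segment[42]                               # a stored vector queries itself
scores, rows = top_k(segment, query, k=5)
```

Because `argpartition` does not order its output, the final `argsort` over just k elements is what makes the result "best first".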
Pluggability — Embedder is a Protocol; tests inject a deterministic hash-based embedder to avoid model downloads. Config controls root path, segment size, and filenames.
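A deterministic embedder of the kind described for tests could be sketched as below. The method name `embed`, the class `HashEmbedder`, and the SHA-256 byte mapping are assumptions made for this example; the actual Embedder protocol and test embedder in aetherv may differ.

```python
import hashlib
import math
from typing import Protocol, Sequence


class Embedder(Protocol):
    """Structural interface assumed here: texts in, unit-length vectors out."""

    def embed(self, texts: Sequence[str]) -> list[list[float]]: ...


class HashEmbedder:
    """Deterministic stand-in embedder: no model, no network (dim <= 32)."""

    def __init__(self, dim: int = 8) -> None:
        self.dim = dim

    def embed(self, texts: Sequence[str]) -> list[list[float]]:
        out = []
        for text in texts:
            digest = hashlib.sha256(text.encode("utf-8")).digest()
            # Map the first `dim` bytes to floats in [-1, 1), then L2-normalize.
            raw = [(b - 128) / 128.0 for b in digest[: self.dim]]
            norm = math.sqrt(sum(v * v for v in raw)) or 1.0
            out.append([v / norm for v in raw])
        return out


def embed_all(embedder: Embedder, texts: Sequence[str]) -> list[list[float]]:
    """Accepts any object with a matching embed(); no inheritance needed."""
    return embedder.embed(texts)


vectors = embed_all(HashEmbedder(dim=8), ["alpha", "beta", "alpha"])
```

Identical inputs always map to identical vectors, which keeps assertions in a test suite stable across runs and machines.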
pip install -e ".[dev]" # from repo root
pytest

from aetherv import Config, VectorDB
db = VectorDB("vectordb", config=Config(segment_size=10_000))
db.insert(
    ids=[1, 2, 3],
    texts=["JAX accelerates search", "Polars stores metadata", "Arrow holds vectors"],
)
for hit in db.query("dataframe library", k=2):
    print(hit.score, hit.id, hit.text)

tests/test_vectordb.py covers insert/query, O(1) metadata lookup, legacy manifest loading, and empty inserts using a DeterministicEmbedder (no network or model download).
Build the first retrieval-aware, version-aware, incrementally maintained vector database capable of operating on continuously changing knowledge without full reindexing.
Current vector databases optimize:
- Similarity search
- ANN indexing
- Storage efficiency
AetherV Evolution Engine optimizes:
- Knowledge freshness
- Incremental updates
- Retrieval correctness under change
- Autonomous index maintenance
The future bottleneck of RAG is not retrieval speed.
The bottleneck is:
"How can a retrieval system stay correct while its knowledge continuously changes?"
Current systems:
Document changes → Rechunk → Re-embed → Reindex
AetherV:
Document changes → Semantic diff → Impact prediction → Localized updates → Retrieval remains correct
Category:
Version-Aware Dynamic Retrieval Engine
Tagline:
Git for Knowledge + Vector Database
Primary Users:
- Enterprise RAG
- Documentation systems
- Agentic systems
- Knowledge management platforms
- Real-time data platforms
Determine what actually changed.
Input:
Document v1, Document v2
Output:
- Added concepts
- Modified concepts
- Deleted concepts
- Dependency impact
Goal:
Avoid unnecessary embedding generation.
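As a first approximation of the added / modified / deleted classification, chunks can be compared by content fingerprint. Note the limits of this sketch: it is a lexical, hash-based diff rather than the semantic diff the engine aims for, it does not compute dependency impact, and the chunk identifiers and function names are hypothetical.

```python
import hashlib


def fingerprint(text: str) -> str:
    """Content hash after whitespace normalization."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def diff_chunks(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Classify chunk ids as added, deleted, or modified between two versions."""
    old_fp = {cid: fingerprint(text) for cid, text in old.items()}
    new_fp = {cid: fingerprint(text) for cid, text in new.items()}
    shared = old_fp.keys() & new_fp.keys()
    return {
        "added": sorted(new_fp.keys() - old_fp.keys()),
        "deleted": sorted(old_fp.keys() - new_fp.keys()),
        "modified": sorted(c for c in shared if old_fp[c] != new_fp[c]),
    }


v1 = {"intro": "AetherV is a vector database.", "hw": "Requires 128GB RAM.", "old": "Deprecated note."}
v2 = {"intro": "AetherV  is a vector database.", "hw": "Requires 129GB RAM.", "faq": "New FAQ entry."}
delta = diff_chunks(v1, v2)
```

Only the chunks listed in `delta` would be candidates for re-embedding; the whitespace-only change to `intro` is ignored.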
Novel research component.
Question:
Will this change affect retrieval?
Example:
"128GB RAM" → "129GB RAM"
Embedding changes.
Retrieval behavior likely does not.
Decision:
Skip expensive update.
Expected savings:
70-95% embedding reduction.
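The predictor itself is an open research item, so no implementation is implied here. One way to produce ground-truth labels for such a predictor, assuming a set of probe queries is available, is to measure how often replacing a chunk's vector changes the top-k result set. The function names and the probe-query approach below are assumptions; a learned predictor would need to estimate this score without computing the new embedding.

```python
import numpy as np


def top_k_ids(matrix: np.ndarray, query: np.ndarray, k: int) -> frozenset[int]:
    """Row indices of the k highest dot-product scores."""
    scores = matrix @ query
    return frozenset(np.argsort(-scores)[:k].tolist())


def impact_score(corpus, row, new_vec, probes, k=2) -> float:
    """Fraction of probe queries whose top-k set changes if `row` is replaced."""
    updated = corpus.copy()
    updated[row] = new_vec
    changed = sum(
        top_k_ids(corpus, q, k) != top_k_ids(updated, q, k) for q in probes
    )
    return changed / len(probes)


corpus = np.array([[1.0, 0.0], [0.0, 1.0], [0.6, 0.8], [-1.0, 0.0]])
probes = np.array([[1.0, 0.0], [0.0, 1.0]])

# A tiny edit barely moves the vector; a large edit flips it.
small = impact_score(corpus, 2, np.array([0.6001, 0.7999]), probes)
large = impact_score(corpus, 2, np.array([-0.6, -0.8]), probes)
```

Here `small` corresponds to the "no update" end of the scale and `large` to the "must update" end.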
Every chunk becomes temporal.
Chunk schema:
{ chunk_id, version_id, valid_from, valid_to, parent_version }
Supports:
- Historical retrieval
- Change tracking
- Temporal QA
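The chunk schema maps naturally onto a small record type with an "as of" lookup. In this sketch the timestamp representation (integers, inclusive start, exclusive end, `None` for "still current") and the `text` field are assumptions added for illustration; the schema above only names the five fields.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChunkVersion:
    chunk_id: str
    version_id: int
    valid_from: int                  # assumed: integer timestamp, inclusive
    valid_to: Optional[int]          # assumed: exclusive end; None = still current
    parent_version: Optional[int]    # version this one was derived from
    text: str = ""                   # hypothetical payload field


def as_of(versions, chunk_id: str, ts: int) -> Optional[ChunkVersion]:
    """Return the version of `chunk_id` that was valid at time `ts`, if any."""
    for v in versions:
        if v.chunk_id != chunk_id:
            continue
        if v.valid_from <= ts and (v.valid_to is None or ts < v.valid_to):
            return v
    return None


history = [
    ChunkVersion("c1", 1, 100, 200, None, "128GB RAM"),
    ChunkVersion("c1", 2, 200, None, 1, "129GB RAM"),
]
past = as_of(history, "c1", 150)
current = as_of(history, "c1", 999)
```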
Knowledge becomes a graph.
Chunk → Summary → RAPTOR node → KG entity → Agent memory
If node changes:
Automatically identify stale descendants.
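Finding stale descendants amounts to a graph traversal from the changed node. The sketch below uses a breadth-first search over a hypothetical adjacency map; the node naming scheme and the `stale_descendants` helper are illustrative and not part of any existing API.

```python
from collections import deque

# Hypothetical dependency edges: node -> nodes derived from it.
edges = {
    "chunk:7": ["summary:2"],
    "summary:2": ["raptor:1", "kg:ram"],
    "raptor:1": ["memory:agent-a"],
    "chunk:8": ["summary:3"],
}


def stale_descendants(edges: dict[str, list[str]], changed: str) -> set[str]:
    """Return every node reachable from `changed`, excluding `changed` itself."""
    seen: set[str] = set()
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for child in edges.get(node, ()):
            if child not in seen and child != changed:
                seen.add(child)
                queue.append(child)
    return seen


stale = stale_descendants(edges, "chunk:7")
```

Nodes derived from unrelated chunks, such as `summary:3`, are left untouched.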
Inspired by LSM-VEC.
Structure:
L0 = recent updates
L1 = warm data
L2 = stable data
L3 = archive
Advantages:
- Fast inserts
- Fast deletes
- No global rebuilds
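The tiering idea can be shown with a toy key-value version. This is not the LSM-VEC algorithm and contains no ANN index; it is a minimal sketch, with hypothetical names and capacities, of how writes land in L0, deletes are recorded as tombstones, and full levels are merged downward so that no global rebuild is needed.

```python
TOMBSTONE = None  # marker for a deleted key


class TieredIndex:
    """Toy LSM-style store: writes land in L0 and are merged downward in bulk."""

    def __init__(self, capacities=(2, 4, 8)):
        # levels[0] is L0; the final level (archive) is unbounded.
        self.capacities = capacities
        self.levels = [dict() for _ in range(len(capacities) + 1)]

    def upsert(self, key, vector):
        self.levels[0][key] = vector
        self._compact()

    def delete(self, key):
        self.levels[0][key] = TOMBSTONE   # masks older copies in lower levels
        self._compact()

    def _compact(self):
        for i, cap in enumerate(self.capacities):
            if len(self.levels[i]) > cap:
                # Newer entries (level i) overwrite older ones (level i + 1).
                self.levels[i + 1].update(self.levels[i])
                self.levels[i].clear()

    def live_items(self):
        merged = {}
        for level in reversed(self.levels):   # oldest first, so newest wins
            merged.update(level)
        return {k: v for k, v in merged.items() if v is not TOMBSTONE}


idx = TieredIndex()
idx.upsert("a", [1.0, 0.0])
idx.upsert("b", [0.0, 1.0])
idx.upsert("c", [1.0, 1.0])   # L0 overflows and is merged into L1
idx.delete("a")
idx.upsert("b", [0.0, 2.0])
live = idx.live_items()
```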
Goal 1
Reduce embedding regeneration by 90%.
Goal 2
Reduce index rebuild operations to near zero.
Goal 3
Maintain retrieval accuracy >99% of full reindex baseline.
Goal 4
Support continuous ingestion at enterprise scale.
Goal 5
Achieve sub-second update propagation.
Layer 0 Source Connectors
Layer 1 CDC Engine
Layer 2 Semantic Diff Engine
Layer 3 Retrieval Impact Predictor
Layer 4 Embedding Manager
Layer 5 Version Store
Layer 6 Dependency Graph
Layer 7 LSM Vector Index
Layer 8 Retrieval API
WP-1 Foundation
Duration: 2 weeks
Deliverables:
- Monorepo
- CI/CD
- Benchmark framework
- Dataset registry
Success Criteria:
Repeatable experiments.
WP-2 Change Detection Engine
Duration: 3 weeks
Tasks:
- File CDC
- Database CDC
- Event ingestion
- Hash-based diffing
Output:
Changed chunk list
Success Criteria:
Detect changes with >99.9% precision.
WP-3 Semantic Diff Engine
Duration: 4 weeks
Tasks:
- AST extraction
- Chunk fingerprinting
- Concept extraction
- Semantic similarity graph
Output:
Semantic delta object
Success Criteria:
Correctly classify additions, deletions, modifications.
WP-4 Retrieval Impact Predictor
Duration: 6 weeks
Research Track
Tasks:
- Build retrieval benchmark
- Learn retrieval sensitivity
- Predict update necessity
Output:
Impact score
0.0 → no update
1.0 → must update
Success Criteria:
Skip >70% updates while preserving retrieval quality.
WP-5 Versioned Storage
Duration: 3 weeks
Tasks:
- Chunk versioning
- Temporal metadata
- Lineage tracking
Success Criteria:
Historical reconstruction support.
WP-6 Dependency Graph
Duration: 4 weeks
Tasks:
- Graph schema
- Edge inference
- Incremental propagation
Success Criteria:
Detect all downstream stale nodes.
WP-7 Incremental Embedding Engine
Duration: 6 weeks
Tasks:
- Selective re-embedding
- Delta embedding experiments
- Embedding cache
Success Criteria:
90% reduction in embedding workload.
WP-8 LSM Vector Index
Duration: 8 weeks
Tasks:
- L0-L3 architecture
- Incremental HNSW
- Compaction engine
Success Criteria:
No full index rebuilds.
WP-9 Retrieval Layer
Duration: 4 weeks
Tasks:
- Hybrid search
- Temporal search
- Version-aware ranking
Success Criteria:
Beat baseline RAG retrieval.
WP-10 Research Publication
Duration: Ongoing
Targets:
- arXiv
- VLDB
- SIGIR
- NeurIPS Datasets & Benchmarks
Potential Paper Titles:
Retrieval-Aware Incremental Embedding
Version-Aware Dynamic Vector Retrieval
AetherV: A Knowledge Evolution Engine for Continually Updated RAG Systems
Embedding Cost Reduction: Target >90%
Update Latency: Target <1 second
Index Rebuild Frequency: Target zero
Retrieval Accuracy Loss: Target <1%
Storage Overhead: Target <20%
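The targets above do not define how each metric is measured. As one possible definition of retrieval accuracy loss, the sketch below compares the top-k results of the incrementally maintained index against a full-reindex baseline, query by query. The overlap-at-k measure and both function names are assumptions for illustration.

```python
def overlap_at_k(incremental: list[int], baseline: list[int], k: int) -> float:
    """Share of the baseline's top-k ids that the incremental index also returns."""
    base = set(baseline[:k])
    if not base:
        return 1.0
    return len(set(incremental[:k]) & base) / len(base)


def accuracy_loss(runs: list[tuple[list[int], list[int]]], k: int) -> float:
    """1 minus mean overlap@k across (incremental_ids, baseline_ids) pairs."""
    scores = [overlap_at_k(inc, base, k) for inc, base in runs]
    return 1.0 - sum(scores) / len(scores)


runs = [
    ([1, 2, 3], [1, 2, 3]),   # identical results
    ([4, 5, 9], [4, 5, 6]),   # one of three baseline hits missing
]
loss = accuracy_loss(runs, k=3)
```

Under this definition, the "<1%" target would correspond to `loss < 0.01` over a representative query set.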
MVP includes:
✓ CDC
✓ Semantic diffing
✓ Version tracking
✓ Selective re-embedding
✓ Incremental HNSW
✓ Evaluation suite
MVP excludes:
✗ Multi-node clustering
✗ GPU acceleration
✗ Agent orchestration
✗ RAPTOR integration
✗ Knowledge graph generation
These become Phase 2.
- Multi-node distributed engine
- GPU kernels
- RAPTOR hierarchy
- GraphRAG integration
- Agent memory support
- Real-time streaming ingestion
- Learned ANN routing
This document establishes the scope, architectural design, and implementation roadmap for building a next-generation, heterogeneous vector database from scratch. Project AetherV shifts away from monolithic database paradigms by separating dynamic control logic from parallelized mathematical acceleration.
Current vector databases face a sharp trade-off: they are either optimized for fast, static similarity searches on hardware accelerators (like GPU/TPU indices) or built for dynamic text processing, metadata filtering, and graph routing on standard CPUs. When a production multi-agent system runs complex pipelines—such as RAPTOR tree traversals, hybrid lexical/dense search, and real-time self-correcting routing loops—monolithic databases create severe latency overhead. They waste valuable GPU compute on sequential pointer-chasing logic or bottleneck the CPU with massive array operations.
- The Inter-Node Latency Tax: Eliminates the serialization and network overhead of bouncing data between standalone graph stores, BM25 engines, and vector indices.
- The Abstraction Tax: Prevents framework state bloat (inherent in heavy orchestrators) by managing the agentic state machine directly inside the database control layer.
- The Accelerator Compiling Problem: Solves JAX’s rigid requirement for static array shapes during execution by implementing a zero-copy memory bridge over fixed-capacity pre-allocated CPU layouts.
AetherV splits the database into two highly specialized planes operating over shared memory:
[ Ingestion / Query Client ]
│
▼
┌────────────────────────────────────────────────────────┐
│ CPU CONTROL PLANE (Polars) │
│ - HNSW Graph Traversal - BM25 Token Dictionaries │
│ - RAPTOR Tree Clustering - Deterministic Statechart │
└────────────────────────────┬───────────────────────────┘
│ Zero-Copy Apache Arrow Bridge
▼
┌────────────────────────────────────────────────────────┐
│ GPU EXECUTION PLANE (JAX) │
│ - Fused Cosine Similarity - Batched Cross-Encoder Rerank│
│ - Conformal Masking - Vectorized Index Scans │
└────────────────────────────────────────────────────────┘
To achieve maximum throughput and mechanical sympathy with the underlying hardware, the core stack is strictly constrained to high-performance, non-bloated libraries:
- Control Plane & Memory Layout: Polars / Apache Arrow. Polars provides fast columnar structures on the CPU, and Apache Arrow ensures that data can be exposed to accelerators via raw memory pointers without slow serialization passes.
- Execution Plane: JAX (XLA). JAX compiles distance functions, cross-encoder scoring, and conformal mask thresholds into fused, branch-free GPU kernels.
- Concurrency Model: **Python asyncio**. Manages concurrent query streams and asynchronous actor worker tasks without thread blocking.
The main collection matrix is tracked in a centralized, contiguous memory table. Every document chunk maps directly to a fixed implicit integer row index (
| Column Name | Type | Purpose / Description |
|---|---|---|
| id | pl.UInt32 | Global unique identifier / contiguous matrix offset |
| parent_id | pl.Int32 | RAPTOR tree pointer (points to parent summary ID; -1 if root) |
| layer_level | pl.UInt8 | RAPTOR level ( |
| hnsw_edges | pl.List(pl.Int32) | Padded fixed-length array of neighbor node indices |
| sparse_tokens | pl.List(pl.UInt32) | Hashed vocabulary bin tokens for BM25 matching |
| sparse_weights | pl.List(pl.Float32) | Corresponding pre-computed term frequencies |
| doc_string | pl.String | Raw text block (retained strictly for synthesis output) |
The execution plane registers a static block of memory on device initialization to prevent costly re-allocation delays during ingestion.
- DENSE_MATRIX: jax.Array of shape [MAX_CAPACITY, FEATURE_DIM] (float32)
- SPARSE_MATRIX: jax.Array of shape [MAX_CAPACITY, MAX_TOKEN_PADDINGS] (float32)
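The fixed-capacity idea can be illustrated with a NumPy stand-in: the buffer is allocated once, new rows are written in place, and unused rows are masked at query time so the output shape never changes. The class name and the tiny capacity values are assumptions for this sketch. In JAX, arrays are immutable, so the in-place write would instead be expressed with a functional update such as `array.at[...].set(...)`.

```python
import numpy as np

MAX_CAPACITY = 8   # tiny values for illustration only
FEATURE_DIM = 4


class FixedCapacityMatrix:
    """Pre-allocated dense matrix whose shape stays constant as rows are added."""

    def __init__(self, capacity: int = MAX_CAPACITY, dim: int = FEATURE_DIM):
        self.data = np.zeros((capacity, dim), dtype=np.float32)  # allocated once
        self.count = 0                                           # rows in use

    def append(self, vectors) -> None:
        vectors = np.asarray(vectors, dtype=np.float32)
        n = vectors.shape[0]
        if self.count + n > self.data.shape[0]:
            raise OverflowError("capacity exceeded")
        self.data[self.count:self.count + n] = vectors           # in-place write
        self.count += n

    def scores(self, query) -> np.ndarray:
        raw = self.data @ np.asarray(query, dtype=np.float32)
        valid = np.arange(self.data.shape[0]) < self.count
        return np.where(valid, raw, -np.inf)   # unused rows can never win top-k


m = FixedCapacityMatrix()
m.append([[1, 0, 0, 0], [0, 1, 0, 0]])
s = m.scores([1, 0, 0, 0])
```

Keeping the shape fixed is what allows a compiled kernel to be reused as the collection grows, since a shape change would otherwise trigger a new compilation.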
- Goal: Establish zero-overhead data handoffs between CPU records and compiled GPU math.
- Deliverables:
- Configure a fixed-capacity Apache Arrow allocation layer holding raw vector representations.
- Implement the core JAX execution kernel for batched Cosine Similarity using jax.jit and jax.vmap.
- Build the interface layer that reads memory pointers from a Polars chunk partition and maps them into JAX device arrays without copying data strings.
- Goal: Combine exact dense vector lookups with sparse BM25 mechanics on the accelerator.
- Deliverables:
- Create a tokenization helper that hashes text blocks into a uniform, padded sparse array layout.
- Write the fused JAX mathematical kernel that simultaneously computes dense cosine distance and sparse dot-product scores.
- Implement a Reciprocal Rank Fusion (RRF) step directly inside the XLA compilation block to return unified top-K index positions.
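The Reciprocal Rank Fusion step named in the deliverables can be sketched in plain Python. This version runs on the CPU rather than inside an XLA-compiled block, and the constant k=60 is the value conventionally used for RRF, not one specified by this roadmap.

```python
def reciprocal_rank_fusion(rankings, k: int = 60, top_n=None):
    """Fuse several ranked id lists: score(d) = sum over lists of 1 / (k + rank)."""
    scores: dict[int, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for determinism.
    fused = sorted(scores, key=lambda d: (-scores[d], d))
    return fused if top_n is None else fused[:top_n]


dense_ranking = [3, 1, 2]    # ids ordered by dense cosine score
sparse_ranking = [1, 4, 3]   # ids ordered by sparse BM25-style score
fused = reciprocal_rank_fusion([dense_ranking, sparse_ranking])
```

Because RRF uses only rank positions, the dense and sparse scores do not need to be on a comparable scale.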
- Goal: Add support for structural document grouping (RAPTOR) and statistical compression.
- Deliverables:
- Build the offline ingestion pipeline that clusters base text embeddings via a lightweight Gaussian Mixture Model (GMM) and inserts parent summaries back into the Polars table layout.
- Write the CPU traversal router to fetch parent summary references when deep context is requested.
- Implement a JAX-backed Conformal Filtering layer to dynamically threshold and mask out noise based on the scoring variance across retrieved arrays.
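The CPU traversal router in the deliverables could follow parent_id pointers as in the sketch below. The row layout mirrors the parent_id and layer_level columns of the storage table, but the sample data, the assumption that level 0 denotes leaf chunks, and the `ancestors` helper are all hypothetical.

```python
# Hypothetical rows: id -> (parent_id, layer_level); parent_id is -1 at the root.
rows = {
    0: (4, 0),
    1: (4, 0),
    2: (5, 0),
    4: (6, 1),
    5: (6, 1),
    6: (-1, 2),
}


def ancestors(rows: dict[int, tuple[int, int]], node_id: int) -> list[int]:
    """Return summary ids from the immediate parent up to the root."""
    chain: list[int] = []
    parent = rows[node_id][0]
    while parent != -1:
        if parent in chain:
            raise ValueError("cycle detected in parent pointers")
        chain.append(parent)
        parent = rows[parent][0]
    return chain


context_path = ancestors(rows, 0)
```

A router could fetch the doc_string of each id in `context_path` when a query asks for deeper context than the leaf chunk provides.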
- Goal: Embed the multi-agent routing loop directly into the database control plane.
- Deliverables:
- Implement an asynchronous StatechartSupervisor that transitions search workflows based on strict routing matrices.
- Build independent actor workers (VectorSearchActor, CriticActor, SynthesizerActor) operating over non-blocking message loops.
- Conduct end-to-end performance tracing to measure query execution times under parallel load constraints.
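A minimal sketch of the supervisor idea, assuming a transition table keyed by state and event: the supervisor consumes events from an asyncio queue and rejects any event the current state does not allow. The state names, event names, and `InvalidTransition` exception are invented for illustration, and the actor workers are reduced to pre-queued events.

```python
import asyncio

# Hypothetical routing matrix: state -> accepted event -> next state.
TRANSITIONS = {
    "search": {"found": "critique", "empty": "done"},
    "critique": {"accept": "synthesize", "reject": "search"},
    "synthesize": {"finished": "done"},
}


class InvalidTransition(Exception):
    pass


class StatechartSupervisor:
    def __init__(self, transitions, start: str = "search"):
        self.transitions = transitions
        self.state = start
        self.history = [start]

    def fire(self, event: str) -> None:
        allowed = self.transitions.get(self.state, {})
        if event not in allowed:
            raise InvalidTransition(f"{event!r} not allowed in state {self.state!r}")
        self.state = allowed[event]
        self.history.append(self.state)

    async def run(self, inbox: asyncio.Queue) -> list[str]:
        # Await events without blocking other tasks on the event loop.
        while self.state != "done":
            self.fire(await inbox.get())
        return self.history


async def demo(events: list[str]) -> list[str]:
    inbox: asyncio.Queue = asyncio.Queue()
    for event in events:
        inbox.put_nowait(event)
    return await StatechartSupervisor(TRANSITIONS).run(inbox)


history = asyncio.run(demo(["found", "reject", "found", "accept", "finished"]))
```

The `reject` event sends the workflow back to `search`, which models a self-correcting loop while keeping every transition explicit and checkable.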
To transition the project from alpha implementation to a production-ready target, the engine must satisfy three objective performance baselines:
- Zero Dynamic Re-compilation: After the initial database setup pass, adding new records or executing search pipelines must result in zero XLA re-compilation events.
- Deterministic Graph Routing: The statechart control plane must reject 100% of invalid state transition attempts during loop cycles.
- Throughput Scaling: The combined engine must achieve lower latency than a traditional monolithic database paired with a separate client-side agent framework when running recursive or self-correcting RAG workflows.
Transform vector databases from static embedding stores into continuously evolving knowledge systems.
The core artifact is not the vector index.
The core artifact is a continuously maintained knowledge evolution graph.