A methodology for building AI-native knowledge bases that grow with your system's cognitive ability. It builds on Karpathy's LLM Wiki methodology and extends it significantly.
A complete framework for building a knowledge base that serves as an AI agent's cognitive substrate: not just a storage layer, but a living neural network of causally linked knowledge that grows denser over time.
Most AI agent knowledge systems are glorified RAG pipelines: store → retrieve → inject into prompt. This framework argues that's fundamentally limited. Instead, knowledge should be compiled, causally linked, and continuously enriched — forming a "knowledge nebula" whose density directly determines the system's reasoning depth.
- Causal chains over flat links: every knowledge node carries upstream causes, downstream effects, and feedback loops
- Proximity facets (知识棱镜, "knowledge prism"): 6 dimensions of context around every topic (technical, historical, opposing views, applications, limitations, cross-domain)
- 4-layer architecture: Schema → Raw → Wiki → Ops, with clear rules for each layer
- Knowledge Lint: automated health checks for causal chain integrity, blind spot detection, and nebula density metrics
- Continuous compilation: knowledge is never just stored; it is compiled into the existing structure on ingestion
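
As a rough illustration of the first, second, and last points above, a topic node and its compile-on-ingest step might be modeled as follows. This is a minimal sketch, not the templates from CONSTITUTION.md: the `TopicNode` class, its field names, and the `compile_into` helper are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field

# The six facet dimensions named in the list above (identifiers are illustrative).
FACETS = ("technical", "historical", "opposing_views",
          "applications", "limitations", "cross_domain")


@dataclass
class TopicNode:
    """Hypothetical topic node carrying causal links and proximity facets."""
    title: str
    upstream_causes: list[str] = field(default_factory=list)
    downstream_effects: list[str] = field(default_factory=list)
    feedback_loops: list[str] = field(default_factory=list)
    facets: dict[str, str] = field(default_factory=dict)

    def missing_facets(self) -> list[str]:
        """Facet dimensions that have no content yet."""
        return [name for name in FACETS if not self.facets.get(name)]


def _add_unique(items: list[str], value: str) -> None:
    if value not in items:
        items.append(value)


def compile_into(graph: dict[str, TopicNode], node: TopicNode) -> None:
    """Insert a node and back-link its neighbors instead of only storing it."""
    existing = graph.get(node.title)
    if existing is not None and existing is not node:
        # Keep links that an earlier stub for this title already collected.
        for cause in existing.upstream_causes:
            _add_unique(node.upstream_causes, cause)
        for effect in existing.downstream_effects:
            _add_unique(node.downstream_effects, effect)
    graph[node.title] = node
    for cause in node.upstream_causes:
        parent = graph.setdefault(cause, TopicNode(cause))
        _add_unique(parent.downstream_effects, node.title)
    for effect in node.downstream_effects:
        child = graph.setdefault(effect, TopicNode(effect))
        _add_unique(child.upstream_causes, node.title)
```

Compiling a node that names "context compression" as a downstream effect creates a stub for that topic with the reverse link already in place, so a later, fuller page for the same topic inherits the causal edge rather than starting unconnected.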
| File | Description |
|---|---|
| CONSTITUTION.md | The core methodology: 4-layer architecture, topic/source/synthesis templates, core operations (Ingest/Query/Lint) |
| KNOWLEDGE-LINT-SPEC.md | 8-category automated health check specification for knowledge graphs |
| CONTEXT-COMPRESSOR-DESIGN.md | Production-grade 5-stage context compression algorithm for long-running AI agent conversations |
- Read CONSTITUTION.md to understand the 4-layer architecture and core principles
- Copy the templates from the constitution (topic, source, synthesis) into your knowledge base
- Set up Knowledge Lint to automate health checks
- Integrate the context compressor if your agent runs long sessions
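
To give a feel for what the Knowledge Lint step might involve, here is a minimal sketch of two checks: causal chain integrity and blind spot detection. It does not reproduce the 8 categories in KNOWLEDGE-LINT-SPEC.md; the graph shape (a dict of plain dicts with `causes`, `effects`, and `facets` keys) and the `lint` function are assumptions made for this example.

```python
# Facet identifiers are illustrative; the six dimensions come from this README.
FACETS = ("technical", "historical", "opposing_views",
          "applications", "limitations", "cross_domain")


def lint(graph: dict[str, dict]) -> list[str]:
    """Return findings for a graph shaped like
    {title: {"causes": [...], "effects": [...], "facets": {...}}}."""
    findings = []
    for title, node in sorted(graph.items()):
        # Causal chain integrity: every link must resolve and be mirrored.
        for cause in node.get("causes", []):
            if cause not in graph:
                findings.append(f"{title}: upstream cause '{cause}' has no page")
            elif title not in graph[cause].get("effects", []):
                findings.append(f"{title}: '{cause}' does not list it as an effect")
        for effect in node.get("effects", []):
            if effect not in graph:
                findings.append(f"{title}: downstream effect '{effect}' has no page")
            elif title not in graph[effect].get("causes", []):
                findings.append(f"{title}: '{effect}' does not list it as a cause")
        # Blind spot detection: facet dimensions with no content.
        facets = node.get("facets", {})
        missing = [name for name in FACETS if not facets.get(name)]
        if missing:
            findings.append(f"{title}: blind spots in facets {missing}")
    return findings
```

A script like this could write its findings into the ops layer as a lint report on each run; how often to run it, and which findings should block ingestion, are left open here.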
your-knowledge-base/
├── 00-schema/ Constitution + changelog (the "law")
├── 01-raw/ Raw source material (append-only, never modify)
├── 02-wiki/ Compiled knowledge (the "brain")
│ ├── sources/ Each raw item compiled into a structured source page
│ ├── topics/ Knowledge hub nodes with causal chains + proximity facets
│ └── synthesis/ Cross-topic causal overviews
└── 03-ops/ Automation scripts + lint reports
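
The layout above can be created with a few lines of Python. The sketch below is one possible way to do it, under the assumption that "append-only" for `01-raw/` means existing files are never overwritten; the `scaffold` and `add_raw` helpers are hypothetical and not part of the framework's own scripts.

```python
from pathlib import Path

# Directory names taken from the tree above.
LAYOUT = (
    "00-schema",
    "01-raw",
    "02-wiki/sources",
    "02-wiki/topics",
    "02-wiki/synthesis",
    "03-ops",
)


def scaffold(root: Path) -> None:
    """Create the 4-layer directory layout under root (safe to re-run)."""
    for rel in LAYOUT:
        (root / rel).mkdir(parents=True, exist_ok=True)


def add_raw(root: Path, name: str, text: str) -> Path:
    """Store a raw item, refusing to overwrite so 01-raw stays append-only."""
    target = root / "01-raw" / name
    # Mode "x" raises FileExistsError if the file is already present.
    with target.open("x", encoding="utf-8") as handle:
        handle.write(text)
    return target
```

Calling `scaffold(Path("your-knowledge-base"))` once sets up the tree; a second `add_raw` call with the same file name fails loudly instead of silently modifying raw material.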
"Knowledge nebulae are the natural growth medium for AGI." We are not planting a tree — we are planting a seed that grows itself.
The core insight: a knowledge base's value is not in how much it stores, but in the density and depth of causal connections between stored items. A sparse knowledge base gives sparse reasoning. A dense one enables the system to "walk further along the knowledge network, see more dimensions, and discover correlations humans haven't noticed."
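
One way to make the density claim concrete is to measure it. The README does not define the nebula density metric, so the sketch below assumes a simple stand-in: average outgoing causal links per node, plus a bounded breadth-first walk that shows how far a query can travel from a starting topic. Both function names are hypothetical.

```python
from collections import deque


def density(edges: dict[str, set[str]]) -> float:
    """Average number of outgoing causal links per node (assumed metric)."""
    if not edges:
        return 0.0
    return sum(len(targets) for targets in edges.values()) / len(edges)


def reachable(edges: dict[str, set[str]], start: str, max_hops: int) -> set[str]:
    """Topics reachable from start within max_hops causal links."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, hops = frontier.popleft()
        if hops == max_hops:
            continue  # do not expand past the hop budget
        for nxt in edges.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, hops + 1))
    seen.discard(start)
    return seen


sparse = {"a": {"b"}, "b": set(), "c": set()}
dense = {"a": {"b", "c"}, "b": {"c"}, "c": {"a"}}
```

With the same three topics and the same hop budget, `reachable(sparse, "a", 2)` finds only `b`, while `reachable(dense, "a", 2)` finds both `b` and `c`: the denser graph lets the walk reach more of the network.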
This framework was extracted from a real multi-agent production system running 3 AI agents (a primary dispatcher, a deep-reasoning agent, and a system-maintenance agent) on Windows Server. It evolved through 3 iterations (v1.0 → v3.0) over 2 months of daily operation, driven by practical needs and hard-won lessons.
MIT — Free to use, modify, and share.