Intelligent memory preservation system with contextual embeddings, knowledge graphs, and task-context awareness.
Stop losing context when your conversations get compacted! This system automatically extracts, scores, and preserves important memories from your coding sessions, then intelligently injects the most relevant ones back using contextual embeddings, knowledge graph traversal, and task-context scoring.
- "Where You Left Off": Shows last 5 actions, files, and status before compaction (the "50 First Dates" solution!)
- Contextual Embeddings: Prepends session/time/file context to embeddings for better retrieval
- Temporal Queries: Find "yesterday's work" or "last week's changes" naturally
- File-Context Queries: Search "auth.py changes" or "modifications to utils.py"
- Evaluation Framework: Precision, Recall, F1, and MRR metrics for measuring quality
- Test Suite: 293 tests, 49% coverage across critical components
- No Version Suffixes: Clean filenames (precompact.py, not precompact_v2.py)
- Centralized Versions: All versions tracked in version.py
- Proper Semantic Versioning: Major.Minor.Patch format
- Knowledge Graph: Automatically extracts entities (files, functions, bugs, features) and builds a relationship graph
- Task-Context Scoring: Boosts memories relevant to current work (1.5-3x importance)
- PageRank Centrality: Identifies the most important entities across conversations
- Multi-Hop Traversal: Finds related memories through graph relationships (1-2 hops)
- Adaptive K Retrieval: Returns 0-20 memories dynamically based on quality (not a fixed top-K!)
- Full Transcript Storage: No truncation; complete intent/action/outcome preserved
- nomic-embed-text-v1.5: 768-dim embeddings with an 8192-token context (16x the old model)
- Hierarchical Display: Short summaries for injection, full transcripts via the query tool
- 85% Relevance Improvement: Task-relevant queries score 72% vs 39% with the old system
- Smart Chunking: Automatically breaks conversations into Intent-Action-Outcome triplets
- Importance Scoring: 10+ signals identify critical memories (decisions, fixes, learnings)
- Vector Search: Fast semantic retrieval with HNSW indexing (ChromaDB)
- Auto-Pruning: Removes old/redundant memories based on age, similarity, and capacity
- Hierarchical Clustering: Organizes related memories into topical groups
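The contextual-embedding idea above (prepend session/time/file context so the vector captures "when" and "where", not just "what") can be sketched as follows. The field names and header format are illustrative assumptions, not the project's actual schema:

```python
from datetime import datetime, timezone

def build_contextual_text(chunk: dict) -> str:
    """Prepend session/time/file context to the memory text before
    embedding, so temporal and file-scoped queries can match it."""
    when = datetime.fromtimestamp(chunk["timestamp"], tz=timezone.utc)
    header = (
        f"session: {chunk['session_id']} | "
        f"time: {when.strftime('%Y-%m-%d %H:%M')} | "
        f"files: {', '.join(chunk['files']) or 'none'}"
    )
    return f"{header}\n{chunk['text']}"

doc = build_contextual_text({
    "session_id": "abc123",
    "timestamp": 1760227200,  # 2025-10-12 00:00 UTC
    "files": ["src/auth.ts"],
    "text": "Fixed token refresh bug in auth flow.",
})
print(doc)
```

The resulting string (header plus body) is what gets embedded, so a query like "yesterday's auth.ts work" has literal tokens to latch onto.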
Automatically extracts and indexes:
- Code snippets with language tags
- File paths modified/created
- Architecture discussions and design decisions
- Commands executed
- Error messages and troubleshooting
- Tools used (Read, Write, Edit, Bash, etc.)
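A rule-based extractor for this kind of artifact can be sketched with a couple of regexes. The patterns below are illustrative, not the project's actual ones:

```python
import re

# Fenced code blocks with an optional language tag.
CODE_BLOCK = re.compile(r"```(\w+)?\n(.*?)```", re.DOTALL)
# File paths ending in a few common extensions (illustrative subset).
FILE_PATH = re.compile(r"\b[\w./-]+\.(?:py|ts|js|md|json)\b")

def extract_artifacts(text: str) -> dict:
    """Pull code snippets and file paths out of a transcript chunk."""
    snippets = [{"lang": m.group(1) or "text", "code": m.group(2).strip()}
                for m in CODE_BLOCK.finditer(text)]
    files = sorted(set(FILE_PATH.findall(text)))
    return {"code_snippets": snippets, "files": files}

sample = "Edited src/auth.ts and added tests:\n```python\nassert refresh()\n```"
print(extract_artifacts(sample))
```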
- 100% Local: Uses sentence-transformers for embeddings (nomic-embed-text-v1.5)
- No API calls: Smart rule-based chunking (no LLM needed)
- Offline-first: Works without internet connection
- Python 3.8+
- Claude Code CLI
- ~500MB disk space (for dependencies)
```bash
# Clone the repository
git clone https://github.com/rhowardstone/claude-code-memory-system.git
cd claude-code-memory-system

# Run installer (safely updates settings.json)
./install.sh
```

The installer will:
- Copy hooks to `~/.claude/memory-hooks/`
- Install Python dependencies (chromadb, sentence-transformers, etc.)
- Update `~/.claude/settings.json` with hook configuration
- Create memory database directory
If you prefer manual setup, see docs/INSTALLATION.md.
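For the curious, "safely updates settings.json" amounts to merging hook entries into the existing file rather than overwriting it. A minimal sketch, assuming a hypothetical hook schema (the real install.sh may differ):

```python
import json
import tempfile
from pathlib import Path

def add_hooks(settings_path: Path, hook_config: dict) -> None:
    """Merge hook entries into settings.json without clobbering
    existing keys. Hook schema here is an illustrative assumption."""
    settings = json.loads(settings_path.read_text()) if settings_path.exists() else {}
    hooks = settings.setdefault("hooks", {})
    for event, entries in hook_config.items():
        existing = hooks.setdefault(event, [])
        existing.extend(e for e in entries if e not in existing)
    settings_path.write_text(json.dumps(settings, indent=2))

# Demo on a temp file so nothing real is touched.
with tempfile.TemporaryDirectory() as d:
    path = Path(d) / "settings.json"
    path.write_text(json.dumps({"model": "sonnet"}))
    add_hooks(path, {"PreCompact": [{"command": "precompact.py"}]})
    merged = json.loads(path.read_text())
print(merged)
```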
```
┌─────────────────────────────────────────────────────────────┐
│                    Normal Coding Session                    │
│  You work with Claude Code, creating files, fixing bugs...  │
└────────────────────────┬────────────────────────────────────┘
                         │
                         │  Context fills up
                         ▼
                 ┌───────────────┐
                 │   /compact    │
                 │   triggers    │
                 └───────┬───────┘
                         │
                         ▼
        ┌─────────────────────────────────────┐
        │        PreCompact Hook Fires        │
        │  • Loads conversation transcript    │
        │  • Smart chunking (IAO triplets)    │
        │  • Importance scoring               │
        │  • Multi-modal extraction           │
        │  • Stores in vector DB              │
        │  • Auto-prunes old memories         │
        │  • Creates hierarchical clusters    │
        └─────────────────┬───────────────────┘
                          │
                          ▼
                ┌───────────────────┐
                │    Compaction     │
                │  (Claude Code's   │
                │     internal)     │
                └─────────┬─────────┘
                          │
                          ▼
        ┌─────────────────────────────────────┐
        │       SessionStart Hook Fires       │
        │  • Retrieves 5 recent important     │
        │  • Retrieves 10 relevant (vector)   │
        │  • Combines importance + relevance  │
        │  • Injects via additionalContext    │
        └─────────────────┬───────────────────┘
                          │
                          ▼
        ┌─────────────────────────────────────┐
        │         New Session Starts          │
        │   Claude sees previous memories!    │
        │   Continuity preserved across gap   │
        └─────────────────────────────────────┘
```
Automatically scored by importance:
- Critical (20+): Architectural decisions, major bug fixes, key learnings
- High (10-20): File creations, test successes, important changes
- Medium (5-10): Code snippets, moderate edits, context
- Low (<5): Routine work, minor changes
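Mapping a raw score onto these tiers is a simple bucketing step; a minimal sketch using the thresholds above (exact boundary handling is an assumption):

```python
def importance_level(score: float) -> str:
    """Map a raw importance score to the tiers listed above."""
    if score >= 20:
        return "critical"
    if score >= 10:
        return "high"
    if score >= 5:
        return "medium"
    return "low"

print([importance_level(s) for s in (24.5, 12.0, 6.3, 1.0)])
```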
Example memories:
```json
{
  "intent": "Fix authentication bug causing 401 errors",
  "action": "Modified auth.ts to properly handle token expiry, added refresh logic",
  "outcome": "Tests passing, bug resolved",
  "importance": 24.5,
  "artifacts": {
    "files": ["src/auth.ts", "tests/auth.test.ts"],
    "code_snippets": ["async function refreshToken() { ... }"],
    "architecture": ["token refresh flow"]
  }
}
```

Just use Claude Code normally! Memories are automatically:
- Extracted when compaction triggers (PreCompact hook)
- Injected when session resumes after compaction (SessionStart hook)
Browse and search your memories anytime with query_memories.py:
```bash
# View statistics
python3 ~/.claude/memory-hooks/query_memories.py --stats

# Search by topic (semantic)
python3 ~/.claude/memory-hooks/query_memories.py --topic "authentication bug fix"

# Search by keywords
python3 ~/.claude/memory-hooks/query_memories.py --keywords error crash failed

# High importance only
python3 ~/.claude/memory-hooks/query_memories.py --min-importance 15

# Find files involved in errors
python3 ~/.claude/memory-hooks/query_memories.py --files-involved --keywords bug

# Date range search
python3 ~/.claude/memory-hooks/query_memories.py --since "2025-10-12" --until "2025-10-13"

# Session-specific
python3 ~/.claude/memory-hooks/query_memories.py --session current --topic "recent work"

# Detailed output
python3 ~/.claude/memory-hooks/query_memories.py --topic "testing" --format detailed

# JSON output for scripting
python3 ~/.claude/memory-hooks/query_memories.py --topic "bugs" --format json
```

```
Memory Statistics
================================================================================
Total memories: 42
Average importance: 16.8

Importance Distribution:
  Low      :  3 (  7.1%) █
  Medium   :  5 ( 11.9%) ██
  High     : 28 ( 66.7%) █████████████
  Critical :  6 ( 14.3%) ██

Multi-modal Content:
  Has code:         18 (42.9%)
  Has files:        32 (76.2%)
  Has architecture:  8 (19.0%)
```
Edit `~/.claude/memory-hooks/memory_scorer.py`:

```python
WEIGHTS = {
    "decision_marker": 10.0,  # "decided to", "chose"
    "error_resolution": 8.0,  # "fixed", "resolved"
    "file_creation": 6.0,     # New files created
    "test_success": 5.0,      # Tests passing
    "learning": 7.0,          # "learned", "discovered"
    # ... adjust as needed
}
```

Edit `~/.claude/memory-hooks/sessionstart_memory_injector.py`:

```python
TOP_K_MEMORIES = 20    # Maximum memories (adaptive returns 0-20)
RECENT_MEMORIES = 4    # Recent chronological
MIN_IMPORTANCE = 5.0   # Minimum score to inject
MIN_SIMILARITY = 0.35  # Minimum relevance threshold
KG_CACHE_TTL = 300     # Knowledge graph cache lifetime (seconds)
```

Edit `~/.claude/memory-hooks/memory_pruner.py`:

```python
MAX_MEMORIES_PER_SESSION = 500  # Capacity limit
OLD_MEMORY_DAYS = 90            # Age threshold
LOW_IMPORTANCE_THRESHOLD = 3.0  # Importance cutoff
REDUNDANCY_THRESHOLD = 0.95     # Similarity for dedup
```

Edit `~/.claude/memory-hooks/precompact_memory_extractor.py`:

```python
MAX_TRANSCRIPT_MESSAGES = 1000  # Max messages to process
AUTO_PRUNE = True               # Auto-prune on compaction
```

```
~/.claude/
├── memory-hooks/
│   ├── precompact_memory_extractor.py   # V4: Full transcript extraction
│   ├── sessionstart_memory_injector.py  # V5: Task-context aware injection
│   ├── entity_extractor.py              # Entity extraction
│   ├── knowledge_graph.py               # Graph construction
│   ├── task_context_scorer.py           # Task-context scoring
│   ├── query_memories.py                # CLI query interface
│   ├── memory_scorer.py                 # Importance calculation
│   ├── multimodal_extractor.py          # Artifact extraction
│   ├── memory_pruner.py                 # Auto-pruning logic
│   ├── memory_clustering.py             # Hierarchical clustering
│   └── requirements.txt                 # Python dependencies
├── memory_db/                           # ChromaDB storage
│   └── (vector database files)
└── settings.json                        # Hook configuration
```
- Extraction (PreCompact):
  Transcript → Format → Smart chunk → Score → Extract artifacts → Embed → Store
- Retrieval (SessionStart):
  Query → Vector search + Recent filter → Rank by importance × relevance → Format → Inject
- Pruning (Automatic):
  - Age-based: Old + low importance
  - Redundancy: Near-duplicates (>95% similarity)
  - Capacity: Keep top N by importance
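The age- and capacity-based passes can be sketched in a few lines (redundancy dedup omitted here, since it needs the embedding vectors; field names are illustrative, not the pruner's actual schema):

```python
from datetime import datetime, timedelta, timezone

def prune(memories, now=None, max_keep=500, max_age_days=90, min_importance=3.0):
    """Sketch of the age and capacity pruning passes.
    Each memory: {"id": str, "ts": datetime, "importance": float}."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    # Age pass: drop memories that are BOTH old and unimportant.
    kept = [m for m in memories
            if m["ts"] >= cutoff or m["importance"] >= min_importance]
    # Capacity pass: keep the top-N by importance.
    kept.sort(key=lambda m: m["importance"], reverse=True)
    return kept[:max_keep]

now = datetime(2025, 10, 13, tzinfo=timezone.utc)
mems = [
    {"id": "old-trivial", "ts": now - timedelta(days=120), "importance": 1.0},
    {"id": "old-critical", "ts": now - timedelta(days=120), "importance": 22.0},
    {"id": "recent", "ts": now - timedelta(days=2), "importance": 2.0},
]
print([m["id"] for m in prune(mems, now=now)])
```

Note that an old memory survives if its importance clears the threshold, so critical decisions are not aged out.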
All operations are logged to `~/.claude/memory_hooks_debug.log`:

```bash
tail -f ~/.claude/memory_hooks_debug.log
```

"No memories found"
- Haven't run `/compact` yet
- Hooks not configured correctly in settings.json
- Check: `cat ~/.claude/settings.json | grep hooks`

"Import errors"
- Missing dependencies
- Fix: `pip install -r ~/.claude/memory-hooks/requirements.txt`

"Hooks not firing"
- Check settings.json format (must use an array with a matcher)
- Ensure scripts are executable: `chmod +x ~/.claude/memory-hooks/*.py`
- Check the debug log for errors

"Too few memories extracted"
- Increase MAX_TRANSCRIPT_MESSAGES in precompact_memory_extractor.py
- Adjust chunking thresholds for more granular chunks

"Too many memories"
- Lower MAX_MEMORIES_PER_SESSION in memory_pruner.py
- Raise MIN_IMPORTANCE in sessionstart_memory_injector.py
- Run manual pruning (see memory_pruner.py)
- Model: `nomic-ai/nomic-embed-text-v1.5` (768 dimensions)
- Context: 8192 tokens (16x the old 512-token model)
- Size: ~140MB
- Speed: ~500 embeddings/sec on CPU
- Quality: ranks alongside top-10 models 70x its size; well-suited to code
- Upgrade: 85% relevance improvement over the old all-MiniLM-L6-v2
- Engine: ChromaDB with HNSW indexing
- Distance: Cosine similarity
- Indexing: Automatic on insert
- Storage: Persistent local disk
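The adaptive-K behavior (return 0-20 memories above a similarity floor rather than a fixed top-K) can be sketched with plain NumPy. `min_similarity=0.35` mirrors the injector default; everything else is illustrative:

```python
import numpy as np

def adaptive_retrieve(query_vec, memory_vecs, min_similarity=0.35, max_k=20):
    """Return (index, similarity) pairs for every memory above the
    similarity floor, capped at max_k -- so a vague query can return
    nothing and a rich one up to 20."""
    q = query_vec / np.linalg.norm(query_vec)
    m = memory_vecs / np.linalg.norm(memory_vecs, axis=1, keepdims=True)
    sims = m @ q  # cosine similarity per memory
    order = np.argsort(-sims)
    hits = [(int(i), float(sims[i])) for i in order if sims[i] >= min_similarity]
    return hits[:max_k]

rng = np.random.default_rng(0)
mem = rng.normal(size=(100, 768))
query = mem[7] + 0.1 * rng.normal(size=768)  # near-duplicate of memory 7
results = adaptive_retrieve(query, mem)
print(results[:1])
```

Random 768-dim vectors are nearly orthogonal, so only the near-duplicate clears the 0.35 floor; with a real embedding model the floor separates on-topic from off-topic memories the same way.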
- Decision markers ("decided", "chose", "will use")
- Error resolution ("fixed", "resolved", "debugged")
- File operations (created, modified)
- Test success (tests passing)
- Learning indicators ("learned", "discovered")
- Tool usage count
- Code presence
- Architecture discussions
- Recency (exponential decay)
- Session context
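One plausible way to combine these signals is a weighted sum with exponential recency decay. This is a sketch, not the actual scorer; the 30-day half-life is an assumption:

```python
def score_memory(signals: dict, age_days: float, weights: dict,
                 half_life_days: float = 30.0) -> float:
    """Sum weighted signal counts, then decay by age so that a memory
    loses half its score every half_life_days."""
    base = sum(weights.get(name, 0.0) * count for name, count in signals.items())
    return base * 0.5 ** (age_days / half_life_days)

weights = {"decision_marker": 10.0, "error_resolution": 8.0, "file_creation": 6.0}
fresh = score_memory({"error_resolution": 2, "file_creation": 1}, age_days=0, weights=weights)
month = score_memory({"error_resolution": 2, "file_creation": 1}, age_days=30, weights=weights)
print(fresh, month)  # → 22.0 11.0
```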
- Natural boundaries: File operations, decisions, topic changes
- Grouped operations: 3-5 related file writes
- Size limits: Intent (500 chars), Action (1000 chars), Outcome (300 chars)
- Deduplication: Skip empty/duplicate chunks
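A naive version of this chunker might look like the following; the boundary rule (a new user message opens a chunk) is a simplification of the actual logic, and the size limits match the numbers above:

```python
def chunk_iao(messages, max_intent=500, max_action=1000, max_outcome=300):
    """Group a transcript into Intent-Action-Outcome triplets: a user
    message opens a chunk, assistant messages fill the action, and the
    last assistant message before the next user turn is the outcome."""
    chunks, current = [], None
    for msg in messages:
        if msg["role"] == "user":
            if current and current["action"]:  # skip empty chunks
                chunks.append(current)
            current = {"intent": msg["text"][:max_intent], "action": "", "outcome": ""}
        elif current is not None:
            current["action"] = (current["action"] + " " + msg["text"]).strip()[:max_action]
            current["outcome"] = msg["text"][-max_outcome:]
    if current and current["action"]:
        chunks.append(current)
    return chunks

convo = [
    {"role": "user", "text": "Fix the auth bug"},
    {"role": "assistant", "text": "Edited auth.py to refresh tokens"},
    {"role": "assistant", "text": "Tests passing"},
    {"role": "user", "text": "Now add logging"},
    {"role": "assistant", "text": "Added logger to utils.py"},
]
print(len(chunk_iao(convo)))  # → 2
```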
See examples/ directory for:
- Real conversation memory extracts
- CLI usage examples
- Integration patterns
- Custom scoring configurations
Contributions welcome! Please see CONTRIBUTING.md.
```bash
# Clone and install in dev mode
git clone https://github.com/rhowardstone/claude-code-memory-system.git
cd claude-code-memory-system
pip install -r hooks/requirements.txt

# Test installation
./install.sh
```

MIT License - see LICENSE file for details.
- Built for Claude Code
- Uses ChromaDB for vector storage
- Embeddings from sentence-transformers
- Inspired by human episodic memory systems
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Documentation: docs/
- Cross-session memory (with explicit user permission)
- Memory visualization dashboard
- Custom scoring rules via config file
- Memory export formats (Markdown, HTML, PDF)
- Integration with external knowledge bases
- Multi-language support for code artifacts
- Compression for very old memories
- Collaborative memory sharing (team features)
Built with ❤️ for the Claude Code community