Memlayer is a memory-enhanced LLM wrapper that automatically builds and maintains a persistent knowledge graph from your conversations. It adds memory capabilities to any LLM provider (OpenAI, Claude, Gemini, Ollama) without changing how you interact with them.
- **Chat Flow**: When you send a message via `.chat()`, Memlayer:
  - Searches the knowledge graph for relevant context
  - Injects that context into the LLM prompt via tool calls
  - Returns the LLM's response to you
  - Asynchronously extracts knowledge and updates the graph
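The steps above can be sketched as a minimal pipeline. `MemoryStore`, `FakeLLM`, and the helper names below are illustrative stand-ins, not Memlayer's actual classes:

```python
import threading

# Minimal sketch of the chat flow; MemoryStore and FakeLLM are
# illustrative stand-ins, not Memlayer internals.
class MemoryStore:
    def __init__(self):
        self.facts = []

    def search(self, query):
        # naive keyword overlap instead of a real hybrid search
        words = set(query.lower().split())
        return [f for f in self.facts if words & set(f.split())]

    def extract_and_store(self, message):
        self.facts.append(message.lower())

class FakeLLM:
    def complete(self, prompt):
        return f"(answer grounded in: {prompt})"

def chat(message, memory, llm):
    context = memory.search(message)                   # 1. search for relevant context
    response = llm.complete(f"{context} | {message}")  # 2. inject context into the prompt
    worker = threading.Thread(target=memory.extract_and_store, args=(message,))
    worker.start()                                     # 4. extract knowledge asynchronously
    worker.join()  # joined immediately only so this sketch is deterministic
    return response                                    # 3. return the LLM's response
```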
- **Knowledge Extraction**: After each conversation turn:
  - Text is analyzed by a fast model (background thread)
  - Facts, entities, and relationships are extracted
  - A salience gate filters out trivial information
  - Knowledge is stored in both the vector DB and the graph DB
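A rough sketch of that pipeline, with a toy heuristic standing in for the fast extraction model (the function names and scoring below are assumptions, not Memlayer's internals):

```python
# Toy consolidation pass: extract facts, gate on salience, store in both DBs.
# Extraction and scoring here are stand-ins for the fast model.
SALIENCE_THRESHOLD = 0.5

def extract_facts(text):
    # stand-in for fast-model extraction: one "fact" per sentence
    return [s.strip() for s in text.split(".") if s.strip()]

def salience(fact):
    # toy heuristic: longer statements score higher (pure assumption)
    return min(len(fact.split()) / 10, 1.0)

def consolidate(text, vector_db, graph_db):
    for fact in extract_facts(text):
        if salience(fact) >= SALIENCE_THRESHOLD:  # salience gate
            vector_db.append(fact)                # vector store
            graph_db.append(fact)                 # graph store
```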
- **Memory Search**: When the LLM needs context:
  - Hybrid search combines vector similarity + graph traversal
  - Three search tiers available: `fast`, `balanced`, `deep`
  - Results are ranked and returned as context
- **Background Services**:
  - Consolidation: Extracts knowledge from conversations (async)
  - Curation: Expires time-sensitive facts (background thread)
  - Salience Gate: Filters low-value information before storage
Standard chat flow:

```
User Message
     │
     ▼
Memory Search (if LLM calls tool)
     │
     ▼
LLM Response Generated
     │
     ├─► Return to User
     │
     └─► Background: Extract Knowledge → Store in Graph
```

Streaming chat flow:

```
User Message
     │
     ▼
Memory Search (if LLM calls tool)
     │
     ▼
LLM Starts Streaming
     │
     ├─► Yield chunks to user in real-time
     │
     └─► Background: Buffer full response
              │
              └─► After stream completes → Extract Knowledge → Store in Graph
```
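The streaming path can be sketched as a generator that yields chunks while buffering them for post-stream extraction (a simplified illustration, not Memlayer's implementation):

```python
# Yield chunks to the caller in real time while buffering the full
# response; run extraction only after the stream completes (sketch).
def stream_chat(chunks, extract_and_store):
    buffer = []
    for chunk in chunks:                # chunks arrive from the LLM
        buffer.append(chunk)            # buffer for later extraction
        yield chunk                     # user sees output immediately
    extract_and_store("".join(buffer))  # after the stream completes
```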
Works with OpenAI, Anthropic Claude, Google Gemini, and local Ollama models. Same API across all providers.
The LLM automatically gets access to two tools:

- `search_memory`: Hybrid vector + graph search
- `schedule_task`: Create time-based reminders
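These tools are exposed to the model via function-calling definitions. The schema below shows the rough shape in OpenAI's function-calling format; the exact parameter names are assumptions for illustration:

```python
# Illustrative tool definitions (OpenAI-style function-calling schema);
# parameter names here are assumptions, not Memlayer's exact schema.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "search_memory",
            "description": "Hybrid vector + graph search over stored memories",
            "parameters": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "schedule_task",
            "description": "Create a time-based reminder",
            "parameters": {
                "type": "object",
                "properties": {
                    "when": {"type": "string"},
                    "task": {"type": "string"},
                },
                "required": ["when", "task"],
            },
        },
    },
]
```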
- `fast`: Vector-only search, <100ms
- `balanced`: Vector + 1-hop graph traversal
- `deep`: Full graph traversal with entity extraction
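The tiers can be pictured as a dispatch over a vector pass plus bounded graph traversal. The data structures below (a fact list and an adjacency dict) are simplified stand-ins for the real stores:

```python
# Sketch of tier-based hybrid search; not Memlayer's implementation.
def vector_search(query, facts):
    # pretend similarity: shared words between query and fact
    words = set(query.lower().split())
    return [f for f in facts if words & set(f.lower().split())]

def graph_traverse(seeds, graph, max_hops):
    # breadth-first expansion over related facts/entities
    found, frontier, hops = set(seeds), set(seeds), 0
    while frontier and (max_hops is None or hops < max_hops):
        frontier = {n for s in frontier for n in graph.get(s, [])} - found
        found |= frontier
        hops += 1
    return list(found)

def search_memory(query, facts, graph, tier="balanced"):
    hits = vector_search(query, facts)            # every tier starts with vectors
    if tier == "fast":
        return hits                               # vector-only, lowest latency
    max_hops = 1 if tier == "balanced" else None  # deep = full traversal
    return graph_traverse(hits, graph, max_hops)
```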
- Entity deduplication (e.g., "John" = "John Smith")
- Relationship tracking between entities
- Time-aware facts with expiration dates
- Importance scoring for fact prioritization
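Entity deduplication can be illustrated with a toy name-containment resolver; Memlayer's actual matching is presumably more sophisticated, so treat this only as a sketch of the idea:

```python
# Toy entity resolution: a shorter name that appears inside an
# already-kept longer name is treated as the same entity
# ("John" resolves to "John Smith").
def dedupe(entities):
    canonical = []
    for name in sorted(entities, key=len, reverse=True):  # longest names first
        for kept in canonical:
            if name.lower() in kept.lower():  # substring match = same entity
                break
        else:
            canonical.append(name)
    return canonical
```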
Choose an embedding strategy based on your needs:

- `online`: API-based embeddings (OpenAI), fast startup
- `local`: Local sentence-transformer model, no API costs
- `lightweight`: Graph-only, no embeddings, fastest startup
```python
from memlayer.wrappers.openai import OpenAI

client = OpenAI(
    # Core settings
    api_key="your-key",
    model="gpt-4.1-mini",
    user_id="user123",

    # Memory behavior
    operation_mode="online",    # online | local | lightweight
    salience_threshold=0.5,     # 0.0-1.0, filters trivial content

    # Storage paths
    chroma_dir="./my_chroma_db",
    networkx_path="./my_graph.pkl",

    # Search behavior
    max_search_results=5,
    search_tier="balanced",     # fast | balanced | deep

    # Performance tuning
    curation_interval=3600,     # Check for expired facts every hour
    embedding_model="text-embedding-3-small",
)
```

```python
response = client.chat([
    {"role": "user", "content": "My name is Alice"}
])
# Knowledge automatically extracted and stored
```

```python
for chunk in client.chat(
    [{"role": "user", "content": "What's my name?"}],
    stream=True,
):
    print(chunk, end="", flush=True)
```

```python
# Import knowledge from documents
client.update_from_text("""
Project Phoenix is led by Alice.
The project deadline is December 1st.
""")

# Get memory-grounded answer
answer = client.synthesize_answer("Who leads Project Phoenix?")
```

| Component | Latency | Notes |
|---|---|---|
| Memory search (fast) | 50-100ms | Vector search only |
| Memory search (balanced) | 100-300ms | Vector + 1-hop graph |
| Memory search (deep) | 300-1000ms | Full graph traversal |
| Knowledge extraction | 1-3s | Background, doesn't block response |
| Consolidation | 1-2s | Async, uses fast model |
| First-time salience init | 1-2s | Cached after first run |
- Choose the right operation mode:
  - Serverless → `online` mode
  - Privacy-sensitive → `local` mode
  - Demos/prototypes → `lightweight` mode
- Use streaming for better UX:
  - First chunk arrives in 1-3s
  - Knowledge extraction happens in the background
  - User sees the response immediately
- Tune the salience threshold:
  - Low (0.3-0.5): Keep more memories, higher storage
  - Medium (0.5-0.7): Balanced, recommended default
  - High (0.7-0.9): Only important facts, minimal storage
- Set expiration dates for time-sensitive facts:
  - The system automatically extracts expiration dates from text
  - The curation service removes expired facts periodically
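The curation pass amounts to periodically dropping facts whose expiry has passed; the field names below are assumptions for illustration, not Memlayer's schema:

```python
import time

# Drop facts whose expiry timestamp has passed; facts with no expiry
# are kept forever (field names are illustrative assumptions).
def curate(facts, now=None):
    now = time.time() if now is None else now
    return [f for f in facts
            if f.get("expires_at") is None or f["expires_at"] > now]
```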
- Use the appropriate search tier:
  - `fast`: Quick lookups, high-traffic applications
  - `balanced`: Default, good recall with reasonable latency
  - `deep`: Complex questions needing graph reasoning
- Quickstart Guide: Get up and running in 5 minutes
- Streaming Mode: Deep dive into streaming behavior
- Operation Modes: Architecture implications of each mode
- Provider Setup: Provider-specific configuration