# Memlayer Examples

Welcome to the Memlayer examples! This directory contains comprehensive examples organized by topic.
```
examples/
├── 01_basics/        # Getting started with Memlayer
├── 02_search_tiers/  # Fast, Balanced, and Deep search modes
├── 03_features/      # Advanced features (tasks, knowledge graph)
├── 04_benchmarks/    # Performance comparisons
├── 05_providers/     # Provider-specific examples (OpenAI, Claude, etc.)
├── 06_api/           # Direct API usage and streaming
└── README.md         # This file
```
## Installation

```bash
pip install memlayer
```

Set the API key for your provider:

```bash
# For OpenAI
export OPENAI_API_KEY='sk-...'

# For Claude
export ANTHROPIC_API_KEY='sk-ant-...'

# For Gemini
export GOOGLE_API_KEY='...'

# For Ollama (local, no key needed)
ollama pull qwen3:1.7b
```

Then run your first example:

```bash
python examples/01_basics/getting_started.py
```

## 01_basics/

Start here if you're new to Memlayer!
**getting_started.py** - Simple introduction to Memlayer

- Store and retrieve memories
- Automatic knowledge consolidation
- Basic conversation patterns

```bash
python examples/01_basics/getting_started.py
```

## 02_search_tiers/

Memlayer provides three search tiers optimized for different use cases.
**fast_tier_example.py** - Quick lookups (<100ms)

```bash
python examples/02_search_tiers/fast_tier_example.py
```

- 2 vector search results
- No graph traversal
- Real-time chat applications
**balanced_tier_example.py** - Standard search (<500ms) [DEFAULT]

```bash
python examples/02_search_tiers/balanced_tier_example.py
```

- 5 vector search results
- No graph traversal
- General conversation
**deep_tier_example.py** - Comprehensive search (<2s)

```bash
python examples/02_search_tiers/deep_tier_example.py
```

- 10 vector search results
- Graph traversal enabled
- Complex queries, relationship discovery
**search_tiers_demo.py** - Complete demonstration of all tiers

```bash
python examples/02_search_tiers/search_tiers_demo.py
```

**tier_comparison.py** - Side-by-side performance comparison

```bash
python examples/02_search_tiers/tier_comparison.py
```

| Tier | Latency | Results | Graph | Use Case |
|---|---|---|---|---|
| Fast | <100ms | 2 | ❌ | Chatbots, real-time |
| Balanced | <500ms | 5 | ❌ | General conversation |
| Deep | <2s | 10 | ✅ | Research, multi-hop reasoning |
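The tradeoffs in this table can be expressed as data. Below is a minimal sketch of a tier picker; the `TierConfig` dataclass and `pick_tier` helper are illustrative only, not part of Memlayer's API:

```python
from dataclasses import dataclass

# Illustrative tier settings taken from the comparison table above;
# these are NOT Memlayer's actual configuration objects.
@dataclass(frozen=True)
class TierConfig:
    name: str
    max_latency_ms: int
    vector_results: int
    graph_traversal: bool

TIERS = {
    "fast": TierConfig("fast", 100, 2, False),
    "balanced": TierConfig("balanced", 500, 5, False),
    "deep": TierConfig("deep", 2000, 10, True),
}

def pick_tier(needs_relationships: bool, latency_budget_ms: int) -> TierConfig:
    """Choose the richest tier that fits the latency budget."""
    if needs_relationships and latency_budget_ms >= TIERS["deep"].max_latency_ms:
        return TIERS["deep"]
    if latency_budget_ms >= TIERS["balanced"].max_latency_ms:
        return TIERS["balanced"]
    return TIERS["fast"]

print(pick_tier(needs_relationships=True, latency_budget_ms=3000).name)  # deep
print(pick_tier(needs_relationships=False, latency_budget_ms=150).name)  # fast
```

In practice you rarely need such a helper: as noted below, the LLM often picks the tier itself, or you request one in natural language.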
## 03_features/

Advanced capabilities and integrations.
**task_reminders.py** - Proactive task management

```bash
python examples/03_features/task_reminders.py
```

- Schedule future tasks
- Automatic reminders when due
- Natural language date parsing
**test_knowledge_graph.py** - Knowledge graph demonstration

```bash
python examples/03_features/test_knowledge_graph.py
```

- Entity and relationship extraction
- Graph-based memory storage
- Visual inspection of the knowledge graph
## 04_benchmarks/

Performance comparisons and measurements.
**compare_operation_modes.py** - Compare memory filtering modes

```bash
python examples/04_benchmarks/compare_operation_modes.py
```

Compares three salience (memory filtering) modes:

- LOCAL: Sentence-transformers (slow startup, high accuracy)
- ONLINE: OpenAI embeddings API (fast startup, API cost)
- LIGHTWEIGHT: Keyword-based (instant startup, no embeddings)
Results:

| Mode | Startup | API Cost | Storage |
|---|---|---|---|
| LIGHTWEIGHT | ~5s | None | Graph-only |
| ONLINE | ~5s | Small | Full vector + graph |
| LOCAL | ~10s | None | Full vector + graph |
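In LIGHTWEIGHT mode, no embedding model is loaded; whether a message is worth remembering is decided from simple keyword signals. The sketch below shows the general idea; the marker list and threshold are invented for illustration and are not Memlayer's actual heuristics:

```python
# Keyword-based salience gate: decide whether a message is worth storing
# without computing any embeddings. Markers and threshold are illustrative.
SALIENT_MARKERS = {"my", "i am", "i'm", "favorite", "remember", "always", "project"}

def is_salient(message: str, threshold: int = 1) -> bool:
    text = message.lower()
    hits = sum(1 for marker in SALIENT_MARKERS if marker in text)
    return hits >= threshold

print(is_salient("My favorite color is blue"))  # True
print(is_salient("ok, thanks!"))                # False
```

The appeal of this approach is the startup cost: there is no model to download or load, at the price of cruder filtering than embedding-based salience.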
## 05_providers/

Provider-specific examples for each supported LLM.
**openai_example.py** - OpenAI/GPT integration

```bash
export OPENAI_API_KEY='sk-...'
python examples/05_providers/openai_example.py
```

**claude_example.py** - Anthropic Claude integration

```bash
export ANTHROPIC_API_KEY='sk-ant-...'
python examples/05_providers/claude_example.py
```

**gemini_example.py** - Google Gemini integration

```bash
export GOOGLE_API_KEY='...'
python examples/05_providers/gemini_example.py
```

**ollama_example.py** - Ollama (local) integration

```bash
ollama pull qwen3:1.7b
python examples/05_providers/ollama_example.py
```

See 05_providers/README.md for detailed provider comparisons.
## 06_api/

Direct API usage and advanced features.
**direct_knowledge_ingestion.py** - Direct memory updates

```bash
python examples/06_api/direct_knowledge_ingestion.py
```

- Bypass the conversation loop
- Directly ingest documents/text
- Efficient bulk knowledge loading
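For bulk loading, large documents are typically split into passages before ingestion. Below is a generic chunking helper you might pair with direct ingestion; it is plain utility code, not part of Memlayer's API:

```python
def chunk_text(text: str, max_chars: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping character windows for bulk ingestion."""
    if max_chars <= overlap:
        raise ValueError("max_chars must exceed overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        start += max_chars - overlap  # step forward, keeping some overlap
    return chunks

doc = "Memlayer adds long-term memory to LLM applications. " * 30
chunks = chunk_text(doc)
print(len(chunks), "chunks")
```

The overlap keeps a fact that straddles a chunk boundary retrievable from at least one chunk.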
**streaming_example.py** - Streaming responses ✨ NEW

```bash
python examples/06_api/streaming_example.py
```

- Real-time response streaming
- Works with all providers (OpenAI, Claude, Gemini, Ollama)
- Better UX for long responses
- Supports memory search tools
Example usage:

```python
from memlayer.wrappers.openai import OpenAI

client = OpenAI(model="gpt-4.1-mini")

# Enable streaming with stream=True
stream = client.chat(
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True  # 🔥 Enable streaming
)

# Iterate over chunks as they arrive
for chunk in stream:
    print(chunk, end="", flush=True)
```

## Which Example Should I Run?

**New to Memlayer?**
→ Start with 01_basics/getting_started.py
→ Use default settings (balanced tier, local mode)

**Need low-latency responses?**
→ Use 02_search_tiers/fast_tier_example.py
→ Set operation_mode="online" for fastest startup

**Need comprehensive, relationship-aware answers?**
→ Use 02_search_tiers/deep_tier_example.py
→ Wait 3-5s after conversations for graph consolidation

**Want task reminders?**
→ Use 03_features/task_reminders.py
→ Schedule tasks with natural language dates

**Want to run fully locally?**
→ Use 05_providers/ollama_example.py
→ Set operation_mode="local" (no API calls)

**Want the fastest startup?**
→ Use any provider with operation_mode="online"
→ Or use operation_mode="lightweight" for graph-only storage
## Quick Reference

### Basic usage

```python
from memlayer.wrappers.openai import OpenAI

client = OpenAI(
    api_key="your-key",
    model="gpt-4.1-mini",
    storage_path="./my_memories",
    user_id="user_123"
)

# Store information
client.chat([
    {"role": "user", "content": "My favorite color is blue"}
])

# Retrieve information (automatic)
response = client.chat([
    {"role": "user", "content": "What's my favorite color?"}
])
```

### Requesting a search tier

```python
# Fast search
response = client.chat([
    {"role": "user", "content": "Quick question: What's my name?"}
])

# Deep search with graph traversal
response = client.chat([
    {"role": "user", "content": "Tell me everything about my work. Use deep search."}
])
```

### Providers

```python
# OpenAI
from memlayer.wrappers.openai import OpenAI
client = OpenAI(api_key="...", model="gpt-4.1-mini")

# Claude
from memlayer.wrappers.claude import Claude
client = Claude(api_key="...", model="claude-3-5-sonnet-20241022")

# Gemini
from memlayer.wrappers.gemini import Gemini
client = Gemini(api_key="...", model="gemini-2.5-flash-lite")

# Ollama (local)
from memlayer.wrappers.ollama import Ollama
client = Ollama(host="http://localhost:11434", model="qwen3:1.7b")
```

## Choosing a Tier

| Scenario | Recommended Tier | Reason |
|---|---|---|
| Chatbot responses | Fast | Low latency required |
| Simple factual recall | Fast | Few memories needed |
| General conversation | Balanced | Good accuracy/speed balance |
| Research queries | Deep | Need comprehensive results |
| Finding connections | Deep | Graph traversal required |
| "Tell me everything about X" | Deep | Multi-source synthesis |
Typical latency, based on typical queries:

- Fast: ~50-150ms (2 vector results)
- Balanced: ~200-600ms (5 vector results)
- Deep: ~800-2500ms (10 vector results + graph traversal)
## How Deep Search Works

1. **Vector Search**: retrieves the top 10 semantically similar memories
2. **Entity Extraction**: the LLM extracts key entities from the query (e.g., "Tell me about Alice" → ["Alice"])
3. **Graph Traversal**: for each entity, traverses 1 hop in the knowledge graph to find relationships such as "Alice --[works on]--> Project Phoenix"
4. **Combination**: merges vector results with graph relationships
5. **Synthesis**: the LLM creates a comprehensive answer from all sources
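The pipeline above can be sketched end-to-end with toy data. Everything below — the in-memory memory list, the triple store, the word-overlap "vector search", and the substring "entity extraction" — is a stand-in for Memlayer's real components, shown only to make the combination step concrete:

```python
import re

# Toy stand-ins for the real stores: a list of memories and a triple store.
MEMORIES = [
    "Alice leads Project Phoenix",
    "Project Phoenix is in London",
    "Bob enjoys hiking",
]
GRAPH = {  # entity -> list of (relation, target) edges
    "Alice": [("works on", "Project Phoenix")],
    "Project Phoenix": [("located in", "London")],
}

def vector_search(query: str, k: int = 10) -> list[str]:
    """Fake semantic search: rank memories by word overlap with the query."""
    q_words = set(re.findall(r"\w+", query.lower()))
    scored = [(len(q_words & set(re.findall(r"\w+", m.lower()))), m) for m in MEMORIES]
    return [m for score, m in sorted(scored, reverse=True) if score > 0][:k]

def extract_entities(query: str) -> list[str]:
    """Stand-in for LLM entity extraction: match known graph entities."""
    return [e for e in GRAPH if e.lower() in query.lower()]

def deep_search(query: str) -> dict:
    hits = vector_search(query)               # 1. vector search
    entities = extract_entities(query)        # 2. entity extraction
    edges = []                                # 3. one-hop graph traversal
    for entity in entities:
        for relation, target in GRAPH.get(entity, []):
            edges.append(f"{entity} --[{relation}]--> {target}")
    return {"vector": hits, "graph": edges}   # 4. combined context for synthesis

result = deep_search("Tell me about Alice")
print(result["graph"])  # ['Alice --[works on]--> Project Phoenix']
```

In the real system, step 5 hands this combined context to the LLM, which synthesizes the final answer.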
## Example Workflows

**Building up knowledge over time:**

```python
# Day 1: Store basic info
client.chat([{"role": "user", "content": "I'm working on Project X"}])

# Day 2: Add details
client.chat([{"role": "user", "content": "Project X uses Python and React"}])

# Day 3: Query everything
client.chat([{"role": "user", "content": "What do you know about my projects?"}])
```

**Discovering relationships:**

```python
# Store interconnected data
client.chat([{"role": "user", "content": "Alice leads Project Phoenix"}])
client.chat([{"role": "user", "content": "Project Phoenix is in London"}])

# Query with deep search for relationships
client.chat([{
    "role": "user",
    "content": "Tell me about Alice (use deep search)"
}])
# Response includes: Alice's role, project, and location via the graph
```

**Inspecting search traces:**

```python
response = client.chat(messages)

# Inspect search performance
if client.last_trace:
    for event in client.last_trace.events:
        print(f"{event.event_type}: {event.duration_ms}ms")
        print(f"Metadata: {event.metadata}")
```

## Notes

- **Background Consolidation**: Knowledge graph building happens in a background thread. Wait a few seconds after conversations for the graph to populate.
- **First Run**: Initial runs may not show graph relationships. Run examples twice to see full deep search capabilities.
- **Storage**: Each example creates its own memory directory to avoid conflicts.
- **LLM Auto-Selection**: The LLM often chooses the appropriate search tier automatically based on query complexity.
## See Also

- Hybrid Search Implementation - Technical details
- Main README - Project overview
## Best Practices

- Use the Fast tier for high-traffic applications where latency matters
- Use the Balanced tier as your default (it already is the default)
- Use the Deep tier when you need comprehensive answers with relationship reasoning
- Explicit tier requests work: say "use deep search" in your query
- Check traces to understand which search operations were performed
- Wait for consolidation before querying stored information (2-5 seconds)
## Troubleshooting

**Q: Deep search doesn't show graph relationships**

- A: Wait longer for background consolidation (try 5 seconds)
- A: Run the example twice; the first run builds the graph

**Q: Import is slow**

- A: The first import loads models (~0.7s). Subsequent imports are cached.

**Q: No memories found**

- A: Ensure you waited for consolidation after storing information
- A: Check that the `storage_path` directory was created

**Q: API errors**

- A: Verify your API key is set correctly
- A: Check that you have API credits/quota remaining