Memlayer Overview

What is Memlayer?

Memlayer is a memory-enhanced LLM wrapper that automatically builds and maintains a persistent knowledge graph from your conversations. It adds memory capabilities to any LLM provider (OpenAI, Claude, Gemini, Ollama) without changing how you interact with them.

Core Architecture

How It Works

1. Chat Flow: When you send a message via .chat(), Memlayer:
   - Searches the knowledge graph for relevant context
   - Injects that context into the LLM prompt via tool calls
   - Returns the LLM's response to you
   - Asynchronously extracts knowledge and updates the graph
2. Knowledge Extraction: After each conversation turn:
   - Text is analyzed by a fast model (background thread)
   - Facts, entities, and relationships are extracted
   - A salience gate filters out trivial information
   - Knowledge is stored in both the vector DB and the graph DB
3. Memory Search: When the LLM needs context:
   - Hybrid search combines vector similarity with graph traversal
   - Three search tiers are available: fast, balanced, and deep
   - Results are ranked and returned as context
4. Background Services:
   - Consolidation: Extracts knowledge from conversations (async)
   - Curation: Expires time-sensitive facts (background thread)
   - Salience Gate: Filters low-value information before storage
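
The flow above can be sketched with a toy class that replies immediately and leaves extraction to a background worker. This is a minimal illustration, not Memlayer's implementation: the class name, the word-count salience heuristic, and the keyword-overlap lookup are all assumptions made for the example, and the toy searches on every call, whereas Memlayer searches through tool calls.

```python
import queue
import threading


class ToyMemoryChat:
    """Illustrative stand-in; every name here is hypothetical, not Memlayer's API."""

    def __init__(self, salience_threshold=0.5):
        self.salience_threshold = salience_threshold
        self.facts = []  # stands in for the vector DB and graph DB
        self._pending = queue.Queue()
        worker = threading.Thread(target=self._consolidate, daemon=True)
        worker.start()

    def _salience(self, sentence):
        # Hypothetical heuristic: more words means more salient, capped at 1.0.
        return min(len(sentence.split()) / 10, 1.0)

    def _consolidate(self):
        # Background loop: split text into sentences, gate them, store the rest.
        while True:
            text = self._pending.get()
            for sentence in (s.strip() for s in text.split(".")):
                if sentence and self._salience(sentence) >= self.salience_threshold:
                    self.facts.append(sentence)
            self._pending.task_done()

    def chat(self, message):
        words = set(message.lower().split())
        # Naive keyword overlap stands in for hybrid search.
        context = [f for f in self.facts if words & set(f.lower().split())]
        reply = f"(reply grounded in {len(context)} stored fact(s))"
        self._pending.put(message)  # queue extraction; do not wait for it
        return reply

    def wait_until_consolidated(self):
        # Only needed to make demos and tests deterministic.
        self._pending.join()


bot = ToyMemoryChat()
first = bot.chat("My name is Alice and I lead Project Phoenix. Hi.")
bot.wait_until_consolidated()
second = bot.chat("Who leads Project Phoenix?")
```

The first call returns before anything is stored. By the second call the worker has kept the long sentence and dropped "Hi", so one stored fact is available as context.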

Data Flow

Normal Chat (Non-Streaming)

User Message
    │
    ▼
Memory Search (if LLM calls tool)
    │
    ▼
LLM Response Generated
    │
    ├─► Return to User
    │
    └─► Background: Extract Knowledge → Store in Graph

Streaming Chat

User Message
    │
    ▼
Memory Search (if LLM calls tool)
    │
    ▼
LLM Starts Streaming
    │
    ├─► Yield chunks to user in real-time
    │
    └─► Background: Buffer full response
            │
            └─► After stream completes → Extract Knowledge → Store in Graph
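
The streaming path can be approximated with a generator that passes chunks through while keeping a copy. This is a sketch only: `stream_with_capture` and `on_complete` are hypothetical names, and a plain list stands in for the background extraction step.

```python
def stream_with_capture(chunks, on_complete):
    """Yield each chunk immediately, then pass the full text to on_complete."""
    buffer = []
    for chunk in chunks:
        buffer.append(chunk)  # keep a copy for later extraction
        yield chunk           # the caller sees the chunk right away
    # Reached only when the source stream is exhausted.
    on_complete("".join(buffer))


captured = []  # stands in for the background extraction queue
shown = list(stream_with_capture(["Your ", "name ", "is Alice."], captured.append))
```

One consequence of this design is that the callback fires only if the consumer reads the stream to the end. A caller that stops iterating early never triggers extraction in this sketch.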

Key Features

1. Provider-Agnostic

Works with OpenAI, Anthropic Claude, Google Gemini, and local Ollama models. Same API across all providers.

2. Automatic Memory Tools

The LLM automatically gets access to:

- search_memory: Hybrid vector + graph search
- schedule_task: Create time-based reminders

3. Flexible Search Tiers

- fast: Vector-only search, <100ms
- balanced: Vector + 1-hop graph traversal
- deep: Full graph traversal with entity extraction
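
To make the difference between tiers concrete, here is a small sketch over an in-memory index. It assumes toy two-dimensional embeddings and an adjacency-list graph, the function names are hypothetical, and the deep tier here only removes the hop limit (the entity-extraction step is omitted).

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_vec, vectors, graph, tier="balanced", k=1):
    """Return node names: top-k vector hits, optionally expanded through the graph."""
    if tier not in {"fast", "balanced", "deep"}:
        raise ValueError(f"unknown tier: {tier!r}")
    ranked = sorted(vectors, key=lambda n: cosine(query_vec, vectors[n]), reverse=True)
    hits = ranked[:k]
    if tier == "fast":
        return hits  # vector-only
    max_hops = 1 if tier == "balanced" else len(graph)  # deep: no practical limit
    seen, frontier = list(hits), list(hits)
    for _ in range(max_hops):
        next_frontier = []
        for node in frontier:
            for neighbour in graph.get(node, []):
                if neighbour not in seen:
                    seen.append(neighbour)
                    next_frontier.append(neighbour)
        if not next_frontier:
            break
        frontier = next_frontier
    return seen


# Toy data: embeddings and edges are made up for illustration.
vectors = {"alice": (1.0, 0.0), "phoenix": (0.5, 0.5), "deadline": (0.0, 1.0), "bob": (-1.0, 0.0)}
graph = {"alice": ["phoenix"], "phoenix": ["alice", "deadline"], "deadline": ["phoenix"]}
query = (1.0, 0.0)

fast_hits = search(query, vectors, graph, tier="fast")
balanced_hits = search(query, vectors, graph, tier="balanced")
deep_hits = search(query, vectors, graph, tier="deep")
```

With this data the fast tier returns only the closest vector match, the balanced tier adds its direct neighbour, and the deep tier also reaches the node two hops away.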

4. Knowledge Graph Features

- Entity deduplication (e.g., "John" = "John Smith")
- Relationship tracking between entities
- Time-aware facts with expiration dates
- Importance scoring for fact prioritization
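
Three of these features can be illustrated in a few lines. The sketch below is an assumption about how such logic might look, not Memlayer's code: the `Fact` fields, the token-subset matching rule, and the dates are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Fact:
    text: str
    importance: float                      # higher means more important
    expires_at: Optional[datetime] = None  # None means the fact never expires


def resolve_entity(name, known_entities):
    """Map a mention to a known entity whose tokens include all of the mention's."""
    tokens = set(name.lower().split())
    for entity in known_entities:
        if tokens <= set(entity.lower().split()):
            return entity
    return name  # no match: treat as a new entity


def curate(facts, now):
    """Drop expired facts and order the rest by importance, highest first."""
    live = [f for f in facts if f.expires_at is None or f.expires_at > now]
    return sorted(live, key=lambda f: f.importance, reverse=True)


now = datetime(2025, 11, 1)  # hypothetical "current" time
facts = [
    Fact("Standup moved to 10am today", 0.4, datetime(2025, 10, 31)),
    Fact("Alice leads Project Phoenix", 0.7),
    Fact("Project deadline is December 1st", 0.9, datetime(2025, 12, 1)),
]
remaining = [f.text for f in curate(facts, now)]
merged = resolve_entity("John", ["John Smith"])
```

A subset rule like this one is deliberately crude: it would also merge "John" into "John Doe" if that entity came first, which is why real deduplication usually needs more context than the name alone.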

5. Operation Modes

Choose embedding strategy based on your needs:

- online: API-based embeddings (OpenAI), fast startup
- local: Local sentence-transformer model, no API costs
- lightweight: Graph-only, no embeddings, fastest startup

Configuration Options

from memlayer.wrappers.openai import OpenAI

client = OpenAI(
    # Core settings
    api_key="your-key",
    model="gpt-4.1-mini",
    user_id="user123",
    
    # Memory behavior
    operation_mode="online",        # online | local | lightweight
    salience_threshold=0.5,         # 0.0-1.0, filters trivial content
    
    # Storage paths
    chroma_dir="./my_chroma_db",
    networkx_path="./my_graph.pkl",
    
    # Search behavior
    max_search_results=5,
    search_tier="balanced",          # fast | balanced | deep
    
    # Performance tuning
    curation_interval=3600,          # Seconds between checks for expired facts (hourly)
    embedding_model="text-embedding-3-small"
)
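
If you would rather keep these settings out of source code, one option is to read them from the environment and validate them before constructing the client. The helper below is a sketch: the `MEMLAYER_*` variable names and the function are hypothetical, and the fallback values simply mirror the example above rather than any documented library defaults.

```python
import os

VALID_MODES = {"online", "local", "lightweight"}
VALID_TIERS = {"fast", "balanced", "deep"}


def build_client_kwargs(env=None):
    """Read memory settings from a mapping (default: os.environ) and validate them."""
    env = os.environ if env is None else env
    kwargs = {
        "operation_mode": env.get("MEMLAYER_OPERATION_MODE", "online"),
        "search_tier": env.get("MEMLAYER_SEARCH_TIER", "balanced"),
        "salience_threshold": float(env.get("MEMLAYER_SALIENCE_THRESHOLD", "0.5")),
        "max_search_results": int(env.get("MEMLAYER_MAX_SEARCH_RESULTS", "5")),
        "curation_interval": int(env.get("MEMLAYER_CURATION_INTERVAL", "3600")),
    }
    if kwargs["operation_mode"] not in VALID_MODES:
        raise ValueError(f"operation_mode must be one of {sorted(VALID_MODES)}")
    if kwargs["search_tier"] not in VALID_TIERS:
        raise ValueError(f"search_tier must be one of {sorted(VALID_TIERS)}")
    if not 0.0 <= kwargs["salience_threshold"] <= 1.0:
        raise ValueError("salience_threshold must be between 0.0 and 1.0")
    return kwargs


settings = build_client_kwargs({"MEMLAYER_SEARCH_TIER": "deep"})
```

The resulting dictionary could then be unpacked into the constructor shown above, for example `OpenAI(api_key=..., model=..., user_id=..., **settings)`.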

Common Usage Patterns

Basic Chat

response = client.chat([
    {"role": "user", "content": "My name is Alice"}
])
# Knowledge automatically extracted and stored

Streaming Chat

for chunk in client.chat(
    [{"role": "user", "content": "What's my name?"}],
    stream=True
):
    print(chunk, end="", flush=True)

Direct Knowledge Ingestion

# Import knowledge from documents
client.update_from_text("""
Project Phoenix is led by Alice.
The project deadline is December 1st.
""")

Synthesized Q&A

# Get memory-grounded answer
answer = client.synthesize_answer("Who leads Project Phoenix?")

Performance Characteristics

| Component | Latency | Notes |
|---|---|---|
| Memory search (fast) | 50-100ms | Vector search only |
| Memory search (balanced) | 100-300ms | Vector + 1-hop graph |
| Memory search (deep) | 300-1000ms | Full graph traversal |
| Knowledge extraction | 1-3s | Background, doesn't block response |
| Consolidation | 1-2s | Async, uses fast model |
| First-time salience init | 1-2s | Cached after first run |
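
These figures can drive a simple tier choice. As a rough sketch, the hypothetical helper below picks the deepest search tier whose upper latency bound from the table fits within a given budget. The numbers are the table's ranges, not guarantees.

```python
# (lower, upper) latency bounds in milliseconds, copied from the table above.
TIER_LATENCY_MS = {
    "fast": (50, 100),
    "balanced": (100, 300),
    "deep": (300, 1000),
}


def deepest_tier_within(budget_ms):
    """Return the deepest tier whose worst-case latency fits the budget, or None."""
    choice = None
    for tier in ("fast", "balanced", "deep"):  # ordered shallow to deep
        if TIER_LATENCY_MS[tier][1] <= budget_ms:
            choice = tier
    return choice


tier_for_250ms = deepest_tier_within(250)
tier_for_1500ms = deepest_tier_within(1500)
```

A 250ms budget selects the fast tier here because the balanced tier can take up to 300ms. Budgeting against the upper bound is the conservative choice; budgeting against the lower bound would accept occasional overruns.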

Best Practices

1. Choose the right operation mode:
   - Serverless → online mode
   - Privacy-sensitive → local mode
   - Demos/prototypes → lightweight mode
2. Use streaming for better UX:
   - The first chunk arrives in 1-3s
   - Knowledge extraction happens in the background
   - The user sees the response as it is generated
3. Tune the salience threshold:
   - Low (0.3-0.5): Keeps more memories, higher storage
   - Medium (0.5-0.7): Balanced, recommended default
   - High (0.7-0.9): Keeps only important facts, minimal storage
4. Set expiration dates for time-sensitive facts:
   - The system automatically extracts expiration dates from text
   - The curation service removes expired facts periodically
5. Use the appropriate search tier:
   - fast: Quick lookups, high-traffic applications
   - balanced: Default, good recall with reasonable latency
   - deep: Complex questions needing graph reasoning
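
The effect of the salience threshold is easiest to see on a handful of scored candidates. The scores below are invented for illustration, and `salience_gate` is a hypothetical helper: how Memlayer actually scores text is not covered here.

```python
def salience_gate(scored_candidates, threshold):
    """Keep the texts whose score meets or exceeds the threshold."""
    return [text for text, score in scored_candidates if score >= threshold]


# (text, salience score) pairs; scores are made up for this example.
candidates = [
    ("Thanks, that helps", 0.2),
    ("Alice likes tea", 0.45),
    ("Alice prefers morning meetings", 0.65),
    ("Project Phoenix deadline is December 1st", 0.85),
]

kept_low = salience_gate(candidates, 0.3)   # low threshold: keep more
kept_mid = salience_gate(candidates, 0.6)   # medium threshold
kept_high = salience_gate(candidates, 0.8)  # high threshold: keep only the strongest
```

Raising the threshold from 0.3 to 0.8 cuts the stored items from three to one, which is the storage-versus-recall trade-off described in the list above.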

Next Steps

- Quickstart Guide: Get up and running in 5 minutes
- Streaming Mode: Deep dive into streaming behavior
- Operation Modes: Architecture implications of each mode
- Provider Setup: Provider-specific configuration
