AutoMem

Long-term memory for AI assistants. Graph + vector. Runs on your hardware.

Your AI forgets between sessions. RAG dumps documents that look similar. Vector databases match keywords but miss meaning. None of them learn.

AutoMem stores typed relationships and embeddings. When you ask "why did we choose PostgreSQL?", recall returns not just the matching memory but also the alternatives you considered, the principle behind the choice, and the related decisions that came after.

Current canonical benchmark results are 87.00% on LongMemEval full with 97.00% recall@5, and 84.74% on LoCoMo full. See benchmarks/EXPERIMENT_LOG.md for methodology, judge policy, category breakdowns, and historical runs.

On the neutral Agent Memory Benchmark

AutoMem 0.16.0 was run through the neutral Agent Memory Benchmark (AMB, by vectorize-io) on a self-spinning FalkorDB + Qdrant stack with FastEmbed-local bge-base-en-v1.5 (768d) — no embedding API keys. The honest summary: AutoMem's strength is large-context scaling and efficiency, not verbatim conversational recall.

BEAM is the apples-to-apples axis (same benchmark, same Gemini answerer + judge). AutoMem scores above Honcho at every BEAM tier, and the gap is largest at the largest scale: +4.5pp at 100k, +0.7pp at 500k, +0.7pp at 1M, +16.8pp at 10M. AutoMem degrades gracefully, from 67.5% to 57.4% (−10.1pp) across a 100× haystack increase, while Honcho holds roughly flat through 1M, then drops to 40.6% at 10M. That places AutoMem #2 on BEAM, behind vectorize's own Hindsight (~73→64% across the curve).
At 10M tokens, AutoMem holds 57.4% ±5.5% while Honcho falls to ~41%. At that scale, context-stuffing is physically impossible, so the score reflects retrieval architecture, not context window.
Efficiency is architectural: AutoMem feeds the answerer ~2.6–4.8k context tokens at every scale (mean), versus 17–27k for the board leader on BEAM.
The honest other half: on conversational Core-3, AutoMem trails the AMB leader Hindsight — locomo 85.1% vs 92%, longmemeval 74.4% vs 94.6%, personamem 76.1% vs 86.6%. Pick AutoMem for large-context scaling and efficiency, not for top-of-board verbatim recall.

Outputs are committed and public, and AUTOMEM_REPRODUCE.md gives one command per split so you can run it yourself. AutoMem is submitted to the neutral board (provider PR under review) — not yet live on the public leaderboard. Full head-to-head numbers live at automem.ai/benchmarks.

Should you use AutoMem?

| Use AutoMem if... | Look elsewhere if... |
| --- | --- |
| You want one memory across Claude / Cursor / ChatGPT / Codex | You need SOC2 / HIPAA audit logs and row-level ACLs |
| You're comfortable self-hosting (Docker or Railway) | You want a managed SaaS with a polished dashboard |
| You're a solo dev, prosumer, or small team | You're running a multi-agent swarm needing per-agent memory isolation |
| You want to own your memory data | You need an enterprise SLA and dedicated support |

If your row is on the right, AutoMem isn't it — yet. Try Mem0, Letta, or Zep instead.

How it works

AutoMem combines two storage layers behind a single API:

FalkorDB stores memories as nodes with 11 typed relationships between them. The graph is the canonical record.
Qdrant stores an embedding for every memory. Recall is a hybrid query — semantic similarity, graph traversal, temporal alignment, tag overlap, and importance — ranked by a 9-component score.

```mermaid
flowchart TB
    subgraph service [AutoMem Service Flask]
        API[REST API<br/>Memory Lifecycle]
        Enrichment[Background Enrichment<br/>Pipeline]
        Consolidation[Consolidation<br/>Engine]
        Backups[Automated Backups<br/>Optional]
    end

    subgraph storage [Dual Storage Layer]
        FalkorDB[(FalkorDB<br/>Graph Database)]
        Qdrant[(Qdrant<br/>Vector Database)]
    end

    Client[AI Client] -->|Store/Recall/Associate| API
    API --> FalkorDB
    API --> Qdrant
    Enrichment -->|11 edge types<br/>Pattern nodes| FalkorDB
    Enrichment -->|Semantic search<br/>1024-d vectors| Qdrant
    Consolidation --> FalkorDB
    Consolidation --> Qdrant
    Backups -.->|Optional| FalkorDB
    Backups -.->|Optional| Qdrant
```

If Qdrant is unavailable, the graph still serves recall in a degraded mode. If FalkorDB is down, the API returns 503 — the graph is the source of truth.
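
To make the hybrid ranking above concrete, here is a minimal sketch of a weighted score over the five signals this section names. It is illustrative only: the weights, the `Candidate` class, and the `hybrid_score` and `rank` helpers are assumptions made for this example, not AutoMem's actual 9-component formula.

```python
from dataclasses import dataclass, field

# Hypothetical weights for illustration; AutoMem's real score has 9 components.
WEIGHTS = {
    "semantic": 0.40,    # vector similarity
    "graph": 0.20,       # reachability via graph traversal
    "temporal": 0.15,    # alignment with the time the query asks about
    "tags": 0.15,        # tag overlap
    "importance": 0.10,  # stored importance value
}


@dataclass
class Candidate:
    memory_id: str
    signals: dict = field(default_factory=dict)  # signal name -> score in [0, 1]


def hybrid_score(candidate, weights=WEIGHTS):
    """Weighted sum of the signals; a missing signal counts as 0."""
    return sum(w * candidate.signals.get(name, 0.0) for name, w in weights.items())


def rank(candidates):
    """Order candidates from highest to lowest hybrid score."""
    return sorted(candidates, key=hybrid_score, reverse=True)


candidates = [
    Candidate("keyword-match", {"semantic": 0.9, "importance": 0.2}),
    Candidate(
        "well-connected",
        {"semantic": 0.7, "graph": 0.9, "tags": 1.0, "importance": 0.8},
    ),
]
ranked_ids = [c.memory_id for c in rank(candidates)]
```

In this toy data the memory with the higher raw similarity ranks second, because the other candidate also scores on graph, tag, and importance signals.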

Multi-hop bridge discovery

Ask "why boring tech for Kafka?" and AutoMem doesn't just match the word "Kafka". It traverses the graph from the seed memories to find the bridge that connects them:

Seed 1: "Migrated to PostgreSQL for operational simplicity"
Seed 2: "Evaluating Kafka vs RabbitMQ for message queue"
Bridge: "Team prefers boring technology — proven, debuggable systems"

Both seeds carry an EXEMPLIFIES edge to the bridge memory. AutoMem ranks the bridge above the seeds and surfaces it in the recall response, so the assistant answers with your reasoning, not isolated facts. Tune via expand_relations, relation_limit, and expansion_limit on GET /recall.
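
The bridge idea can be sketched on a toy in-memory graph. This is a simplified illustration, not AutoMem's traversal code: the memory IDs, the edge list, and the `find_bridges` helper are made up for this example, and only outgoing edges from the seeds are followed.

```python
from collections import Counter

# Toy edge list: (source, relation, target). IDs are hypothetical.
EDGES = [
    ("postgres-migration", "EXEMPLIFIES", "boring-tech"),
    ("kafka-vs-rabbitmq", "EXEMPLIFIES", "boring-tech"),
    ("postgres-migration", "RELATES_TO", "db-backup-plan"),
]


def find_bridges(seeds, edges, min_seeds=2):
    """Return non-seed memories linked from at least `min_seeds` distinct seeds."""
    seed_set = set(seeds)
    linked = Counter()
    seen = set()  # count each (seed, target) pair once, even with several edges
    for source, _relation, target in edges:
        if source in seed_set and target not in seed_set:
            if (source, target) not in seen:
                seen.add((source, target))
                linked[target] += 1
    return [node for node, count in linked.most_common() if count >= min_seeds]


bridges = find_bridges(["postgres-migration", "kafka-vs-rabbitmq"], EDGES)
```

Here `boring-tech` is the only memory reachable from both seeds, so it is the only bridge; `db-backup-plan` is linked from a single seed and is left out.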

11 authorable relationship types

| Type | Use case | Example |
| --- | --- | --- |
| `RELATES_TO` | General connection | Bug report → Related issue |
| `LEADS_TO` | Causal relationship | Problem → Solution |
| `OCCURRED_BEFORE` | Temporal sequence | Planning → Execution |
| `PREFERS_OVER` | User preferences | PostgreSQL → MongoDB |
| `EXEMPLIFIES` | Pattern examples | Code review → Best practice |
| `CONTRADICTS` | Conflicting info | Old approach → New approach |
| `REINFORCES` | Supporting evidence | Decision → Validation |
| `INVALIDATED_BY` | Outdated info | Legacy docs → Current docs |
| `EVOLVED_INTO` | Knowledge evolution | Initial design → Final design |
| `DERIVED_FROM` | Source tracking | Implementation → Spec |
| `PART_OF` | Hierarchical structure | Feature → Epic |

Three more edge types are added automatically by the enrichment pipeline and consolidation engine: SIMILAR_TO, PRECEDED_BY, and DISCOVERED.
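
If you build associations from your own code, it can help to validate the relationship type on the client before calling the API. The sketch below encodes the 11 authorable types from the table. The `build_association` helper and the `source_id` / `target_id` / `type` field names are assumptions for illustration; see docs/API.md for the real request shape.

```python
from enum import Enum


class RelationType(str, Enum):
    """The 11 authorable relationship types listed in the table above."""

    RELATES_TO = "RELATES_TO"
    LEADS_TO = "LEADS_TO"
    OCCURRED_BEFORE = "OCCURRED_BEFORE"
    PREFERS_OVER = "PREFERS_OVER"
    EXEMPLIFIES = "EXEMPLIFIES"
    CONTRADICTS = "CONTRADICTS"
    REINFORCES = "REINFORCES"
    INVALIDATED_BY = "INVALIDATED_BY"
    EVOLVED_INTO = "EVOLVED_INTO"
    DERIVED_FROM = "DERIVED_FROM"
    PART_OF = "PART_OF"


def build_association(source_id, target_id, relation):
    """Build a payload dict; field names are hypothetical, not the documented API."""
    rel = RelationType(relation)  # raises ValueError for unknown or system-only types
    return {"source_id": source_id, "target_id": target_id, "type": rel.value}


payload = build_association("mem-postgres", "mem-mongodb", "PREFERS_OVER")
```

Passing a system-generated type such as `SIMILAR_TO` raises `ValueError`, since it is not one of the authorable types.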

Memory consolidation, neuroscience-inspired

AutoMem implements biological memory consolidation cycles. Wrong rabbit holes fade naturally. Important memories with strong connections strengthen over time.

| Cycle | Frequency | Purpose |
| --- | --- | --- |
| Decay | Daily | Exponential relevance scoring (age, access, connections, importance) |
| Creative | Weekly | REM-like processing that discovers non-obvious connections |
| Cluster | Monthly | Groups similar memories, generates meta-patterns |
| Forget | Off by default | Archives low-relevance memories (relevance < 0.2), deletes memories whose relevance falls below 0.05 |

Tune intervals via CONSOLIDATION_*_INTERVAL_SECONDS. See docs/ENVIRONMENT_VARIABLES.md.
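
As a rough picture of how decay and forgetting interact, the sketch below applies an exponential half-life to a memory's importance and maps the result onto the 0.2 and 0.05 thresholds from the table. The half-life, the `relevance` formula, and both helper names are assumptions for illustration; the real decay cycle also weighs access and connections, which this sketch omits.

```python
ARCHIVE_BELOW = 0.2   # threshold from the table above
DELETE_BELOW = 0.05   # threshold from the table above


def relevance(age_days, importance, half_life_days=30.0):
    """Importance halves every `half_life_days` (an assumed parameter)."""
    return importance * 0.5 ** (age_days / half_life_days)


def forget_action(score):
    """Map a relevance score onto what the Forget cycle would do with it."""
    if score < DELETE_BELOW:
        return "delete"
    if score < ARCHIVE_BELOW:
        return "archive"
    return "keep"


actions = {
    age: forget_action(relevance(age, importance=0.8))
    for age in (0, 90, 150)
}
```

With these assumed numbers, a memory of importance 0.8 is kept when fresh, archived after 90 days (relevance 0.1), and deleted after 150 days (relevance 0.025).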

For more on the recall scoring formula, enrichment internals, and how AutoMem differs from RAG and pure vector databases, see docs/COMPARISON.md.

Research foundation

AutoMem implements techniques from peer-reviewed memory research:

HippoRAG 2 (Ohio State, 2025) — graph + vector hybrid for associative memory
A-MEM (2025) — Zettelkasten-inspired dynamic memory organization
MELODI (DeepMind, 2024) — gist-based memory compression
ReadAgent (DeepMind, 2024) — episodic memory for context extension

Full writeups, findings, and how AutoMem implements each → docs/RESEARCH.md.

Run it

Railway (60 seconds)

Recommended Railway projects run AutoMem as a small service group: automem (the API), automem-graph-viewer (the standalone UI), falkordb (graph), qdrant (vectors), and mcp-automem (the MCP bridge for ChatGPT, Claude.ai, and ElevenLabs). Services use pre-built Docker images and auto-redeploy on :stable, so Railway does not spend compute rebuilding source.

→ Full setup: INSTALLATION.md

Docker Compose (local)

```bash
git clone https://github.com/verygoodplugins/automem.git
cd automem
make dev
```

| Service | URL | Purpose |
| --- | --- | --- |
| AutoMem API | `http://localhost:8001` | Memory REST API |
| FalkorDB | `localhost:6379` | Graph database |
| Qdrant | `localhost:6333` | Vector database |
| FalkorDB Browser | `http://localhost:3000` | Local graph inspection UI |

→ Full setup: INSTALLATION.md
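
After `make dev`, a quick way to confirm the stack is up is to check that each port in the table accepts TCP connections. The script below is a small convenience sketch, not part of the repo: `is_listening` and `check_services` are hypothetical helpers, and an open port only shows that something is listening, not that the service is healthy.

```python
import socket

# Host/port pairs taken from the service table above.
SERVICES = {
    "AutoMem API": ("localhost", 8001),
    "FalkorDB": ("localhost", 6379),
    "Qdrant": ("localhost", 6333),
    "FalkorDB Browser": ("localhost", 3000),
}


def is_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(services=SERVICES):
    """Map each service name to whether its port accepts connections."""
    return {name: is_listening(host, port) for name, (host, port) in services.items()}


if __name__ == "__main__":
    for name, up in check_services().items():
        print(f"{name:18} {'up' if up else 'DOWN'}")
```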

Python (development)

```bash
make install
source .venv/bin/activate
PORT=8001 python app.py
```

Requires Python 3.10+ (3.12 recommended). → INSTALLATION.md

Contributing: feature PRs target the develop branch. main only moves via validated release merges, so users deploying from main (e.g. Railway auto-deploys) see one deploy per release instead of one per PR.

Connect your AI

| Client | Mode | Setup |
| --- | --- | --- |
| Claude Desktop, Cursor, Claude Code, Codex, Copilot, Antigravity | Local MCP bridge | `npx @verygoodplugins/mcp-automem setup` |
| ChatGPT (developer mode), Claude.ai web/mobile, ElevenLabs Agents | Remote MCP (HTTPS) | `docs/MCP_SSE.md` |
| Anything else | Direct REST API | `docs/API.md` |

The MCP bridge is published as @verygoodplugins/mcp-automem. It handles client-specific config (rules files, hooks, templates) and proxies to your AutoMem service — local or Railway.

Direct API call:

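```python
import requests

token = "your-automem-api-token"  # replace with your AutoMem API token

# Store a memory via POST /memory on your AutoMem service.
response = requests.post(
    "https://your-automem.railway.app/memory",
    headers={"Authorization": f"Bearer {token}"},
    json={
        "content": "Chose PostgreSQL over MongoDB for ACID compliance",
        "type": "Decision",
        "tags": ["database", "architecture"],
        "importance": 0.9,
    },
    timeout=10,  # avoid hanging if the service is unreachable
)
response.raise_for_status()  # raise on 4xx/5xx, e.g. 503 when FalkorDB is down
```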
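
Recall goes through `GET /recall`. The sketch below only builds such a request with the standard library, so you can inspect it before sending. The `query` and `limit` parameter names and the `build_recall_request` helper are assumptions for illustration; check docs/API.md for the actual parameters.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_recall_request(base_url, token, query, limit=5):
    """Build (but do not send) a GET /recall request with bearer auth."""
    # `query` and `limit` are assumed parameter names, not confirmed by this README.
    params = urlencode({"query": query, "limit": limit})
    return Request(
        f"{base_url.rstrip('/')}/recall?{params}",
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


req = build_recall_request(
    "http://localhost:8001",
    "your-automem-api-token",
    "why did we choose PostgreSQL?",
)
# To send it: urllib.request.urlopen(req, timeout=10)
```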

Screenshots

Screenshots will be added once the referenced in-repo image assets are available.

Known limitations

AutoMem is pre-1.0 and honest about its rough edges. The active ones for recall quality:

Tags are a hard gate, not a soft boost. Tags filter before scoring, so a memory missing the queried tag won't surface even on a perfect semantic match. Within a tag scope, high-importance off-topic memories can still over-rank — mitigated by the opt-in RECALL_RELEVANCE_GATE, not yet on by default (#130).
Temporal and preference updates. Recall doesn't yet reliably prefer the newest version of a conflicting fact, or fully resolve multi-session preference updates. The RECALL_RECENCY_BIAS=auto re-rank helps temporal-intent queries but stays opt-in pending broader validation (#158, #159).
The MCP SSE bridge doesn't forward state_mode. The HTTP recall API supports it; the SSE proxy doesn't pass it through yet (#172).
Entity-node synthesis is experimental and off by default. First-class Entity nodes (IDENTITY_SYNTHESIS_ENABLED) are gated off while people-entity word-pair noise is addressed (#181).
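
The first limitation is easiest to see in miniature. The sketch below filters by tag before ranking, which is the behavior described above. The `recall` helper, the memory dicts, and the any-tag matching rule are assumptions for illustration, not AutoMem's code.

```python
def recall(memories, query_tags, score):
    """Filter by tag first (hard gate), then rank the survivors by score."""
    wanted = set(query_tags)
    # Assumed rule: a memory passes if it shares at least one tag with the query.
    gated = [m for m in memories if wanted & set(m["tags"])]
    return sorted(gated, key=score, reverse=True)


memories = [
    {"id": "perfect-match", "tags": ["infra"], "similarity": 0.99},
    {"id": "weak-match", "tags": ["database"], "similarity": 0.40},
]

results = recall(memories, ["database"], score=lambda m: m["similarity"])
result_ids = [m["id"] for m in results]
```

Only `weak-match` comes back: the near-perfect semantic match lacks the queried tag, so it is dropped before scoring ever runs.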

Docs, community, and license

Setup

Installation guide — Railway, Docker, development
Qdrant setup — vector database configuration
Environment variables — full reference

API and integration

API reference — endpoints, scoring, enrichment
Remote MCP — ChatGPT, Claude.ai, ElevenLabs
Migrations — embedding dimensions, 0.16.0 data migrations, MCP SQLite import

Research and comparison

Research foundation — papers and how AutoMem implements them
Comparison — vs. RAG, vector DBs, building your own
Benchmark history — internal LoCoMo / LongMemEval harness runs + the neutral AMB (BEAM + Core-3) summary
AMB head-to-head + reproducibility — neutral Agent Memory Benchmark results and the AUTOMEM_REPRODUCE.md "run it yourself" recipe

Operations

Scripts — maintenance, migration, recovery, and eval tooling, by lifecycle
Health monitoring & backups
Testing guide — unit, integration, benchmarks

Community

automem.ai — official site
Discord — community chat
X / @automem_ai — updates
YouTube / @AutoJackBot — tutorials
GitHub issues — bugs and feature requests

Sibling repos

mcp-automem — universal MCP bridge / install funnel
automem-evals — exploratory recall-quality lab
automem-graph-viewer — standalone graph visualization

MIT licensed. Deploy anywhere. No vendor lock-in.
