Dual-whitebox memory for LLM agents.
Make agent memory readable, editable, auditable, and testable before sending it back into the model context.
MASE is a dual-whitebox memory engine for LLM agents.
Most agent memory systems start with an opaque vector store: write everything, embed everything, retrieve top-k chunks, and hope the model resolves conflicts. MASE takes the opposite path:
- Govern memory first.
- Keep the minimum necessary facts.
- Expose memory in forms humans and tests can inspect.
- Only then inject memory into the model context.
MASE splits agent memory into two governed layers, plus a human-readable externalization:
| Layer | Purpose |
|---|---|
| Event Log | Append-only conversational and operational history for recall, replay, and audit. |
| Entity Fact Sheet | Current structured facts where newer facts can override stale or conflicting ones. |
| Markdown / tri-vault | Human-readable memory externalization for review, migration, and debugging. |
The goal is not to replace semantic search everywhere. The goal is to make long-lived agent memory observable and correctable.
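As an illustration of the Event Log / Entity Fact Sheet split, here is a minimal sketch. The class and method names are invented for illustration and are not MASE's actual API; the point is that newer facts override older ones while the log stays append-only:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class FactSheet:
    """Toy update-aware fact store: the newest write per (entity, attr)
    wins, while every write is preserved in an append-only log for audit."""
    log: list = field(default_factory=list)      # append-only event history
    current: dict = field(default_factory=dict)  # (entity, attr) -> latest value

    def write(self, entity: str, attr: str, value: str) -> None:
        self.log.append((entity, attr, value))   # history is never mutated
        self.current[(entity, attr)] = value     # newer fact overrides older

    def get(self, entity: str, attr: str) -> Optional[str]:
        return self.current.get((entity, attr))

sheet = FactSheet()
sheet.write("user", "budget", "$500")
sheet.write("user", "budget", "$800")  # stale fact is overridden, not duplicated
```

Only the current value would be injected into the model context, while the log keeps both writes available for replay and audit.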
Long-context models do not remove the need for memory governance. If an agent remembers stale preferences, contradictory facts, or unsafe file state, a larger context window only makes the failure harder to debug.
MASE focuses on problems that show up in real agent systems:
- facts change over time;
- user preferences conflict with old sessions;
- agents need cross-session continuity;
- memory writes must be reviewable;
- recall should explain why something was selected;
- tests should verify memory behavior without relying on a black box.
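The "recall should explain why" point can be made concrete with a toy scorer. This is not MASE's recall pipeline, just a sketch of the property being asked for: every hit carries the evidence that selected it, so a test can assert on the reason rather than on a black box:

```python
def recall(query: str, events: list[str], top_k: int = 2) -> list[dict]:
    """Toy explainable recall: each hit reports which query terms matched,
    ranked by how many terms overlap. Illustrative only."""
    terms = set(query.lower().split())
    scored = []
    for event in events:
        matched = terms & set(event.lower().split())
        if matched:
            scored.append({"event": event, "matched_terms": sorted(matched)})
    # More matched terms -> higher rank; deterministic and inspectable.
    scored.sort(key=lambda hit: len(hit["matched_terms"]), reverse=True)
    return scored[:top_k]

hits = recall("travel budget", [
    "set travel budget to $800",
    "prefers window seats",
    "budget review every quarter",
])
```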
In short:
MASE turns agent memory from "hidden retrieval magic" into an engineering surface.
```
User / Agent Runtime
         |
         v
Router -> Notetaker -> Planner -> Action -> Executor
  |            |                               |
  v            v                               v
SQLite + FTS5   Entity Facts       Markdown / tri-vault
         |
         v
Bounded recall context for the LLM
```
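The pipeline above can be sketched as a chain of small functions, each with one inspectable job. The function bodies here are invented for illustration and do not reflect MASE's real routing or planning logic:

```python
# Hypothetical sketch of the Router -> Notetaker -> Planner -> Executor flow.
# Each role is a small pure function so the whole path stays testable.

def router(message: str) -> str:
    # Decide whether the turn needs a memory write, recall, or both.
    return "write_and_recall" if "remember" in message else "recall"

def notetaker(message: str, event_log: list[str]) -> None:
    # Append the raw turn to the append-only event log.
    event_log.append(message)

def planner(route: str) -> list[str]:
    # Expand the route into concrete steps for the executor.
    if route == "write_and_recall":
        return ["store_event", "build_context"]
    return ["build_context"]

def executor(steps: list[str], event_log: list[str]) -> str:
    # Produce the bounded recall context that is sent to the LLM.
    return " | ".join(event_log[-3:]) if "build_context" in steps else ""

event_log: list[str] = []
route = router("remember my budget is $800")
notetaker("remember my budget is $800", event_log)
context = executor(planner(route), event_log)
```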
Core ideas:
- SQLite + FTS5 for deterministic, portable event and fact search.
- Entity Fact Sheet for update-aware memory instead of endless fact accumulation.
- Markdown / tri-vault for readable memory artifacts.
- Hybrid recall for combining keyword signals, structured facts, and LLM-assisted filtering.
- Compatibility surfaces for LangChain, LlamaIndex, MCP, and OpenAI-compatible endpoints.
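The SQLite + FTS5 idea can be demonstrated with the Python standard library alone. The schema below is illustrative, not MASE's actual schema; it shows why FTS5 gives deterministic, portable keyword search over an event log with no embedding service involved:

```python
import sqlite3

# Illustrative schema: an FTS5 virtual table over event text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE events USING fts5(thread_id, content)")
conn.executemany(
    "INSERT INTO events (thread_id, content) VALUES (?, ?)",
    [
        ("t1", "user set travel budget to $800"),
        ("t1", "user prefers aisle seats"),
        ("t2", "project deadline moved to March"),
    ],
)

# MATCH runs a full-text query; bm25() gives a deterministic relevance rank.
rows = conn.execute(
    "SELECT content FROM events WHERE events MATCH ? ORDER BY bm25(events)",
    ("budget",),
).fetchall()
```

Because ranking is a pure function of the stored text, the same database file produces the same recall order on any machine, which is what makes the behavior benchmarkable and auditable.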
MASE has been evaluated across long-context and memory-oriented benchmarks:
| Benchmark | Model / Setting | MASE | Baseline | Delta |
|---|---|---|---|---|
| LV-Eval EN 256k | qwen2.5:7b local | 88.71% | 4.84% | +83.9pp |
| NoLiMa ONLYDirect 32k | qwen2.5:7b local, MASE chunked | 60.71% | 1.79% | +58.9pp |
| LongMemEval-S 500 | GLM-5 + kimi-k2.5 verifier | 61.0% official substring / 80.2% LLM-judge | 70.4% substring / 72.4% LLM-judge | +7.8pp judge |
LongMemEval is reported with multiple lanes:
- 61.0% (305/500): official substring-comparable lane.
- 80.2% (401/500): LLM-judge lane from the same iter2 full_500 run.
- 84.8% (424/500): post-hoc combined/retry diagnostic, not the public headline.
Detailed benchmark notes live in BENCHMARKS.md and docs/benchmark_claims/.
```bash
git clone https://github.com/zbl1998-sdjn/MASE-agent-memory.git
cd MASE-agent-memory
pip install -e ".[dev]"
python -m pytest tests/ -q
```
```bash
python mase_cli.py
```

For benchmark or long-running local work, keep generated memory stores outside the source checkout:
```bash
export MASE_RUNS_DIR=../MASE-runs
```

On Windows PowerShell:
```powershell
$env:MASE_RUNS_DIR = "..\MASE-runs"
```

Run these before integration work or pull requests:
```bash
python -m pytest -q
python -m mypy
python -m compileall -q -x "(legacy_archive|run_artifacts|dist|build|\.venv|venv|benchmarks/external-benchmarks|__pycache__|\.pytest_cache)" .
npm --prefix frontend run typecheck
npm --prefix frontend test
npm --prefix frontend run build
git diff --check
```

`python -m mypy` is intentionally gradual. Current strict coverage is limited to `executor.py`, `planner_agent.py`, `router.py`, `model_interface.py`, and `protocol.py`.
MASE exposes several integration surfaces:
- LangChain `BaseChatMemory`
- LlamaIndex `BaseMemory`
- MCP server for Claude Desktop / Cursor-style clients
- OpenAI-compatible endpoint
- FastAPI sidecar for local AI agent platforms
Example with LangChain:
```python
from integrations.langchain.mase_memory import MASEMemory

memory = MASEMemory(thread_id="zbl1998::main", top_k=8)
agent_executor.invoke(
    {"input": "What budget did I mention last time?"},
    config={"memory": memory},
)
```

MASE is strongest when the task requires:
- updated user or project facts;
- cross-session continuity;
- explainable recall;
- human-readable memory review;
- lightweight local persistence;
- benchmarkable memory behavior;
- sidecar integration with an agent SaaS or local agent runtime.
MASE is still an alpha-stage engineering project. It is not yet a universal retrieval layer.
Known boundaries:
- strong synonym and semantic-generalization recall still needs more work;
- large document-level semantic retrieval is not the primary path yet;
- high-concurrency server-grade deployment requires more runtime hardening;
- benchmark claims should be read with the documented lane definitions.
Planned directions:
- White-box semantic retrieval with write-time tags, read-time expansion, FTS, and LLM filtering.
- More server-grade async/runtime hardening.
- Broader benchmark triangulation.
- More integrations across LangChain, LlamaIndex, MCP, OpenAI-compatible APIs, and agent SaaS platforms.
- Memory review workflows before long-term fact/procedure writes.
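The last roadmap item, review before long-term writes, could look roughly like this. The staging API is entirely hypothetical and invented for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class ReviewedMemory:
    """Hypothetical review gate: long-term fact writes are staged until a
    human (or a policy check) approves them. Not MASE's real API."""
    pending: list = field(default_factory=list)
    facts: dict = field(default_factory=dict)

    def propose(self, key: str, value: str) -> None:
        # Staged write: not yet visible to recall.
        self.pending.append((key, value))

    def approve_all(self) -> int:
        # Promote every staged write into the visible fact store.
        approved = len(self.pending)
        for key, value in self.pending:
            self.facts[key] = value
        self.pending.clear()
        return approved

mem = ReviewedMemory()
mem.propose("user.budget", "$800")
assert "user.budget" not in mem.facts  # invisible until reviewed
count = mem.approve_all()
```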
Stable Core, Compatibility Surface, and Experimental Surface are defined in:
- docs/ARCHITECTURE_BOUNDARIES.md
- docs/BENCHMARK_ANTI_OVERFIT.md
Issues and pull requests are welcome, especially for:
- new model backend adapters;
- benchmark reruns and independent reports;
- integration examples;
- real-world long-memory failure cases;
- memory governance and audit workflows.
```bibtex
@software{mase2026,
  author = {zbl1998-sdjn},
  title  = {{MASE}: Memory-Augmented Smart Entity — Schema-less SQLite memory for LLM agents},
  year   = {2026},
  url    = {https://github.com/zbl1998-sdjn/MASE-agent-memory},
  note   = {Lifts qwen2.5:7b from 1.79\% to 60.71\% on NoLiMa-32k; 61.0\% official substring / 80.2\% LLM-judge on LongMemEval-S}
}
```

MASE started from a simple fear: as AI systems become more powerful, their hidden memory becomes harder to trust.
Instead of treating memory as an invisible vector database, MASE keeps memory small, structured, readable, and correctable. It is built around the belief that reliable agents need transparent memory governance before they need more context.
There is no "single heroic model" here. MASE is a lightweight system where Router, Notetaker, Planner, Action, Executor, SQLite, and Markdown each do a small, inspectable job.
If you believe agent memory should be auditable by default, welcome to MASE.
