Dual-whitebox memory for LLM agents.
Make agent memory readable, editable, auditable, and testable before sending it back into the model context.
MASE is a dual-whitebox memory engine for LLM agents.
Most agent memory systems start with an opaque vector store: write everything, embed everything, retrieve top-k chunks, and hope the model resolves conflicts. MASE takes the opposite path:
- Govern memory first.
- Keep the minimum necessary facts.
- Expose memory in forms humans and tests can inspect.
- Only then inject memory into the model context.
MASE splits agent memory into two governed layers, plus a human-readable externalization:
| Layer | Purpose |
|---|---|
| Event Log | Append-only conversational and operational history for recall, replay, and audit. |
| Entity Fact Sheet | Current structured facts where newer facts can override stale or conflicting ones. |
| Markdown / tri-vault | Human-readable memory externalization for review, migration, and debugging. |
The goal is not to replace semantic search everywhere. The goal is to make long-lived agent memory observable and correctable.
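As an illustration of the Event Log / Entity Fact Sheet split, here is a minimal sketch. The class and method names are invented for illustration and are not MASE's actual API; the point is that newer facts override older ones while the log stays append-only:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class FactSheet:
    """Toy update-aware fact store: the newest write per (entity, attr)
    wins, while every write is preserved in an append-only log for audit."""
    log: list = field(default_factory=list)      # append-only event history
    current: dict = field(default_factory=dict)  # (entity, attr) -> latest value

    def write(self, entity: str, attr: str, value: str) -> None:
        self.log.append((entity, attr, value))   # history is never mutated
        self.current[(entity, attr)] = value     # newer fact overrides older

    def get(self, entity: str, attr: str) -> Optional[str]:
        return self.current.get((entity, attr))

sheet = FactSheet()
sheet.write("user", "budget", "$500")
sheet.write("user", "budget", "$800")  # stale fact is overridden, not duplicated
```

Only the current value would be injected into the model context, while the log keeps both writes available for replay and audit.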
Long-context models do not remove the need for memory governance. If an agent remembers stale preferences, contradictory facts, or unsafe file state, a larger context window only makes the failure harder to debug.
MASE focuses on problems that show up in real agent systems:
- facts change over time;
- user preferences conflict with old sessions;
- agents need cross-session continuity;
- memory writes must be reviewable;
- recall should explain why something was selected;
- tests should verify memory behavior without relying on a black box.
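The "recall should explain why" point can be made concrete with a toy scorer. This is not MASE's recall pipeline, just a sketch of the property being asked for: every hit carries the evidence that selected it, so a test can assert on the reason rather than on a black box:

```python
def recall(query: str, events: list[str], top_k: int = 2) -> list[dict]:
    """Toy explainable recall: each hit reports which query terms matched,
    ranked by how many terms overlap. Illustrative only."""
    terms = set(query.lower().split())
    scored = []
    for event in events:
        matched = terms & set(event.lower().split())
        if matched:
            scored.append({"event": event, "matched_terms": sorted(matched)})
    # More matched terms -> higher rank; deterministic and inspectable.
    scored.sort(key=lambda hit: len(hit["matched_terms"]), reverse=True)
    return scored[:top_k]

hits = recall("travel budget", [
    "set travel budget to $800",
    "prefers window seats",
    "budget review every quarter",
])
```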
In short:
MASE turns agent memory from "hidden retrieval magic" into an engineering surface.
```
User / Agent Runtime
         |
         v
Router -> Notetaker -> Planner -> Action -> Executor
  |            |                               |
  v            v                               v
SQLite + FTS5   Entity Facts       Markdown / tri-vault
         |
         v
Bounded recall context for the LLM
```
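The pipeline above can be sketched as a chain of small functions, each with one inspectable job. The function bodies here are invented for illustration and do not reflect MASE's real routing or planning logic:

```python
# Hypothetical sketch of the Router -> Notetaker -> Planner -> Executor flow.
# Each role is a small pure function so the whole path stays testable.

def router(message: str) -> str:
    # Decide whether the turn needs a memory write, recall, or both.
    return "write_and_recall" if "remember" in message else "recall"

def notetaker(message: str, event_log: list[str]) -> None:
    # Append the raw turn to the append-only event log.
    event_log.append(message)

def planner(route: str) -> list[str]:
    # Expand the route into concrete steps for the executor.
    if route == "write_and_recall":
        return ["store_event", "build_context"]
    return ["build_context"]

def executor(steps: list[str], event_log: list[str]) -> str:
    # Produce the bounded recall context that is sent to the LLM.
    return " | ".join(event_log[-3:]) if "build_context" in steps else ""

event_log: list[str] = []
route = router("remember my budget is $800")
notetaker("remember my budget is $800", event_log)
context = executor(planner(route), event_log)
```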
Core ideas:
- SQLite + FTS5 for deterministic, portable event and fact search.
- Entity Fact Sheet for update-aware memory instead of endless fact accumulation.
- Markdown / tri-vault for readable memory artifacts.
- Hybrid recall for combining keyword signals, structured facts, and LLM-assisted filtering.
- Compatibility surfaces for LangChain, LlamaIndex, MCP, and OpenAI-compatible endpoints.
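The SQLite + FTS5 idea can be demonstrated with the Python standard library alone. The schema below is illustrative, not MASE's actual schema; it shows why FTS5 gives deterministic, portable keyword search over an event log with no embedding service involved:

```python
import sqlite3

# Illustrative schema: an FTS5 virtual table over event text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE events USING fts5(thread_id, content)")
conn.executemany(
    "INSERT INTO events (thread_id, content) VALUES (?, ?)",
    [
        ("t1", "user set travel budget to $800"),
        ("t1", "user prefers aisle seats"),
        ("t2", "project deadline moved to March"),
    ],
)

# MATCH runs a full-text query; bm25() gives a deterministic relevance rank.
rows = conn.execute(
    "SELECT content FROM events WHERE events MATCH ? ORDER BY bm25(events)",
    ("budget",),
).fetchall()
```

Because ranking is a pure function of the stored text, the same database file produces the same recall order on any machine, which is what makes the behavior benchmarkable and auditable.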
MASE has been evaluated across long-context and memory-oriented benchmarks:
| Benchmark | Model / Setting | MASE | Baseline | Delta |
|---|---|---|---|---|
| LV-Eval EN 256k | qwen2.5:7b local | 88.71% | 4.84% | +83.9pp |
| NoLiMa ONLYDirect 32k | qwen2.5:7b local, MASE chunked | 60.71% | 1.79% | +58.9pp |
| LongMemEval-S 500 | GLM-5 + kimi-k2.5 verifier | 61.0% official substring / 80.2% LLM-judge | 70.4% substring / 72.4% LLM-judge | +7.8pp judge |
LongMemEval is reported with multiple lanes:
- 61.0% (305/500): official substring-comparable lane.
- 80.2% (401/500): LLM-judge lane from the same iter2 full_500 run.
- 84.8% (424/500): post-hoc combined/retry diagnostic, not the public headline.
Detailed benchmark notes live in BENCHMARKS.md and docs/benchmark_claims/.
```bash
git clone https://github.com/zbl1998-sdjn/MASE-agent-memory.git
cd MASE-agent-memory
pip install -e ".[dev]"
python -m pytest tests/ -q
```
```bash
python mase_cli.py
```

For benchmark or long-running local work, keep generated memory stores outside the source checkout:
```bash
export MASE_RUNS_DIR=../MASE-runs
```

On Windows PowerShell:
```powershell
$env:MASE_RUNS_DIR = "..\MASE-runs"
```

Run these before integration work or pull requests:
```bash
python -m pytest -q
python -m mypy
python -m compileall -q -x "(legacy_archive|run_artifacts|dist|build|\.venv|venv|benchmarks/external-benchmarks|__pycache__|\.pytest_cache)" .
npm --prefix frontend run typecheck
npm --prefix frontend test
npm --prefix frontend run build
git diff --check
```

`python -m mypy` is intentionally gradual. Current strict coverage is limited to `executor.py`, `planner_agent.py`, `router.py`, `model_interface.py`, and `protocol.py`.
MASE exposes several integration surfaces:
- LangChain `BaseChatMemory`
- LlamaIndex `BaseMemory`
- MCP server for Claude Desktop / Cursor-style clients
- OpenAI-compatible endpoint
- FastAPI sidecar for local AI agent platforms
Example with LangChain:
```python
from integrations.langchain.mase_memory import MASEMemory

memory = MASEMemory(thread_id="zbl1998::main", top_k=8)
agent_executor.invoke(
    {"input": "What budget did I mention last time?"},
    config={"memory": memory},
)
```

MASE is strongest when the task requires:
- updated user or project facts;
- cross-session continuity;
- explainable recall;
- human-readable memory review;
- lightweight local persistence;
- benchmarkable memory behavior;
- sidecar integration with an agent SaaS or local agent runtime.
MASE is still an alpha-stage engineering project. It is not yet a universal retrieval layer.
Known boundaries:
- strong synonym and semantic-generalization recall still needs more work;
- large document-level semantic retrieval is not the primary path yet;
- high-concurrency server-grade deployment requires more runtime hardening;
- benchmark claims should be read with the documented lane definitions.
Planned directions:
- White-box semantic retrieval with write-time tags, read-time expansion, FTS, and LLM filtering.
- More server-grade async/runtime hardening.
- Broader benchmark triangulation.
- More integrations across LangChain, LlamaIndex, MCP, OpenAI-compatible APIs, and agent SaaS platforms.
- Memory review workflows before long-term fact/procedure writes.
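The last roadmap item, review before long-term writes, could look roughly like this. The staging API is entirely hypothetical and invented for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class ReviewedMemory:
    """Hypothetical review gate: long-term fact writes are staged until a
    human (or a policy check) approves them. Not MASE's real API."""
    pending: list = field(default_factory=list)
    facts: dict = field(default_factory=dict)

    def propose(self, key: str, value: str) -> None:
        # Staged write: not yet visible to recall.
        self.pending.append((key, value))

    def approve_all(self) -> int:
        # Promote every staged write into the visible fact store.
        approved = len(self.pending)
        for key, value in self.pending:
            self.facts[key] = value
        self.pending.clear()
        return approved

mem = ReviewedMemory()
mem.propose("user.budget", "$800")
assert "user.budget" not in mem.facts  # invisible until reviewed
count = mem.approve_all()
```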
Stable Core, Compatibility Surface, and Experimental Surface are defined in:
- docs/ARCHITECTURE_BOUNDARIES.md
- docs/BENCHMARK_ANTI_OVERFIT.md
Issues and pull requests are welcome, especially for:
- new model backend adapters;
- benchmark reruns and independent reports;
- integration examples;
- real-world long-memory failure cases;
- memory governance and audit workflows.
```bibtex
@software{mase2026,
  author = {zbl1998-sdjn},
  title  = {{MASE}: Memory-Augmented Smart Entity — Schema-less SQLite memory for LLM agents},
  year   = {2026},
  url    = {https://github.com/zbl1998-sdjn/MASE-agent-memory},
  note   = {Lifts qwen2.5:7b from 1.79\% to 60.71\% on NoLiMa-32k; 61.0\% official substring / 80.2\% LLM-judge on LongMemEval-S}
}
```

MASE started from a simple fear: as AI systems become more powerful, their hidden memory becomes harder to trust.
Instead of treating memory as an invisible vector database, MASE keeps memory small, structured, readable, and correctable. It is built around the belief that reliable agents need transparent memory governance before they need more context.
There is no "single heroic model" here. MASE is a lightweight system where Router, Notetaker, Planner, Action, Executor, SQLite, and Markdown each do a small, inspectable job.
If you believe agent memory should be auditable by default, welcome to MASE.
