Long-term memory staleness: detecting contradictions between promoted memories as facts change #196

@nanookclaw

Problem

When agents run continuously over weeks, facts change. A user switches projects, preferences evolve, relationships shift. Each session may promote accurate working-memory observations into long-term memory — but the system currently has no mechanism to detect when newer promoted memories contradict older ones.

This creates a subtle but serious failure mode: semantic search returns stale memories alongside current ones, and the agent has no way to distinguish which is authoritative.

Production Data

We ran an OpenClaw agent with persistent memory files (similar architecture to your two-tier model: session-scoped working memory plus curated long-term storage) for 28+ consecutive days at 30-minute session intervals, and measured:

  • 7% divergence between self-reported memory accuracy and external verification by day 28
  • Category drift: Topics that start coherent (e.g., "project status") gradually absorb adjacent items until the category no longer means what it originally did
  • Contradiction accumulation: An average of 3-4 contradictory facts coexisting in long-term memory by week 3 (e.g., "User prefers React" from week 1 coexisting with "User switched to Svelte" from week 2)
  • Recency paradox: Newer memories are more accurate, but semantic similarity search doesn't prefer them — it returns the better-embedded older memory
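The recency paradox above suggests one cheap mitigation on the retrieval side: weight similarity scores by age. A minimal sketch (in Python, which is an assumption; the function name and `half_life_days` parameter are hypothetical, not part of the project):

```python
from datetime import datetime, timezone

def recency_weighted_score(similarity: float, created_at: datetime,
                           now: datetime, half_life_days: float = 14.0) -> float:
    """Blend semantic similarity with an exponential recency decay so a
    fresher memory can outrank a better-embedded older one."""
    age_days = (now - created_at).total_seconds() / 86400
    return similarity * 0.5 ** (age_days / half_life_days)

# Hypothetical scores: the older memory embeds slightly better (0.92 vs 0.85)
# but is 37 days old; the 2-day-old memory wins after decay.
now = datetime(2025, 3, 10, tzinfo=timezone.utc)
old = recency_weighted_score(0.92, datetime(2025, 2, 1, tzinfo=timezone.utc), now)
new = recency_weighted_score(0.85, datetime(2025, 3, 8, tzinfo=timezone.utc), now)
assert new > old
```

This doesn't resolve contradictions, but it stops the raw-similarity ranking from systematically preferring stale memories.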

Specific Gap in Current Architecture

The MemoryPromotionAndDeduplication story (#188) covers duplicate detection, but deduplication alone doesn't handle the harder case: two memories that aren't duplicates but are contradictory. They have different text, different embeddings, and different timestamps — but they can't both be true.

Examples:

  • "User is building a Rust CLI tool (deadline March 15)" promoted Feb 28
  • "User finished the CLI tool and moved to a web dashboard" promoted March 10
  • Both are valid memories. Neither is a "duplicate." But returning both in a search for "what is the user working on?" produces confused context.
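One way to flag such pairs is to look for the band dedup ignores: similarity high enough to indicate the same topic, but below the near-duplicate threshold, with a meaningful gap between promotion timestamps. A sketch in Python (an assumption; the function name and all thresholds are hypothetical, and the similarity values are illustrative, not measured):

```python
from datetime import datetime, timedelta

def is_contradiction_candidate(sim: float, t_old: datetime, t_new: datetime,
                               dup_threshold: float = 0.95,
                               contra_threshold: float = 0.75,
                               min_gap: timedelta = timedelta(days=5)) -> bool:
    """Flag pairs that are about the same topic (high similarity) but are not
    near-duplicates, and were promoted far enough apart that the underlying
    fact may have changed."""
    same_topic = contra_threshold <= sim < dup_threshold
    return same_topic and (t_new - t_old) >= min_gap

# The two CLI-tool memories above: same topic, promoted 10 days apart.
assert is_contradiction_candidate(
    0.82, datetime(2025, 2, 28), datetime(2025, 3, 10))
# A near-duplicate (sim 0.97) is already handled by dedup, so it isn't flagged.
assert not is_contradiction_candidate(
    0.97, datetime(2025, 2, 28), datetime(2025, 3, 10))
```

Whether a flagged pair is actually contradictory (versus complementary) still needs a second-stage check, but this narrows the candidate set cheaply.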

Proposed Approach

A lightweight contradiction detection layer that runs during or after memory promotion:

  1. Temporal coherence check: When promoting a new memory, query long-term memory for semantically similar items. If similarity > threshold AND timestamps differ by > N days, flag for review.

  2. Supersession metadata: Add an optional supersedes field to long-term memories. When a newer memory contradicts an older one, link them. Search can then prefer the newest in a supersession chain.

  3. Memory health endpoint: A /memory/health or /memory/coherence endpoint that reports:

    • Total long-term memories
    • Estimated contradiction count (via periodic semantic clustering)
    • Staleness distribution (memories by age bucket)
    • Supersession chain lengths
  4. Configurable resolution strategy:

    • keep_latest — automatically archive superseded memories
    • keep_all_tagged — keep both but tag the older as superseded
    • manual — flag contradictions for the application to resolve
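Points 2 and 4 above can be sketched together: a `supersedes` link on the memory record plus a small resolver that applies the chosen strategy. Python is an assumption, and the `Memory` dataclass and `resolve` helper are hypothetical shapes, not existing project API:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Memory:
    id: str
    text: str
    promoted_at: str                  # ISO date; newest wins within a chain
    supersedes: Optional[str] = None  # id of the memory this one replaces
    superseded: bool = False          # True => archive or down-rank in search

def resolve(older: Memory, newer: Memory,
            strategy: str = "keep_latest") -> List[Memory]:
    """Apply one of the proposed resolution strategies to a contradictory pair."""
    if strategy in ("keep_latest", "keep_all_tagged"):
        newer.supersedes = older.id
        older.superseded = True
        # keep_latest archives the older memory; keep_all_tagged keeps both.
        return [newer] if strategy == "keep_latest" else [older, newer]
    if strategy == "manual":
        return [older, newer]         # untouched; the application resolves later
    raise ValueError(f"unknown strategy: {strategy}")

react = Memory("m1", "User prefers React", "2025-01-06")
svelte = Memory("m2", "User switched to Svelte", "2025-01-13")
kept = resolve(react, svelte, "keep_latest")
assert kept == [svelte] and react.superseded and svelte.supersedes == "m1"
```

Search can then walk `supersedes` links and return only the head of each chain (or tag non-heads), which also gives the health endpoint its chain-length metric for free.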

Why This Matters

For short-lived agents or single-session use, this isn't an issue. But for agents that run for weeks/months with evolving context — which is increasingly the production pattern — memory staleness silently degrades agent quality. The agent appears to "know" things that are no longer true, and there's no signal to the application that its memory is internally inconsistent.

Related

Happy to share our measurement methodology and 28-day dataset if useful for benchmarking contradiction detection approaches.
