Description
Problem
In Story 1 load testing, working-memory latency grew roughly linearly with session size and degraded further as the
number of concurrent active sessions increased.
This was reproduced with:
- long-term memory disabled
- summarization effectively disabled
- one long-running session that kept growing
- a Locust run ramping up to 25 active sessions doing the same thing
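The ramp described above can be sketched as a minimal locustfile. The endpoint paths come from this issue; the payload shape, message size, and `recent_messages_limit` value are assumptions for illustration, not the actual Story 1 harness.

```python
# Hypothetical locustfile reproducing the pattern: each simulated user keeps
# one session that grows by a message per iteration, then reads recent context.
import uuid

from locust import HttpUser, between, task


class WorkingMemoryUser(HttpUser):
    wait_time = between(0.5, 1.5)

    def on_start(self):
        # One long-running session per user, growing for the whole run
        self.session_id = str(uuid.uuid4())
        self.messages = []

    @task
    def put_then_get(self):
        # The transcript grows every iteration, so PUT payloads grow over
        # the run -- the shape that exposed the linear latency
        self.messages.append({"role": "user", "content": "x" * 200})
        self.client.put(
            f"/v1/working-memory/{self.session_id}",
            json={"messages": self.messages},
        )
        self.client.get(
            f"/v1/working-memory/{self.session_id}",
            params={"recent_messages_limit": 20},
        )
```

Ramping to 25 users (e.g. `locust -u 25 -r 5`) against a server with long-term memory and summarization disabled should reproduce the curve.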
That isolates the core working-memory read/write path. The current behavior does not meet the expected “recent context
stays fast as conversations grow” requirement.
Observed behavior
As sessions grew:
- PUT /v1/working-memory/{session_id} latency increased steadily with transcript size
- GET /v1/working-memory/{session_id} latency also increased steadily
- the effect became much worse under concurrency
This suggests the current implementation is doing work proportional to total session size on each update/read.
Likely cause in the current code
The hot path appears to be full-session based:
- put_working_memory_core() replaces the entire session on every write
- _summarize_working_memory() token-counts the full message list before deciding whether to summarize
- set_working_memory() serializes the full messages array and writes the whole JSON document back to Redis
- get_working_memory() loads the full working-memory document, instantiates all messages, then sorts/slices in memory for
recent_messages_limit
Relevant files:
- agent_memory_server/api.py
- agent_memory_server/working_memory.py
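For reviewers, the access pattern listed above boils down to something like the following simplified sketch (an in-memory stand-in, not the actual agent-memory-server code):

```python
import json

# Hypothetical stand-in for the Redis document store, to show the cost shape
STORE: dict[str, str] = {}

def set_working_memory(session_id: str, messages: list[dict]) -> None:
    # Serializes the FULL messages array and rewrites the whole document
    # on every write: cost is O(total session size)
    STORE[session_id] = json.dumps({"messages": messages})

def get_working_memory(session_id: str, recent_messages_limit: int) -> list[dict]:
    # Loads and parses the FULL document, instantiates every message,
    # then sorts and slices in memory: cost grows with total transcript
    # length, not with recent_messages_limit
    doc = json.loads(STORE.get(session_id, '{"messages": []}'))
    msgs = sorted(doc["messages"], key=lambda m: m.get("ts", 0))
    return msgs[-recent_messages_limit:]
```

Every PUT and GET touches the whole transcript, which matches the observed linear growth.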
Why this matters
Working-memory operations appear to scale with total conversation length instead of recent context size. Under multi-
session load, that cost compounds quickly.
Proposed fix
Refactor the working-memory path so reads and writes do not require full-session work on every request.
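One possible shape for the refactor, assuming Redis stays the backing store: keep messages in a Redis list keyed per session, append with RPUSH on write, and read only the tail with LRANGE. The key prefix and function names below are hypothetical; the `ListStore` class is a minimal in-memory stand-in mirroring Redis RPUSH/LRANGE semantics so the sketch is self-contained (a real redis-py client exposes the same `rpush`/`lrange` calls).

```python
import json

class ListStore:
    # In-memory stand-in for a Redis client, for illustration only
    def __init__(self):
        self._lists: dict[str, list[str]] = {}

    def rpush(self, key: str, value: str) -> int:
        self._lists.setdefault(key, []).append(value)
        return len(self._lists[key])

    def lrange(self, key: str, start: int, stop: int) -> list[str]:
        lst = self._lists.get(key, [])
        # Redis stop index is inclusive; -1 means "through the end"
        end = len(lst) if stop == -1 else stop + 1
        return lst[start:end]

def append_message(store, session_id: str, message: dict) -> None:
    # O(1) per write: append one serialized message, never rewrite the session
    store.rpush(f"wm:{session_id}:messages", json.dumps(message))

def get_recent(store, session_id: str, limit: int) -> list[dict]:
    # O(limit) per read: fetch only the tail, regardless of transcript length
    raw = store.lrange(f"wm:{session_id}:messages", -limit, -1)
    return [json.loads(m) for m in raw]
```

Token counting for the summarization decision could then run incrementally (tracking a running count per session) instead of re-counting the full message list on each write.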
Acceptance criteria
- PUT latency should remain approximately flat as a session grows
- GET with recent_messages_limit should not depend on total transcript length
- concurrent long sessions should degrade gracefully rather than linearly with total stored history
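The first two criteria can be quantified without a load test by comparing bytes moved per update under each strategy. The numbers below are illustrative of the cost shape, not measurements of the actual server:

```python
import json

# A growing transcript of hypothetical messages
messages = [{"role": "user", "content": f"message {i}"} for i in range(1, 1001)]

# Full-document rewrite: payload per PUT at 10, 100, and 1000 messages
full_rewrite_bytes = [len(json.dumps(messages[:n])) for n in (10, 100, 1000)]

# Append-only write: payload per PUT is just the newest message
append_bytes = [len(json.dumps(messages[n - 1])) for n in (10, 100, 1000)]
```

The full-rewrite payload grows roughly 100x across the run while the append payload stays essentially constant, which is the "approximately flat" behavior the criteria ask for.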