Working-memory PUT/GET latency grows with session length and concurrency #195

@tylerhutcherson

Description

Problem

In Story 1 load testing, working-memory latency grew almost linearly with session size, and degraded further as concurrent
active sessions increased.

This was reproduced with:

  • long-term memory disabled
  • summarization effectively disabled
  • one long-running session that kept growing
  • a Locust run ramping up to 25 active sessions doing the same thing

That isolates the core working-memory read/write path. The current behavior does not meet the expected “recent context
stays fast as conversations grow” requirement.

Observed behavior

As sessions grew:

  • PUT /v1/working-memory/{session_id} latency increased steadily with transcript size
  • GET /v1/working-memory/{session_id} latency also increased steadily
  • the effect became much worse under concurrency

This suggests the current implementation is doing work proportional to total session size on each update/read.

Likely cause in the current code

The hot path appears to be full-session based:

  • put_working_memory_core() replaces the entire session on every write
  • _summarize_working_memory() token-counts the full message list before deciding whether to summarize
  • set_working_memory() serializes the full messages array and writes the whole JSON document back to Redis
  • get_working_memory() loads the full working-memory document, instantiates all messages, then sorts/slices in memory for
    recent_messages_limit
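The cost pattern above can be modeled with a small in-memory sketch (a hypothetical stand-in for the Redis-backed store; the function names mirror the real ones in agent_memory_server/working_memory.py but the bodies are illustrative, not the actual code):

```python
import json

# Hypothetical in-memory stand-in for the Redis document store.
STORE: dict[str, str] = {}

def set_working_memory(session_id: str, messages: list[dict]) -> int:
    """Serialize the FULL message list and write the whole document back.

    Returns bytes written, which grows with total session length.
    """
    doc = json.dumps({"session_id": session_id, "messages": messages})
    STORE[session_id] = doc
    return len(doc)

def get_working_memory(session_id: str, recent_messages_limit: int) -> list[dict]:
    """Load the full document, materialize every message, then sort/slice."""
    doc = json.loads(STORE[session_id])
    messages = sorted(doc["messages"], key=lambda m: m["ts"])  # O(n log n)
    return messages[-recent_messages_limit:]                   # after an O(n) load

# Each PUT re-serializes everything, so write cost tracks transcript size:
session = [{"role": "user", "content": f"msg {i}", "ts": i} for i in range(1000)]
small = set_working_memory("s1", session[:10])
large = set_working_memory("s1", session)
assert large > 10 * small  # write cost grew with the transcript
```

Under this model, every PUT pays for the whole history and every GET pays for the whole history even when only `recent_messages_limit` messages are wanted, which matches the observed near-linear latency growth.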

Relevant files:

  • agent_memory_server/api.py
  • agent_memory_server/working_memory.py

Why this matters

Working-memory operations appear to scale with total conversation length instead of recent context size. Under multi-session load, that cost compounds quickly.

Proposed fix

Refactor the working-memory path so reads and writes do not require full-session work on every request.
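One possible direction (a sketch, not a committed design): keep messages in a per-session Redis list so a PUT appends only the new messages (RPUSH) and a GET with recent_messages_limit reads only the tail (LRANGE with negative indices). The in-memory class below models that access pattern; the class and method names are hypothetical:

```python
import json

class AppendOnlySessionStore:
    """In-memory model of a per-session Redis list (RPUSH / LRANGE -limit -1).

    Hypothetical sketch: put() appends only the delta and get_recent() reads
    only the tail, so neither operation scales with total transcript length.
    """

    def __init__(self) -> None:
        self._lists: dict[str, list[str]] = {}

    def put(self, session_id: str, new_messages: list[dict]) -> int:
        """Append only the new messages; analogous to RPUSH key m1 m2 ...

        Returns bytes written, which depends on the delta, not the session.
        """
        entries = [json.dumps(m) for m in new_messages]
        self._lists.setdefault(session_id, []).extend(entries)
        return sum(len(e) for e in entries)

    def get_recent(self, session_id: str, limit: int) -> list[dict]:
        """Read only the tail; analogous to LRANGE key -limit -1."""
        tail = self._lists.get(session_id, [])[-limit:]
        return [json.loads(e) for e in tail]

store = AppendOnlySessionStore()
costs = [
    store.put("s1", [{"role": "user", "content": f"msg {i}", "ts": i}])
    for i in range(1000)
]

# Write cost stays flat as the session grows (only message text length varies) ...
assert max(costs) - min(costs) <= 8
# ... and a recent read touches `limit` messages, not all 1000.
assert [m["ts"] for m in store.get_recent("s1", 3)] == [997, 998, 999]
```

The summarization check would need the same treatment: _summarize_working_memory() currently token-counts the full message list, so it would have to track token counts incrementally (or count only the unsummarized tail) rather than re-scanning everything per write.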

Acceptance criteria

  • PUT latency should remain approximately flat as a session grows
  • GET with recent_messages_limit should not depend on total transcript length
  • concurrent long sessions should degrade gracefully rather than in proportion to total stored history
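The first criterion could be turned into a count-based regression check: instead of wall-clock latency (noisy in CI), assert that per-PUT work (here, bytes written, as a proxy) does not trend upward as the session grows. The `put` callable and its cost model are hypothetical; adapt to the real API:

```python
import json
from typing import Callable

def assert_flat_put_cost(put: Callable[[str, dict], int],
                         n: int = 500, tolerance: int = 16) -> None:
    """Check criterion 1: per-PUT cost must not grow with session length.

    Compares the mean cost of the first 10% of writes to the last 10%.
    """
    costs = [put("bench", {"content": f"msg {i}", "ts": i}) for i in range(n)]
    tenth = n // 10
    early = sum(costs[:tenth]) / tenth
    late = sum(costs[-tenth:]) / tenth
    assert late <= early + tolerance, f"PUT cost grew: {early:.1f} -> {late:.1f}"

# Exercise it with a minimal append-only put (O(delta) per write):
log: list[str] = []
def append_put(session_id: str, message: dict) -> int:
    entry = json.dumps(message)
    log.append(entry)
    return len(entry)

assert_flat_put_cost(append_put)  # passes: cost is flat
```

A full-rewrite put would fail this check, since its cost climbs with each write.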

Metadata

Assignees: none
Labels: bug (Something isn't working)
Projects: none
Milestone: none