Description
Problem
In Story 1 load testing, working-memory latency grew roughly linearly with session size and degraded further as the
number of concurrent active sessions increased.
This was reproduced with:
- long-term memory disabled
- summarization effectively disabled
- one long-running session that kept growing
- a Locust run ramping up to 25 active sessions doing the same thing
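The ramp described above can be sketched as a minimal locustfile. The endpoint paths come from this issue; the payload shape, message size, and `recent_messages_limit` value are assumptions for illustration, not the actual Story 1 harness.

```python
# Hypothetical locustfile reproducing the pattern: each simulated user keeps
# one session that grows by a message per iteration, then reads recent context.
import uuid

from locust import HttpUser, between, task


class WorkingMemoryUser(HttpUser):
    wait_time = between(0.5, 1.5)

    def on_start(self):
        # One long-running session per user, growing for the whole run
        self.session_id = str(uuid.uuid4())
        self.messages = []

    @task
    def put_then_get(self):
        # The transcript grows every iteration, so PUT payloads grow over
        # the run -- the shape that exposed the linear latency
        self.messages.append({"role": "user", "content": "x" * 200})
        self.client.put(
            f"/v1/working-memory/{self.session_id}",
            json={"messages": self.messages},
        )
        self.client.get(
            f"/v1/working-memory/{self.session_id}",
            params={"recent_messages_limit": 20},
        )
```

Ramping to 25 users (e.g. `locust -u 25 -r 5`) against a server with long-term memory and summarization disabled should reproduce the curve.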
That isolates the core working-memory read/write path. The current behavior does not meet the expected “recent context
stays fast as conversations grow” requirement.
Observed behavior
As sessions grew:
- PUT /v1/working-memory/{session_id} latency increased steadily with transcript size
- GET /v1/working-memory/{session_id} latency also increased steadily
- the effect became much worse under concurrency
This suggests the current implementation is doing work proportional to total session size on each update/read.
Likely cause in the current code
The hot path appears to be full-session based:
- put_working_memory_core() replaces the entire session on every write
- _summarize_working_memory() token-counts the full message list before deciding whether to summarize
- set_working_memory() serializes the full messages array and writes the whole JSON document back to Redis
- get_working_memory() loads the full working-memory document, instantiates all messages, then sorts/slices in memory for
recent_messages_limit
Relevant files:
- agent_memory_server/api.py
- agent_memory_server/working_memory.py
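For reviewers, the access pattern listed above boils down to something like the following simplified sketch (an in-memory stand-in, not the actual agent-memory-server code):

```python
import json

# Hypothetical stand-in for the Redis document store, to show the cost shape
STORE: dict[str, str] = {}

def set_working_memory(session_id: str, messages: list[dict]) -> None:
    # Serializes the FULL messages array and rewrites the whole document
    # on every write: cost is O(total session size)
    STORE[session_id] = json.dumps({"messages": messages})

def get_working_memory(session_id: str, recent_messages_limit: int) -> list[dict]:
    # Loads and parses the FULL document, instantiates every message,
    # then sorts and slices in memory: cost grows with total transcript
    # length, not with recent_messages_limit
    doc = json.loads(STORE.get(session_id, '{"messages": []}'))
    msgs = sorted(doc["messages"], key=lambda m: m.get("ts", 0))
    return msgs[-recent_messages_limit:]
```

Every PUT and GET touches the whole transcript, which matches the observed linear growth.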
Why this matters
Working-memory operations appear to scale with total conversation length instead of recent context size. Under multi-
session load, that cost compounds quickly.
Proposed fix
Refactor the working-memory path so reads and writes do not require full-session work on every request.
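One possible shape for the refactor, assuming Redis stays the backing store: keep messages in a Redis list keyed per session, append with RPUSH on write, and read only the tail with LRANGE. The key prefix and function names below are hypothetical; the `ListStore` class is a minimal in-memory stand-in mirroring Redis RPUSH/LRANGE semantics so the sketch is self-contained (a real redis-py client exposes the same `rpush`/`lrange` calls).

```python
import json

class ListStore:
    # In-memory stand-in for a Redis client, for illustration only
    def __init__(self):
        self._lists: dict[str, list[str]] = {}

    def rpush(self, key: str, value: str) -> int:
        self._lists.setdefault(key, []).append(value)
        return len(self._lists[key])

    def lrange(self, key: str, start: int, stop: int) -> list[str]:
        lst = self._lists.get(key, [])
        # Redis stop index is inclusive; -1 means "through the end"
        end = len(lst) if stop == -1 else stop + 1
        return lst[start:end]

def append_message(store, session_id: str, message: dict) -> None:
    # O(1) per write: append one serialized message, never rewrite the session
    store.rpush(f"wm:{session_id}:messages", json.dumps(message))

def get_recent(store, session_id: str, limit: int) -> list[dict]:
    # O(limit) per read: fetch only the tail, regardless of transcript length
    raw = store.lrange(f"wm:{session_id}:messages", -limit, -1)
    return [json.loads(m) for m in raw]
```

Token counting for the summarization decision could then run incrementally (tracking a running count per session) instead of re-counting the full message list on each write.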
Acceptance criteria
- PUT latency should remain approximately flat as a session grows
- GET with recent_messages_limit should not depend on total transcript length
- concurrent long sessions should degrade gracefully rather than linearly with total stored history
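The first two criteria can be quantified without a load test by comparing bytes moved per update under each strategy. The numbers below are illustrative of the cost shape, not measurements of the actual server:

```python
import json

# A growing transcript of hypothetical messages
messages = [{"role": "user", "content": f"message {i}"} for i in range(1, 1001)]

# Full-document rewrite: payload per PUT at 10, 100, and 1000 messages
full_rewrite_bytes = [len(json.dumps(messages[:n])) for n in (10, 100, 1000)]

# Append-only write: payload per PUT is just the newest message
append_bytes = [len(json.dumps(messages[n - 1])) for n in (10, 100, 1000)]
```

The full-rewrite payload grows roughly 100x across the run while the append payload stays essentially constant, which is the "approximately flat" behavior the criteria ask for.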