Conversation

@enyst (Collaborator) commented Oct 26, 2025

Summary
The current LLMSummarizingCondenser triggers condensation purely by event count (max_size). This causes premature condensation for large-context models (e.g., GPT-5, Gemini), especially on hard tasks where the agent reads many files first. The agent can condense multiple times before it writes any code, forget its objectives, and loop back to earlier stages.

What’s happening

  • Condensation is triggered whenever len(view) > max_size (default 120). This ignores the actual tokenized prompt size and the model's context window (a simplified sketch of the current check follows this list).
  • In practice, large-context models could continue without condensing because the prompt token count is still within budget, yet we condense early and lose useful context.
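
For reference, the current trigger is roughly equivalent to the following (simplified sketch, not the exact source):

def should_condense(self, view) -> bool:
    # Condense purely on event count, regardless of how many tokens
    # those events occupy in the prompt.
    return len(view) > self.max_size  # max_size defaults to 120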

Changes

  • LLMSummarizingCondenser is now token-budget aware when llm.max_input_tokens is available:
    • should_condense computes token usage via LLMConvertibleEvent.events_to_messages + llm.get_token_count and compares it to a budget: max_input_tokens - max_output_tokens - headroom.
    • headroom = token_margin_ratio * max_input_tokens (default 0.1), leaving buffer for the response and metadata.
    • get_condensation uses a binary search to keep as much tail context as fits under the token budget while preserving keep_first events at the head.
    • If limits are unknown or token counting fails, behavior falls back to the original event-count logic, so backward compatibility is preserved (a sketch of the check follows this list).
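
A minimal sketch of the token-budget check described above, assuming the names used in this description (llm.max_input_tokens, llm.max_output_tokens, llm.get_token_count, LLMConvertibleEvent.events_to_messages, token_margin_ratio, max_size); the actual diff may differ in details:

def should_condense(self, view) -> bool:
    max_input = getattr(self.llm, "max_input_tokens", None)
    if max_input:
        try:
            max_output = getattr(self.llm, "max_output_tokens", None) or 0
            # Reserve headroom for the response and prompt metadata.
            headroom = int(self.token_margin_ratio * max_input)  # default ratio: 0.1
            budget = max_input - max_output - headroom

            messages = LLMConvertibleEvent.events_to_messages(view.events)
            total_tokens = self.llm.get_token_count(messages)
            return total_tokens > budget
        except Exception:
            # Token counting failed; fall through to the event-count heuristic.
            pass
    # Original behavior when model limits are unknown.
    return len(view) > self.max_size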

Why this helps

  • Reduces premature condensation for large-context models
  • Preserves more recent, relevant context while maintaining safe headroom
  • Leaves behavior unchanged when model limits are unknown

Testing

  • Added a focused unit test for token-budget behavior (mocking token counts)
  • Existing condenser tests continue to pass

Open questions

  • Is 10% headroom a sensible default across providers? Should it be model-specific?
  • Do we want to expose token_margin_ratio widely in config/presets?

Co-authored-by: openhands <[email protected]>



Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant  Base Image
golang   golang:1.21-bookworm
java     eclipse-temurin:17-jdk
python   nikolaik/python-nodejs:python3.12-nodejs22

Pull (multi-arch manifest)

docker pull ghcr.io/openhands/agent-server:44dba94-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-44dba94-python \
  ghcr.io/openhands/agent-server:44dba94-python

All tags pushed for this build

ghcr.io/openhands/agent-server:44dba94-golang
ghcr.io/openhands/agent-server:v1.0.0a4_golang_tag_1.21-bookworm_binary
ghcr.io/openhands/agent-server:44dba94-java
ghcr.io/openhands/agent-server:v1.0.0a4_eclipse-temurin_tag_17-jdk_binary
ghcr.io/openhands/agent-server:44dba94-python
ghcr.io/openhands/agent-server:v1.0.0a4_nikolaik_s_python-nodejs_tag_python3.12-nodejs22_binary

The 44dba94 tag is a multi-arch manifest (amd64/arm64); your client pulls the right arch automatically.

…nput_tokens

- Add token-aware should_condense that compares tokenized messages against a budget derived from llm.max_input_tokens, llm.max_output_tokens, and a configurable token_margin_ratio
- Choose tail size via binary search to keep as much recent context as fits, falling back to event-count heuristic when limits are unknown
- Preserve backward compatibility; default event-count behavior remains when model limits are absent

Co-authored-by: openhands <[email protected]>
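
The binary search over the tail mentioned in the commit message above could be shaped roughly like the helper below. This is an illustrative sketch: the helper name is hypothetical, it ignores the tokens consumed by the summary message itself, and it assumes the same llm/keep_first attributes as in the description.

def _largest_tail_within_budget(self, view, budget: int) -> int:
    # Largest number of most-recent events (after the keep_first head) whose
    # tokenized messages, together with the head, still fit under the budget.
    head = view.events[: self.keep_first]
    tail = view.events[self.keep_first:]
    lo, hi = 0, len(tail)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        candidate = head + tail[len(tail) - mid:]
        messages = LLMConvertibleEvent.events_to_messages(candidate)
        if self.llm.get_token_count(messages) <= budget:
            lo = mid  # the mid most-recent events fit; try keeping more
        else:
            hi = mid - 1
    return lo

The search relies on token counts growing monotonically as more events are kept, which holds as long as each event contributes a non-negative number of tokens.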
if max_input:
    # Build messages for token counting
    messages = LLMConvertibleEvent.events_to_messages(view.events)
    total_tokens = self.llm.get_token_count(messages)
Collaborator:

Worth noting the LLM used by the condenser is not necessarily the LLM used by the agent, and condensation is intended to benefit the latter.


# Prefer token-aware check when LLM has context window info and
# we can estimate message tokens. Fallback to event-count otherwise.
try:
Collaborator:

This entire block probably deserves to be pulled out into a function that can be used by any condenser. I don't think there are any others in the SDK that need this info but it'd be good to have for folks extending condensers if it was, e.g., a static method on the base class.
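
As a rough illustration of that suggestion, the budget/usage computation could be exposed on the base condenser class as something like the following (hypothetical class and method names, not part of the current diff):

class CondenserBase:  # stand-in for the SDK's actual condenser base class
    @staticmethod
    def token_usage_and_budget(llm, view, margin_ratio: float = 0.1):
        """Return (total_tokens, budget), or None when model limits are unknown.

        Subclasses could call this instead of re-implementing the budget
        arithmetic. Per the earlier review note, llm should be the agent's
        LLM, since that is the context window condensation protects.
        """
        max_input = getattr(llm, "max_input_tokens", None)
        if not max_input:
            return None
        max_output = getattr(llm, "max_output_tokens", None) or 0
        budget = max_input - max_output - int(margin_ratio * max_input)
        messages = LLMConvertibleEvent.events_to_messages(view.events)
        return llm.get_token_count(messages), budget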

@blacksmith-sh (bot) commented Nov 3, 2025

[Automatic Post]: It has been a while since there was any activity on this PR. @enyst, are you still working on it? If so, please go ahead, if not then please request review, close it, or request that someone else follow up.

