refactor(history): rip out history pruning/compression, redo trim as one whole-turn function with a visible RPC event#8196
Conversation
…ontext loss Two #[ignore] live tests drive real Anthropic personal_code turns over the RPC dispatch path against a throwaway low-budget config (history pruning set on the runtime profile only) and assert the post-redo contract: - live_history_pruning_emits_rpc_event: when history is pruned, a session/update history_trimmed notification must fire so the user sees why context changed. - live_history_pruning_never_silently_drops_tool_result: a tool result the model actually produced must not vanish from its context without that event. Both are RED on this tree. Test 2 reproduces the f84c05d confabulation exactly: turn 1 writes "zephyr" to a file (Written 6 bytes, model confirms), and after a few filler turns the model insists "I never wrote anything" because the prune stripped its own tool result from context, with no signal to the user. Harness additions: - RpcContext::for_live_test: non-test public constructor wiring a temp AcpSessionStore + TuiRegistry for external integration tests. - RpcDispatcher::process_line_for_test: public shim over the transport process_line path so a test can drive full JSON-RPC requests and observe emitted notifications.
Single rule: keep the most recent whole turns that fit the token budget, drop the rest, never cut a turn in half. A turn runs from a real user message to the next, so tool_use/tool_result pairs are dropped atomically and provider pairing (Anthropic) is safe by construction. trim_to_recent_turns returns a TrimResult carrying dropped/kept counts and a trimmed flag; breadcrumb() yields the front marker the caller injects so context loss is never silent. Six unit tests cover under-budget, zero budget, oldest-turn drop with system preservation, pairing atomicity, and the keep-last-turn floor.
New SessionUpdateEvent variant emitted whenever older whole turns are dropped to fit the budget, mirroring the cancel emit path. Serializes as history_trimmed (snake_case tag) carrying session_id, dropped_messages, kept_turns, and reason so the client renders a visible context-cut marker.
Delete the per-iteration token-budget block from preflight_history_maintenance (fast_trim_tool_results + prune_history) and the prune_history call in build_context. These mutated the live in-context history mid-turn, stripping the model's own prior tool results right before the provider request and causing it to confabulate completed work as never done (session f84c05d). preflight_history_maintenance now only removes orphaned tool messages and normalizes system messages. Context trimming becomes reactive and turn-bounded via history_trim. context_token_budget is no longer threaded into the per-iteration path.
Run trim_to_recent_turns once per turn at iteration 0 when a context token budget is set. On a trim, inject the breadcrumb after the system messages so the model sees that earlier turns were cut, log the drop counts (attributed), and emit TurnEvent::HistoryTrimmed. Map the new event to SessionUpdateEvent::HistoryTrimmed on the ACP/RPC path (dispatch + acp_server session/update), and to a history_trimmed frame on the gateway WebSocket, so every transport surfaces the cut. No path drops context silently anymore.
Carry tokens_before/tokens_after on TrimResult so the trim record can show the real reduction, not just a turn count. The INFO record now emits dropped_messages, dropped_turns, kept_turns, budget_tokens, tokens_before, tokens_after, tokens_reclaimed, and budget_headroom, with a one-line human message spelling out the token math and remaining headroom under budget. The record stays inside the model/model_provider attribution span, so every trim is fully attributed and never silent. Add unit coverage: token_accounting_is_populated_and_coherent proves a real reduction on a trimmed run, untouched_reports_equal_before_after proves the no-op path reports equal before/after.
On a provider context-window-exceeded error, the interactive loop now recovers via the same trim_to_recent_turns + breadcrumb path as the reactive turn-boundary trim, then retries. One code path, not two: no summarization, no history splicing, no synthetic placeholders. If only the most recent turn remains and it still overflows (option a: never nuke history to the breadcrumb), emit a span-attributed WARN and let the error surface instead of looping. Both recovery records run inside the model/model_provider attribution span. Add trim_record_carries_model_attribution: drives the real run_tool_call_loop over budget and asserts the emitted trim record carries model + model_provider, proving the attribution span is live on the production path, not just asserted.
…ntext The channel orchestrator did its own preemptive history surgery on the live in-context turns sent to the model: proactive_trim_turns dropped or shrank old turns by a char budget, and strip_old_tool_context deleted old tool-call/tool-result messages from the cached history. Both mutated provider-visible context silently with no user-visible signal, the same disease as the runtime preemptive prune. The runtime turn-boundary trim (trim_to_recent_turns + HistoryTrimmed event + breadcrumb) is now the single authority for context fitting. Channels stop touching history. Removed: proactive_trim_turns, strip_old_tool_context, the PROACTIVE_CONTEXT_BUDGET_CHARS / PROACTIVE_TRIM_PROTECT_LAST_N consts, the now-orphaned is_tool_call_content/is_named_tool_call_json/ is_native_tool_call_json chain, the dead fast_trim_tool_results import, both call sites, and the associated tests. The keep_tool_context_turns config ident stays declared (no schema ident change before V4); only the strip code path that consumed it is gone. extract_current_turn_tool messages and the tool-context append path are kept.
prune_history was the preemptive, summary-injecting, multi-phase pruner that mutated provider-visible history mid-conversation: collapse tool pairs into synthetic summaries, cap summaries, merge adjacent summaries, drop oldest under a token budget, insert separators. It is the direct cause of the f84c05d confabulation and is fully replaced by the reactive whole-turn trim_to_recent_turns. Removed from history_pruner.rs: prune_history, PruneStats, estimate_tokens, protected_indices, the MAX_SYNTHETIC_TOOL_SUMMARIES cap, the HistoryPrunerConfig re-export, and all ~25 prune_* / phase / synthetic -summary / separator tests plus the now-unused test-only marker imports. Kept intact and still tested (10 orphan-safety tests green): remove_orphaned_tool_messages and its helpers (extract_tool_call_id, extract_assistant_tool_call_ids, assistant_tool_calls_have_immediate_results, assistant_is_unresolved_dispatch, PrunedOrphans). This is the provider pairing safety net (zeroclaw-labs#5743), reactive and content-faithful, not a pruner.
The ContextCompressor ran an LLM mid-conversation to summarize older turns and spliced the synthetic summary back into provider-visible history. That is the other preemptive, content-rewriting context mutation behind the confabulation: the model would see a machine-written summary in place of its own prior work and lose the thread. Removed the entire crates/zeroclaw-runtime/src/agent/context_compressor.rs module, its mod declaration, and the compressor_with_memory_saves_summary integration test. The channel invocation site was already removed in the prior channels refactor. The context_compression config idents stay declared in the schema (no schema ident change before V4); they no longer drive any code path. Context fitting is now solely the reactive whole-turn trim.
Delete fast_trim_tool_results and emergency_history_trim from history.rs. These were the per-tool-result truncation and oldest-message-drop primitives that silently mutated context: fast_trim rewrote tool result bodies in place, emergency_trim removed messages one at a time. Both could shrink a tool result or split a turn out from under the model, which is the confabulation path. Reactive context-overflow recovery in context_recovery.rs and the interactive loop now route through trim_to_recent_turns: drop oldest WHOLE turns until the history fits a budget forced below current size, never splitting a tool_use/tool_result pair, never shrinking a result. Emit the span-attributed sick-log line with before/after token math. Drop the now-dead fast_trim/emergency_trim unit tests in loop_.rs.
Nothing generates the internal pruning markers anymore, so the whole marker apparatus is dead weight. Remove it end to end: - model_provider.rs: drop pruned_tool_exchange_summary, pruned_context_separator, is_pruned_tool_exchange_summary, is_pruned_context_separator, and the three marker const prefixes. - anthropic.rs: drop should_skip_internal_pruning_marker and the convert_messages skip pass. With no markers to filter, outbound conversion is a straight role walk. - acp_server.rs: drop the history-pruner tool_call replay special case in history_notifications_for_message. The new HistoryTrimmed event is the trim-visibility mechanism now. - history_pruner.rs: the orphan sweep no longer keys off collapsed summary markers; it keeps only the unresolved-dispatch case. Delete the now-unused assistant_tool_calls_have_immediate_results helper. Drop the marker-dependent tests in anthropic.rs and acp_server.rs.
Add ResolvedRuntime::effective_context_budget(): when history_pruning.enabled is set with a positive max_tokens, trim at the lower of that floor and the hard context ceiling so an explicit budget trims earlier; otherwise the ceiling is the only trigger. Reuses the existing history_pruning.* idents with no schema rename (V4 holds the rename). collapse_tool_results and keep_recent no longer drive any code path: summarization is gone and the most-recent whole turn is always kept structurally. Wire both run_tool_call_loop dispatch sites in agent.rs to the new budget instead of raw max_context_tokens. Drop the orphaned proactive-trim test helpers (seed_sender_history, cloned_sender_history, history_signature) left behind when proactive_trim_turns and strip_old_tool_context were removed.
Two pairing-safety GREEN tests on trim_to_recent_turns: - trimmed_history_has_no_orphan_tool_calls: builds a multi-turn history with structured tool_use/tool_result JSON (the shape Anthropic and other providers enforce), trims it, then runs the orphan sweep and asserts it removes zero. A whole-turn drop can never split a pair, so the anti-400 net finds nothing. This is the invariant that stops the "tool result's tool id not found" 400. - breadcrumb_inserts_after_leading_system: with multiple leading system messages, the user-visible breadcrumb lands after the last system message and before the first kept turn, never above the system prompt.
New agents/history-management.md disambiguating the terms that get conflated: whole-turn trimming (the only thing that drops history), orphan sweep (pairing-safety net, not size control), system normalization, tool-result capping at collection time, and provider-side truncation. States plainly that context compression/summarization was removed and why. Every named behavior cites the function that implements it (trim_to_recent_turns, remove_orphaned_tool_messages, normalize_system_messages, truncate_tool_result, estimate_history_tokens, effective_context_budget, breadcrumb) and the HistoryTrimmed RPC event contract, so the page tracks the code rather than inventing a model. Documents the legacy history_pruning.* key reuse and the V4 rename. Linked under the Agents section in SUMMARY.md.
Whitespace and import-ordering only: line wrapping in effective_context_budget, anthropic, history_pruner, history_trim, loop_, rpc/context, and the live test imports. No behavior change.
Five live tests against a real provider exercising the exact trim path that produced the f84c05d confabulation: - smoking gun: model stays honest after its tool result is trimmed - history_trimmed event carries real non-zero numbers - kept_turns >= 1 invariant holds across every trim (never nuked to empty) - tool_use/tool_result pairs stay atomic: zero provider 400s under repeated trims of multi-tool turns - recall honesty: no silent-amnesia path on either trimmed or untrimmed turn
The live RPC battery hit real Anthropic on every invocation. Even #[ignore]-gated that is budget we do not want sitting in the tree, so remove both live files entirely. Replace the lost coverage with an offline test that proves the part the live tests actually guarded: the TurnEvent::HistoryTrimmed -> SessionUpdateEvent::HistoryTrimmed wire mapping. It constructs the event and asserts the session/update notification serializes with type=history_trimmed and the dropped_messages/kept_turns/reason fields, no model required. The live methodology and the empirical evidence it produced are folded into the PR body instead of shipping as runnable code.
…uning # Conflicts: # crates/zeroclaw-runtime/src/agent/agent.rs # crates/zeroclaw-runtime/src/agent/loop_.rs
# Conflicts: # crates/zeroclaw-runtime/src/agent/agent.rs
… trim record Close two gaps the zeroclaw-labs#8196 end-user test case exposed. SSE was the odd surface out: /api/events is fed by the observability pipeline, not TurnEvent, so the trim only surfaced there as a generic record. Add ObserverEvent::HistoryTrimmed (non_exhaustive enum, no out-of-tree break) and map it in BroadcastObserver to a first-class {"type":"history_trimmed", dropped_messages, kept_turns, reason} frame, matching the ACP session/update and WS frames. The trim site now emits the observer event alongside the existing record! and TurnEvent. SSE, WS, and ACP now carry the same history_trimmed shape. agent_alias on the trim record is real, not span-dependent guesswork: the record!'s child zeroclaw_scope span sets model/model_provider, and the alias is inherited from the outer agent attribution_span via the LogCaptureLayer leaf-to-root walk. Pin it: the attribution test now wraps the loop in attribution_span!(AgentAttribution("trimtest")) (the production wiring) and asserts agent_alias lands on the trim record. Docs page updated to document all three visibility surfaces (ACP, WS, SSE) sourced from their mapping sites. Tests: gateway sse history_trimmed broadcast, runtime trim attribution incl agent_alias. fmt clean, clippy -D warnings 0, mdbook + markdownlint clean.
Audacity88
left a comment
There was a problem hiding this comment.
I reviewed current head e42e75947e00f1875471e8b75b755eacd187185d against the live PR body, diff, checks, comments/reviews, and the touched runtime/channel/API/gateway/config/docs paths. I also checked the related history-trimming context called out by the PR. The whole-turn trimming direction is much easier to reason about than the deleted multi-layer pruning/compression stack, but I found two blockers before this should merge.
🔴 Blocking — Overflow recovery still drops turns without the promised visible trim signal
The normal turn-boundary path does the full visibility work: after trim_to_recent_turns, it inserts the breadcrumb, sends TurnEvent::HistoryTrimmed, and records ObserverEvent::HistoryTrimmed (crates/zeroclaw-runtime/src/agent/turn/mod.rs:383-443). The provider-overflow recovery path does not use that path. try_recover_context_overflow takes only history, e, and iteration, calls trim_to_recent_turns, assigns *history = result.history, logs, and returns true (crates/zeroclaw-runtime/src/agent/turn/context_recovery.rs:61-100); the caller then immediately continues the loop (crates/zeroclaw-runtime/src/agent/turn/mod.rs:617-619).
That means a real provider context-window error can still remove earlier turns without inserting [earlier turns omitted to fit the context window] into the model-visible history and without emitting the ACP/WS/SSE history_trimmed event. This contradicts the core contract documented in the new history page, which says both preemptive and reactive paths call the trimmer and that trimming is "never silent" (docs/book/src/agents/history-management.md:36-46, 67-86). Please route the recovery trim through the same breadcrumb + HistoryTrimmed emission path, or otherwise make this recovery path non-trimming, and add a regression test that exercises the overflow-retry branch rather than only the turn-boundary trim.
🔴 Blocking — The rollback/config contract does not match the implementation
The PR body says history_pruning.enabled gates trimming, history_pruning.keep_recent is honored as a minimum, and history_pruning.enabled = false disables trimming entirely at runtime. The code implements a different contract. effective_context_budget() returns max_context_tokens when history_pruning.enabled is false, and run_tool_call_loop still invokes trim_to_recent_turns whenever that positive budget is exceeded (crates/zeroclaw-config/src/schema.rs:3274-3284, crates/zeroclaw-runtime/src/agent/turn/mod.rs:383-386). Also, the new docs explicitly say keep_recent no longer drives any code path (docs/book/src/agents/history-management.md:62-64).
So existing configs still load, but their semantics are not what the compatibility and rollback sections promise: disabling history_pruning disables the earlier max_tokens budget floor, not all whole-turn trimming, and the overflow-recovery path has no history_pruning.enabled gate before it trims and retries (crates/zeroclaw-runtime/src/agent/turn/context_recovery.rs:61-100, crates/zeroclaw-runtime/src/agent/turn/mod.rs:617-619). keep_recent is not honored as a minimum. For a high-risk runtime/config PR, the advertised rollback path and compatibility behavior need to be true. Please either change the implementation to match those claims, or update the public contract and rollback section to describe the new semantics precisely, with focused tests pinning the chosen history_pruning.enabled / max_tokens / keep_recent behavior.
🟢 What looks good — Whole-turn trimming is much easier to reason about
The new history_trim::trim_to_recent_turns helper is small and structurally safer than the old stack: it keeps leading system messages, drops only whole turns, keeps at least the newest turn, and the unit tests cover the main pairing-safety invariants. Once the two contract gaps above are closed, this will be much easier to review and maintain.
WareWolf-MoonWall
left a comment
There was a problem hiding this comment.
Review: #8196 — refactor(history): rip out history pruning/compression, redo trim as one-pass
Author: singlerider (Shane Engelman)
Head SHA: e42e75947e00f1875471e8b75b755eacd187185d
Reviewed by: WareWolf-MoonWall
Verdict: --request-changes
Summary
Net-negative XL refactor (+922 / −3,311) replacing a 6-phase pruner, context compressor,
fast_trim_tool_results, emergency_history_trim, and channel-side proactive_trim_turns /
strip_old_tool_context with a single trim_to_recent_turns function in history_trim.rs.
The algorithm drops oldest whole turns to fit a token budget, never splits a
tool_use/tool_result pair, always keeps the current turn, and signals the cut via a
HistoryTrimmed event over ACP, WS, and SSE. The motivating confabulation bug (model
claiming "I never wrote that" when its own tool result was silently trimmed) is correctly
addressed by the structural atomicity guarantee and the breadcrumb injection.
The design is sound. Two blocking issues require changes before merge.
Findings
🔴 Overflow-recovery trim is still silent to the client
crates/zeroclaw-runtime/src/agent/turn/context_recovery.rs — try_recover_context_overflow
calls trim_to_recent_turns and records a record! log, but neither TurnEvent::HistoryTrimmed
nor ObserverEvent::HistoryTrimmed is emitted on this path. event_tx and observer are not
passed to try_recover_context_overflow.
This directly contradicts the stated design goal "trimming is never silent." The 400-recovery
trim fires at iteration ≥ 1, after the preemptive turn-boundary check at iteration == 0 already
ran. A session with a very long single turn can overflow the first provider call and trigger
recovery — that recovery is invisible to ACP, WS, and SSE subscribers.
Required fix: thread event_tx: Option<&mpsc::Sender<TurnEvent>> and observer: &impl Observer
into try_recover_context_overflow, emit TurnEvent::HistoryTrimmed and
ObserverEvent::HistoryTrimmed in the trimmed branch, and add a test that covers the
recovery path's event emission.
🔴 Bare string literals in user-facing output (Fluent rule)
Two distinct bare literals appear in surfaces that reach end users:
1. history_trim.rs — breadcrumb()
pub fn breadcrumb() -> ChatMessage {
ChatMessage::user("[earlier turns omitted to fit the context window]")
}This message is injected into conversation history that streams through ACP
session/update, WS frames, and SSE. It is user-visible text and must use fl!().
2. turn/mod.rs, rpc/dispatch.rs, sse.rs — reason field
reason: "context token budget exceeded".to_string(),This string is delivered to clients in every HistoryTrimmed event on all three transport
surfaces. It must use fl!().
AGENTS.md: "All user-facing output (CLI messages, tool descriptions, onboarding prompts)
must use fl!() / Fluent strings — never bare string literals."
🟡 ObserverEvent::HistoryTrimmed fields are all None at the production call site
turn/mod.rs emits:
observer.record_event(&ObserverEvent::HistoryTrimmed {
dropped_messages: result.dropped_messages,
kept_turns: result.kept_turns,
reason: "context token budget exceeded".to_string(),
channel: None,
agent_alias: None,
turn_id: None,
});The PR description states "SSE, WS, and ACP now all carry the same history_trimmed shape"
including agent_alias. The BroadcastObserver test
(history_trimmed_event_is_broadcast_with_cut_accounting) passes non-None values directly
and asserts them — but in production, all three optional fields are None, so /api/events
subscribers see null for channel, agent_alias, and turn_id. The claim in the follow-up
section that agent_alias is "real, not span-dependent guesswork" applies to the record!()
log (which inherits from attribution_span! via LogCaptureLayer), not to the
ObserverEvent. The SSE contract as described is not fully delivered.
🟡 start == 0 guard in trim_to_recent_turns is unreachable dead code
if start == 0 {
return TrimResult { trimmed: false, … };
}When boundaries.len() >= 2, the loop iterates at least once and always assigns
start = candidate_start > 0 (boundary positions are indices into body, all ≥ 1 for
any non-empty turn). The start == 0 path is structurally unreachable; it survives as
misleading guard. Either add a debug_assert!(start > 0) to document the invariant or
remove the guard entirely after verifying the algorithm proof holds.
🟡 Labels mismatch between PR body and applied labels
The PR description states "Labels: type: refactor, risk: medium, size: L." GitHub shows
risk: high, size: XL. The rollback section is populated (history_pruning.enabled = false
for the flag, log grep for symptoms), which satisfies risk: high requirements, but the body
misrepresents the actual classification. The body should be updated to match the applied labels
before merge so that the audit trail is accurate.
🟡 RpcContext::for_live_test and RpcDispatcher::process_line_for_test are public non-test API with no consumers in-tree
rpc/context.rs adds pub fn for_live_test(...) explicitly without #[cfg(test)] because
"integration tests compile against the public surface." rpc/dispatch.rs adds
pub async fn process_line_for_test. The PR removes the live tests from the tree. These two
methods are now dead public API: they widen the stable surface of RpcContext and
RpcDispatcher with no in-tree consumer. If the live tests are genuinely gone, gate these
behind a live-tests feature or #[cfg(test)]. If they remain for future external test use,
document that explicitly and track it.
🔵 TOOL_RESULTS_PREFIX boundary heuristic is not shared
history_trim.rs defines:
const TOOL_RESULTS_PREFIX: &str = "[Tool results]";and uses it to identify non-turn-boundary user messages. This prefix is also used at other
points in the orchestrator and channel layer. If the prefix ever diverges between
history_trim.rs and the message-building site, the boundary detector will silently
mis-classify [Tool results] messages as turn heads, potentially fragmenting tool exchanges
across what it considers "turns." The constant should live in zeroclaw-providers or
zeroclaw-api and be imported here.
🔵 No test covers the overflow-recovery trim path (try_recover_context_overflow)
The 12 unit tests in history_trim.rs and the attribution test in loop_.rs cover the
iteration == 0 preemptive path. No test exercises a provider-400-triggered recovery so we
can confirm the outcome fields (dropped_turns, tokens_before, tokens_after) are
correctly populated, and — once the blocking issue above is fixed — that the event is
emitted on this path.
✅ Whole-turn atomicity is structural, not swept
The algorithm bounds turns by real user messages, not by heuristic scanning for
tool_use/tool_result pairs. Dropping whole turns cannot orphan a tool result
regardless of how deeply nested the tool exchange is. The test
trimmed_history_has_no_orphan_tool_calls verifies the orphan sweep (remove_orphaned_tool_messages)
finds nothing after a whole-turn trim — proving the structural guarantee empirically.
✅ SSOT check passes
effective_context_budget() is a computed method on ResolvedRuntime; it reads
history_pruning.max_tokens and max_context_tokens from the canonical config struct at
call time, no shadow field. trim_to_recent_turns is defined in exactly one place. The
orphan sweep, system normalization, and token estimation remain in their original modules
with no duplication. ✅
✅ Validation evidence is unusually thorough
The post-merge live battery (5 tests, 126.68 s, real Anthropic personal_code / claude-opus-4-8) empirically validates:
dropped_messagesrange 4–53, always non-zero when event fireskept_turns == 1on every event — option-a invariant holds- Zero provider 400s across 9 write/read/write trims under budget 700 — tool-pair atomicity
- Offline test suite:
cargo test -p zeroclaw-runtime --lib2308 passed, 0 failed; 12
trim/attribution tests green; gateway lib 280 passed. - CI: all 18 checks passing on run
27992097121.
✅ Net-negative scope
−1,297 lines context_compressor.rs (deleted), −951 history_pruner.rs, −430 orchestrator/mod.rs
channel trimmers, −171 anthropic.rs pruned-marker skip filter. The breadcrumb and the
HistoryTrimmed event replace the entire pruned-marker subsystem cleanly.
Compatibility
history_pruning.* and context_compression.* config idents are preserved; existing
configs load without change. history_pruning.enabled = false disables trimming entirely.
Behavior change is visible (no silent drops, client receives HistoryTrimmed events) but not
breaking in the config-compatibility sense.
Rollback
git revert <merge-sha> restores the prior subsystem. history_pruning.enabled = false
kills trimming at runtime. Failure symptoms: absence of History trimmed: in logs during a
known overflow, or any Anthropic 400 mentioning unmatched tool_use/tool_result.
Required before re-review
- Emit
TurnEvent::HistoryTrimmedandObserverEvent::HistoryTrimmedfrom
try_recover_context_overflow(threadevent_tx+observerin, or emit at the call
site inturn/mod.rsaftertry_recover_context_overflowreturnstrue). - Replace both bare string literals (
breadcrumbtext andreasonvalue) withfl!()
Fluent keys. - Address or explicitly defer the
ObserverEventNone-fields discrepancy with a
follow-up issue.
|
Holding off on review until existing blockers are resolved. I will do live testing of this, since I've been bitten by history trimming bugs before. |
…trings Two blockers from review: - try_recover_context_overflow trimmed history on a provider 400 but never emitted TurnEvent::HistoryTrimmed / ObserverEvent::HistoryTrimmed, so the recovery cut was silent to ACP/WS/SSE, contradicting the never-silent design. Thread event_tx and observer into the recovery path and emit both on the trimmed branch, matching the preemptive turn-boundary path. Add tests: recovery emits the event with non-zero dropped_messages and kept_turns >= 1; a non-overflow error neither recovers nor emits. - The breadcrumb text and the history_trimmed reason were bare string literals in user-facing output. Move both to Fluent keys (history-trim-breadcrumb, history-trim-reason-budget) across all five in-tree locales and read them via get_required_cli_string at the production emit sites.
Addressed both blockers: try_recover_context_overflow now threads event_tx + observer and emits TurnEvent::HistoryTrimmed and ObserverEvent::HistoryTrimmed on the trimmed branch, so the provider-400 recovery cut is no longer silent (new tests cover emit-on-recovery and no-emit on a non-overflow error). The breadcrumb text and the history_trimmed reason are now Fluent keys across all five locales. The ObserverEvent None-fields discrepancy and labels mismatch are noted in the body (fields deferred to follow-up; labels corrected to risk:high / size:XL). Re-requesting review.
Audacity88
left a comment
There was a problem hiding this comment.
I re-reviewed current head 6a30688469c3dcb52dfb998fe4206bfbf01fab99 against the live PR body, prior dismissed reviews, linked #8050 context, the current diff, and the runtime/config/gateway/docs paths touched by this update. I did not run local Cargo or live provider/RPC smoke in this review-only pass. The update fixes some of the previous review points, but two contract gaps still block me from clearing this.
✅ Resolved — Overflow recovery now emits the trim event to clients
The previous head trimmed and retried in try_recover_context_overflow() without sending the visible HistoryTrimmed signal. That part is fixed on this head. The recovery helper now accepts event_tx and observer, emits TurnEvent::HistoryTrimmed for ACP/WS, emits ObserverEvent::HistoryTrimmed for SSE, and has a regression for the recovery branch. That resolves the client-visible half of my earlier overflow-recovery blocker.
✅ Resolved — The trim breadcrumb and reason moved to Fluent
@WareWolf-MoonWall called out the bare user-facing strings. Current head adds history-trim-breadcrumb and history-trim-reason-budget across the in-tree runtime locale files and uses those keys at the production emit sites. That resolves the localization blocker.
🔴 Blocking — Overflow recovery still retries without the model-visible breadcrumb
The recovery path still does not match the "never silent" contract for the model itself. On a context-window error, try_recover_context_overflow() calls trim_to_recent_turns(), then assigns *history = result.history and returns true so the loop retries immediately. Unlike the turn-boundary path, it never inserts history_trim::breadcrumb() after the leading system messages before the retry.
That leaves a real gap in the core fix: the user surfaces now receive history_trimmed, but the model on the retried provider call does not see [earlier turns omitted to fit the context window]. The new history-management doc says every trim injects that breadcrumb and uses it to keep the model from fabricating dropped work; the PR summary makes the same claim. Please make the overflow-recovery branch insert the same breadcrumb before retrying, or narrow the public contract if that path is intentionally different. The regression should assert both halves of the recovery contract: the HistoryTrimmed event is emitted and the retried history contains the breadcrumb in the same position as the turn-boundary path.
🔴 Blocking — The config and rollback contract still overclaims the runtime gate
The current PR body still says history_pruning.enabled gates the trim, history_pruning.keep_recent is honored as a minimum, and history_pruning.enabled = false disables trimming entirely at runtime. The current implementation and new docs say something else: effective_context_budget() uses history_pruning.max_tokens only when history_pruning.enabled is true, but otherwise still returns max_context_tokens, so the hard context ceiling can still trigger whole-turn trimming. The docs also say keep_recent no longer drives any code path.
For this high-risk runtime/config refactor, the compatibility and rollback sections need to be precise. Either change the implementation to match the body, or update the body and rollback text to say that disabling history_pruning disables only the earlier explicit max_tokens budget floor while the hard context ceiling and provider-overflow recovery still trim. If keep_recent is intentionally inert until the V4 rename, the body should not say it is honored.
🟡 Warning — The SSE attribution fields remain deferred without a public tracker
The PR body now says ObserverEvent::HistoryTrimmed still emits channel, agent_alias, and turn_id as None at the production call sites and that populating them is deferred. That is an acceptable scope call for this refactor if it is tracked, because /api/events still receives the cut accounting. Please link the follow-up issue or remove the "tracked" wording so this does not become an invisible loose end.
🟢 What looks good — The broad simplification is still the right direction
The net-negative shape remains good: one whole-turn trimmer is easier to audit than the old compressor/pruner/fast-trim/channel-trim stack, and the update did close the client event and localization gaps. Once the retry path also tells the model what changed, and the public config/rollback contract matches the chosen behavior, this should be much closer to reviewable.
The recovery path emitted the client HistoryTrimmed event but never inserted the model-visible breadcrumb before retrying, so the retried provider call did not tell the model earlier turns were dropped. Insert history_trim::breadcrumb() after the leading system messages on the trimmed branch, matching the turn-boundary path, and extend the recovery test to assert the breadcrumb is present in the retried history.
Addressed both blockers: (1) the overflow-recovery path now inserts history_trim::breadcrumb() after the leading system messages before retrying, so the retried provider call tells the model earlier turns were dropped; the recovery test asserts both the HistoryTrimmed event and the breadcrumb in the retried history. (2) The Compatibility/Rollback body now states the precise contract: history_pruning.enabled=false disables only the explicit max_tokens budget floor (the hard max_context_tokens ceiling and provider-overflow recovery still trim), and keep_recent is inert pending the V4 rename. The deferred ObserverEvent fields note no longer claims a tracker. Re-requesting review.
Nillth
left a comment
There was a problem hiding this comment.
I re-reviewed current head c2dc0adf7983b9382e8d5a7450806b48bbc2b9f0 against the PR body, the two prior dismissed review rounds (@Audacity88 x2, @WareWolf-MoonWall), the diff, the green CI matrix, and the touched runtime / api / gateway / config / docs paths. I verified the head tree directly: context_compressor.rs / the 6-phase prune machinery are gone, history_trim.rs is the one new trimmer, and the upstream merge did not resurrect any deleted module. I read the trim algorithm, the overflow-recovery path, the event plumbing across ACP / WS / SSE, the Fluent keys in all five locales, and the new docs page line by line.
I also ran the engine-validation battery against a worktree on this exact head, toolchain pinned to the 1.93.0 the CI uses. The full static tier passed (cargo fmt --check, clippy --all-targets -D warnings with zero warnings, the engine-scoped suites, and the whole-workspace test battery where the new history_trim and context_recovery tests run) and all five live transport legs passed against a real provider: CLI one-shot, delegate sub-agent, RPC socket, WS with mid-turn steering, and ACP deny-with-edit (11 of 11 green). The RPC leg confirmed tool_use and tool_result share one id with outcome=completed, which exercises end-to-end the whole-turn pairing atomicity this change rests on.
This is the right direction and the previous blockers are genuinely closed. Approving.
✅ RESOLVED — Overflow recovery is no longer silent to clients
try_recover_context_overflow now takes event_tx and observer, and on the trimmed branch emits both TurnEvent::HistoryTrimmed (ACP / WS) and ObserverEvent::HistoryTrimmed (SSE), matching the turn-boundary path (crates/zeroclaw-runtime/src/agent/turn/context_recovery.rs). The caller threads event_tx.as_ref() and observer in (turn/mod.rs). recovery_emits_history_trimmed_event_on_trim covers the emit and non_overflow_error_is_not_recovered_and_emits_nothing pins the negative path. This was my predecessors' first blocker and it is closed.
✅ RESOLVED — The retried provider call now tells the model
The recovery branch inserts history_trim::breadcrumb() after the leading system messages before reassigning *history, so the retried call carries the same [earlier turns omitted to fit the context window] marker as the preemptive path (context_recovery.rs, breadcrumb insert before the *history = ... assignment). The regression asserts both halves: the breadcrumb lands in the retried history and the HistoryTrimmed event fires. That was @Audacity88's remaining blocker on 6a3068; commit c2dc0a closes it.
✅ RESOLVED — Trim strings are Fluent
history-trim-breadcrumb and history-trim-reason-budget are present in en / es / fr / ja / zh-CN and read via get_required_cli_string at every production emit site. As a belt-and-suspenders note, get_required_cli_string degrades to {key} plus a logged WARN on a miss rather than panicking, so even a future locale gap cannot take down the trim path.
✅ RESOLVED — The config and rollback contract now matches the code
effective_context_budget() is min(max_context_tokens, history_pruning.max_tokens) only when history_pruning.enabled && max_tokens > 0, else max_context_tokens (crates/zeroclaw-config/src/schema.rs). The body's Compatibility / Rollback sections and the new docs page now both describe exactly that: disabling history_pruning removes only the explicit max_tokens floor, the hard ceiling and the overflow-recovery path still trim whole turns, and keep_recent / collapse_tool_results are inert pending the V4 rename. Body, docs, and implementation are finally consistent. The context_compression.* and history_pruning.* idents stay declared so existing configs still deserialize, and clippy is clean, so nothing needed an #[allow(dead_code)].
🟢 What looks good
The whole-turn trimmer is structurally safe in a way the old stack was not: it keeps leading system messages, drops only whole turns between real user boundaries, always keeps the most recent turn, and converges to a clean unrecoverable error (rather than looping) when only one oversized turn remains. Because tool_use / tool_result pairs live inside a turn, pairing safety is structural, not swept, and trimmed_history_has_no_orphan_tool_calls proves the orphan sweep finds nothing to do after a trim. The event contract is complete and uniform: TurnEvent (api), SessionUpdateEvent (rpc/ACP), the WS frame, and ObserverEvent -> SSE all carry the same history_trimmed shape, and ObserverEvent is #[non_exhaustive] so the new variant does not break out-of-tree observers. Net -3300 lines with green CI across every target is a real maintainability win.
🔵 Non-blocking observations (no change required to merge)
- Capability tradeoff, called out for the team's awareness. Deleting the compressor removes the old "summarize overflowing context into memory before splicing" path; overflowing turns are now dropped with a breadcrumb rather than summarized. The standalone memory subsystem (
memory_loader/memory_strategy) is untouched, so explicit memory recall still works; only the automatic compression-to-memory-on-overflow behavior is gone. That is the explicit intent of this refactor and I agree honesty beats silent lossy summarization, but if summarization-on-overflow is wanted later it now has a clean foundation to sit on rather than the deleted stack. - The breadcrumb is itself a turn boundary.
breadcrumb()is ausermessage not prefixed with[Tool results], sois_turn_boundarycounts it as a turn start on the next trim. This is benign (it sits at the front, so it is the first thing dropped on a subsequent trim, and back-to-back user messages are fine for Anthropic), but over a long session with repeated trims you can accumulate several[earlier turns omitted...]markers before they age out. Worth a comment if it ever looks noisy in practice. trim_to_recent_turnsre-estimates tokens O(n^2) by rebuildingsystem.clone() + body[candidate..]and re-counting each iteration. Fine for real histories; only flagging it as a known shape if trimming ever shows up hot.- SSE attribution deferral is honestly disclosed.
channel/agent_alias/turn_idareNoneat the productionObserverEvent::HistoryTrimmedsites and the body now says this is not yet tracked rather than claiming a ghost issue./api/eventsstill gets the cut accounting. Fine to land as-is; a follow-up to populate them (and a tracking issue) would close the loop.
Verdict: Approve. All blockers from the prior rounds are verifiably resolved against this head, CI is green across the full matrix, and the design is materially simpler and safer than what it replaces.
…uning # Conflicts: # crates/zeroclaw-channels/src/orchestrator/mod.rs # crates/zeroclaw-runtime/src/agent/agent.rs # crates/zeroclaw-runtime/src/agent/context_compressor.rs # crates/zeroclaw-runtime/src/agent/loop_.rs # tests/integration/memory_loop_continuity.rs
…bs#8214 Add gateway_shutdown_tx to the RpcContext::for_live_test constructor. The zeroclaw-labs#8104 daemon-drain field collided with the zeroclaw-labs#8196 live-test constructor during the merge; the auto-merge left for_live_test missing the new field. Scope content_search_basic_match to its temp workspace via an explicit path. The test defaulted the search path to ".", which under the larger full-workspace parallel load searched the live tree and raced the result cap, dropping the temp hello.rs from output. Searching the temp dir is the test's actual intent and matches the sibling cases.
Summary
mastertmp/secret.mdwas never written when it had written it the prior turn. The user had no signal that context was cut, so hallucinations looked like the model "just lying."trim_to_recent_turns: drop oldest whole turns to fit the token budget, never split atool_use/tool_resultpair, always keep the most recent turn. Reactive on overflow plus a single preemptive check at the turn boundary, never mid-tool-loop, never silent.HistoryTrimmedevent that streams over the samesession/updatepipe as turn cancellation (ACP), plus WebSocket and the public API. When context gets cut the session is told, withdropped_messages/kept_turns/reason. A breadcrumb is also injected into history so the model itself knows.history_pruning.*andcontext_compression.*idents stay declared and keep deserializing; only the code that reads them changed. Does NOT alter provider request shaping beyond removing the now-dead pruned-marker skip-filter.type: refactor,risk: high,size: XL(matches the applied GitHub labels).Validation Evidence (required)
cargo fmt --all -- --check cargo clippy --all-targets -- -D warnings cargo test --workspacecargo fmt --all -- --check— clean, exit 0.cargo clippy --all-targets -- -D warnings:0 warnings across all 5 crates,
--all-targets.cargo test --workspace:The single failure is a pre-existing cron log-race flake, not a regression:
cron/store.rsis untouched by this branch (git diff --name-only upstream/master..HEAD | grep cron-> empty), and the test passes 3/3 when run isolated. All history/trim/RPC tests pass.Beyond CI — what did you manually verified: A live RPC battery was run against real Anthropic (
personal_code, claude-opus-4-8) during development to validate the contract end-to-end. These live tests were intentionally NOT committed (they would burn provider budget on every run); the methodology and results are recorded below. The wire contract they guarded is now covered offline byhistory_trimmed_notification(assertsTurnEvent::HistoryTrimmedserializes to asession/updatewithtype=history_trimmedand the right fields).Live battery (5 tests, 256.73s, 0 failed):
history_trimmedevents fired; every one haddropped_messages > 0(4..51) andreasonset — no fired-but-empty signal.kept_turns == 1, zero hadkept_turns == 0. The current turn always survived.tool_use/tool_resultpair 400s Anthropic, so zero errors proves whole-turn atomicity against the live provider.If any command was intentionally skipped, why: live model tests removed from the tree by design (budget); covered offline + evidenced above.
Security & Privacy Impact (required)
NoNoNoNoCompatibility (required)
YesNo— all existinghistory_pruning.*/context_compression.*idents still parse, so existing configs load unchanged. Precise semantics:effective_context_budget()useshistory_pruning.max_tokensas the budget only whenhistory_pruning.enabled = trueandmax_tokens > 0; otherwise it falls back to the hardmax_context_tokensceiling.history_pruning.enabled = falsetherefore disables only the explicitmax_tokensbudget floor, not trimming itself: the hard context ceiling and the provider-overflow recovery still trim whole turns.history_pruning.keep_recentis currently inert (no code path reads it) pending the V4 rename; it is not honored as a minimum.Rollback (required for
risk: high)git revert <merge-sha>— single squash revert restores the prior subsystem.history_pruning.enabled = falsedisables only the explicitmax_tokensbudget floor; the hardmax_context_tokensceiling and provider-overflow recovery still trim. To raise the ceiling, set a largermax_context_tokens.History trimmed:; thehistory_trimmedsession/updateevent should appear on context cuts. Absence of the event during a known overflow, or any Anthropic 400 mentioning unmatchedtool_use/tool_result, indicates a pairing regression.Net-negative refactor. Deleted
context_compressor.rs, the 6-phase machinery inhistory_pruner.rs(kept only the orphan sweep),fast_trim_tool_results/emergency_history_trim, channelproactive_trim_turns/strip_old_tool_context, and the pruned-marker primitives. Addedhistory_trim.rsand a source-sourced docs pagedocs/book/src/agents/history-management.mddisambiguating trim vs compression vs truncation vs orphan sweep.Post-merge re-validation (latest upstream/master merged in)
Merged latest
upstream/master(f218b949901) via merge commit. Upstream restructured the turn engine (ToolLoopflat fields collapsed intoexec: ResolvedAgentExecution { model_access, .. }, newturn/execution.rs). Conflicts resolved by merge, not rebase:agent.rs(2 hunks): kept upstream's nestedparams:structure, dropped the duplicate HEAD blocks, set bothcontext_token_budgetsites toeffective_context_budget().loop_.rs(1 hunk): dropped upstream's re-introducedprune_historycall (the resurrection this branch deletes), kept the excluded-MCP-tools block, converted both production loop sites toeffective_context_budget(), fixed one merged test builder to the nestedexecshape.No conflict markers in tracked source; no deleted subsystem resurrected.
Offline (post-merge): fmt clean; clippy
-D warnings0 all crates;cargo build --workspaceclean;cargo test -p zeroclaw-runtime --lib2308 passed 0 failed; 12 trim/attribution/notification offline tests green.Live RPC (post-merge, real Anthropic
personal_code,claude-opus-4-8): 5 passed, 0 failed, 126.68s. Empirical wire evidence:history_trimmedevents fired across every session,dropped_messages4 -> 53,reasonalwayscontext token budget exceeded.kept_turnswas1on every event, never0— the current turn always survives (option-a invariant, proven against the live provider).tool_use/tool_resultpair was split.Methodology / budget: the live tests drive
RpcDispatcher+SessionStoreagainst a throwaway config whose encryptedpersonal_codekey is decrypted viaSecretStore, withmax_context_tokensset low so trimming triggers within one or two turns. They are restored from git only to run, then removed; no live model tests are tracked or run in CI. The wire contract is covered offline byhistory_trimmed_notification.Follow-up: first-class
history_trimmedSSE event +agent_aliason the trim recordReviewer feedback on the end-user test case found two real gaps. Fixed the code to match (commit
e42e75947e):SSE was the odd surface out.
/api/eventsis fed by the observability pipeline, notTurnEvent, so the trim only surfaced there as a generic record, not a first-classhistory_trimmedframe. AddedObserverEvent::HistoryTrimmed(the enum is#[non_exhaustive], so no out-of-tree observer break) and mapped it inBroadcastObserverto{"type":"history_trimmed", dropped_messages, kept_turns, reason, channel, agent_alias, turn_id}. The trim site now emits the observer event alongside the existingrecord!andTurnEvent. SSE, WS, and ACP now all carry the samehistory_trimmedshape.agent_aliason the trim record. It is inherited from the outer agentattribution_span!via theLogCaptureLayerleaf-to-root span walk; the trim's own child span setsmodel/model_provider. The attribution test now wraps the loop in the productionattribution_span!(AgentAttribution("trimtest"))and assertsagent_aliaslands on the trim record — proving the contract against the real attribution mechanism, no wide call-site refactor.Docs page updated to document all three visibility surfaces (ACP
session/update, gateway WS frame, SSE observability event), each sourced from its mapping site.Offline: fmt clean; clippy
-D warnings0; gateway lib 280 passed (incl.history_trimmed_event_is_broadcast_with_cut_accounting); runtime lib 2308 passed (incl.trim_record_carries_model_attributionwith theagent_aliasassertion); api lib 53 passed; mdbook + markdownlint clean.Live smoke (real Anthropic
personal_code,claude-opus-4-8): re-ran one live trim test through the dispatcher after the change — 1 passed, 75.86s, 7 real trims with truthful non-zerodropped_messages(4 -> 28) andkept_turns:1on every one. The new observer emit ran on each trim with zero panic, zero error. Live test removed from the tree after the run; not in CI.Review follow-ups addressed
try_recover_context_overflowtakesevent_tx+observerand emitsTurnEvent::HistoryTrimmedandObserverEvent::HistoryTrimmedon the trimmed branch, so the provider-400 recovery cut is no longer silent to ACP / WS / SSE. New tests cover the emit on recovery and the no-emit / no-recover path for a non-overflow error.history_trimmedreason are nowhistory-trim-breadcrumb/history-trim-reason-budgetFluent keys across all five in-tree locales, read viaget_required_cli_stringat the production emit sites.history_trim::breadcrumb()after the leading system messages before retrying, so the retried provider call carries the same[earlier turns omitted...]marker as the turn-boundary path. The recovery test asserts both halves: theHistoryTrimmedevent is emitted and the retried history contains the breadcrumb.channel/agent_alias/turn_idareNoneat the productionObserverEvent::HistoryTrimmedcall sites; attribution rides therecord!()log via the span, not theObserverEvent./api/eventsconsumers still receive the cut accounting (dropped_messages,kept_turns,reason). Populating these is left to a follow-up; no tracking issue is filed yet, so this is called out here rather than claimed as tracked.