fix(channels/orchestrator): keep system prompt byte-stable for prefix caching (#6360)#6630
fix(channels/orchestrator): keep system prompt byte-stable for prefix caching (#6360)#6630yijunyu wants to merge 2 commits into
Conversation
β¦ caching (zeroclaw-labs#6360) build_channel_system_prompt embedded two per-turn pieces into the cached system prompt: a ## Current Date & Time section refreshed to second precision and a Channel context: paragraph carrying reply_target / sender. Every reply mutated the system prefix, so provider prompt caches (llama.cpp prefix cache, Anthropic/OpenAI cached prefixes) missed unconditionally β the reporter saw 'forcing full prompt re-processing due to lack of cache data' and ~12k tokens reprocessed every turn. CLI did not exhibit the bug because it never appended these fields to the system prompt. Move both pieces to a per-turn preamble that rides on the **user message** of the current turn instead. The system prompt now stays byte-stable for the lifetime of a session, so prefix caches hit. The cached conversation copy stored at append_sender_turn time is left untouched β only the outgoing wire copy carries the [turn-context] marker. New tests: - build_channel_system_prompt_is_byte_stable_across_calls (helper-level) - build_channel_system_prompt_excludes_per_turn_context / _per_second_datetime - build_channel_turn_context_preamble_carries_dynamic_fields / _empty_when_reply_target_empty - process_channel_message_telegram_system_prompt_is_byte_stable_across_turns (end-to-end: two consecutive process_channel_message calls 1.1 s apart must produce byte-identical system prompts β under the pre-fix behavior the ## Current Date & Time second field alone would have mutated) Inverted the existing process_channel_message_enriches_current_turn_without_persisting_context assertion to pin the new contract (user message carries the turn-context preamble; cached history copy stays clean).
β¦:collapsible_if (zeroclaw-labs#6360) CI Lint failed with clippy::collapsible_if on the turn-context preamble injection. Fold the !turn_preamble.is_empty() guard into the .filter() predicate of the existing find chain so a single if let suffices. Behavior unchanged; all 992 zeroclaw-channels tests still pass.
Audacity88
left a comment
There was a problem hiding this comment.
I checked the current public PR state, linked issue #6360, and the current diff at 1e0ae6f. CI is green, and the cache-stability direction looks right. I did not run local cargo tests for this review.
π’ What looks good β Channel context no longer churns the system prompt
The main shape is the right one for #6360. build_channel_system_prompt now keeps only the base prompt and static channel delivery instructions in the system prompt, while the per-turn channel data moves out of that stable prefix. The new tests also cover the important normal case: the outgoing provider turn gets the context, but the cached conversation history stays clean.
π΄ Blocking β User text can suppress the trusted turn context
The new preamble is now the authoritative source for the current reply_target, sender, and the cron_add delivery hint. But the injection path skips adding that runtime-built preamble whenever the last user message already starts with [turn-context]:
.filter(|m| !turn_preamble.is_empty() && !m.content.starts_with("[turn-context]"))That makes the trusted context user-suppressible. A user can send a message that begins with [turn-context] ..., and then the outgoing provider payload only carries the user-controlled marker instead of the runtime-generated channel context. That is a regression from the old system-prompt placement, where user text could not remove the authoritative sender/reply-target context.
This matters most for reminders because this preamble is where the PR now gives the model the current cron_add delivery hint. If the runtime preamble is skipped, the model gets no trusted delivery guidance for that turn.
The smallest fix is probably to stop treating raw user content as proof that the trusted preamble is already present. Since cached history is intentionally stored without [turn-context], the current outgoing turn can always receive the runtime preamble. Please also add a regression test where the inbound user message starts with [turn-context] and the provider payload still receives the real runtime preamble before the user text.
|
@yijunyu this PR is still waiting on the May 13 requested changes for the prefix-caching path, and I don't see an author update after that review. GitHub is not currently reporting a clean mergeability result for the branch, so I'm moving it out of active review. If you want to continue, please push a current branch addressing the existing review feedback, or reply here by June 16, 2026. If there is no update, we'll close this PR as stale and ask for a fresh narrow PR against current |
tidux
left a comment
There was a problem hiding this comment.
I reviewed the current head 1e0ae6f, linked issue #6360, @Audacity88's existing review, and the current master context around crates/zeroclaw-channels/src/orchestrator/mod.rs. I also reran the local checks after the first test attempt got interrupted: cargo fmt --all -- --check, the targeted helper/process tests, cargo test -p zeroclaw-channels --lib --quiet, and cargo clippy -p zeroclaw-channels --lib -- -D warnings all pass on this head.
The direction is right β moving volatile channel metadata out of the cached system prefix is the right fix for #6360 β but I can't sign off on this version yet. @Audacity88's trusted-context blocker is still present, and there is one more cache-stability gap that means the system prompt can still vary across turns when memory recall is enabled.
π΄ Blocking β Memory recall still mutates the cached system prompt
The PR removes the timestamp and channel tuple from build_channel_system_prompt, but process_channel_message still builds per-turn memory context from the current inbound message and appends it to the same system message:
crates/zeroclaw-channels/src/orchestrator/mod.rs:2984-3018recalls sender/group memory using the currentmsg.content.crates/zeroclaw-channels/src/orchestrator/mod.rs:3029-3030appends the resultingmemory_contextdirectly ontosystem_prompt.
That means the provider-visible system message is still not byte-stable whenever memory recall returns different entries for different turns. This is exactly the kind of dynamic per-turn data that the cacheable prefix must not contain. The current stability test does not catch it because process_channel_message_telegram_system_prompt_is_byte_stable_across_turns uses NoopMemory; meanwhile process_channel_message_enriches_current_turn_without_persisting_context explicitly pins memory context in the system prompt.
The CLI path already treats recalled memory as per-turn user context (crates/zeroclaw-runtime/src/agent/loop_.rs:2620-2643), so I think the channel path should follow the same shape: keep memory out of the system prompt and put it in the non-cached current user turn/preamble. Please add an end-to-end regression test where memory recall returns different relevant entries across two channel turns and assert that the system message stays byte-identical while the current user turn still receives the recalled memory/context.
π΄ Blocking β User text can suppress the trusted turn context
Carrying forward @Audacity88's active block: the injection path still treats raw user text that starts with [turn-context] as proof that the trusted runtime preamble is already present:
crates/zeroclaw-channels/src/orchestrator/mod.rs:3042-3048
Because the cached history is intentionally stored without the runtime-generated preamble, the current outgoing turn can always receive the runtime preamble. With the current guard, a user can start their message with [turn-context] and suppress the authoritative reply_target, sender, and cron_add delivery hint for that turn. That is a trust-boundary regression from the old system-prompt placement.
Please remove the raw-content guard (or otherwise make it impossible for user-controlled text to masquerade as the runtime preamble) and add the regression test @Audacity88 requested: inbound content beginning with [turn-context] should still produce a provider payload whose last user turn begins with the runtime-generated preamble, followed by the user's original text.
π‘ Warning β Rebase onto current master before the next review pass
This branch is now 412 commits behind current upstream/master and GitHub reports it as CONFLICTING / DIRTY; git merge-tree --write-tree HEAD upstream/master shows a content conflict in crates/zeroclaw-channels/src/orchestrator/mod.rs.
That rebase is not just mechanical context churn. Current master has added channel-context contracts this May 13 branch does not know about, including message_id / reaction-tool guidance, the webhook delivery.thread_id special case, self-addressed mention handling for aliased channels, and calibration/date-section changes around the channel prompt. When you port the cache-stability fix forward, please preserve those current-master contracts while moving only the volatile per-turn pieces out of the cached prefix.
π’ What looks good β The core cache-fix direction and local checks are solid
The main architectural direction is the right one for #6360: the stable channel delivery instructions belong in the system prompt, while reply_target, sender, delivery hints, and wall-clock time belong in the current turn rather than in the provider-cached prefix. The cached conversation copy staying clean is also the right invariant β no stacked preambles across turns.
Local verification on this head:
cargo fmt --all -- --checkβ passed.cargo test -p zeroclaw-channels --lib build_channel_system_prompt -- --nocaptureβ 3 passed.cargo test -p zeroclaw-channels --lib process_channel_message_telegram_system_prompt_is_byte_stable_across_turns -- --nocaptureβ passed.cargo test -p zeroclaw-channels --lib process_channel_message_enriches_current_turn_without_persisting_context -- --nocaptureβ passed.cargo test -p zeroclaw-channels --lib --quietβ 992 passed in 5.39s.cargo clippy -p zeroclaw-channels --lib -- -D warningsβ passed.
Once the two blockers are fixed and the branch is rebased onto current master, this should be a much easier re-review: the desired shape is clear, and the existing tests already cover a lot of the non-persistence/cache-prefix behavior.
singlerider
left a comment
There was a problem hiding this comment.
Reviewed at head 1e0ae6f against current master. @tidux's June 11 review already covers both blockers thoroughly and they are correct; I am confirming the trust-boundary one in source and not adding redundant noise. Not overriding @Audacity88's active changes-requested.
π’ Cache-fix direction is right
Moving reply_target, sender, delivery hints, and wall-clock time out of the cached system prefix and into the current turn β while keeping the stable channel delivery instructions in the system prompt and the cached history copy clean β is the correct fix for #6360.
π΄ Blocking β confirmed the trust-boundary regression (user text can suppress the trusted preamble)
Verified at line 3046:
.filter(|m| !turn_preamble.is_empty() && !m.content.starts_with("[turn-context]"))Raw user content starting with [turn-context] is treated as proof the trusted runtime preamble is already present, so the outgoing provider turn then carries only the user-controlled marker instead of the runtime-built reply_target / sender / cron_add delivery hint. Since cached history is intentionally stored without [turn-context], there is no reason to gate on user content at all β the current outgoing turn can always receive the runtime preamble. Drop the raw-content guard and add the regression test @Audacity88 and @tidux asked for: inbound content beginning with [turn-context] must still yield a provider payload whose last user turn starts with the runtime preamble, then the user's text.
π΄ Blocking β agree: memory recall still mutates the cached system prompt
Confirmed the recall path (build_memory_context_for_sessions at line 2984, folded into memory_context at 3012) feeds back onto the system message, so the provider-visible system prompt is not byte-stable when recall differs across turns β defeating the prefix-cache goal the PR is for. Follow the CLI shape (loop_.rs): keep recalled memory in the non-cached current user turn, not the system prefix. Add the end-to-end test @tidux described (different recall across two turns, system message byte-identical, current turn still carries the memory).
π‘ Rebase required
This branch is hundreds of commits behind and CONFLICTING on crates/zeroclaw-channels/src/orchestrator/mod.rs (I can confirm that file has moved substantially on current master). The rebase must preserve current-master channel contracts β message_id/reaction guidance, the webhook delivery.thread_id case, self-addressed mention handling, and the date/calibration sections β while moving only the volatile per-turn pieces out of the cached prefix.
The shape is right; the two trust/stability blockers plus the rebase are the path to green. Leaving @Audacity88's changes-requested in force.
| .iter_mut() | ||
| .rev() | ||
| .find(|m| m.role == "user") | ||
| .filter(|m| !turn_preamble.is_empty() && !m.content.starts_with("[turn-context]")) |
There was a problem hiding this comment.
π΄ Trust-boundary regression: a user message starting with [turn-context] makes this filter skip the runtime-built preamble, so the outgoing turn carries only user-controlled text instead of the authoritative reply_target/sender/cron_add hint. Cached history never contains [turn-context], so drop the raw-content guard entirely β the current outgoing turn can always receive the runtime preamble.
|
Hi @yijunyu, closing this PR as stale. We asked for an updated branch or reply by June 16, 2026, and there has not been a follow-up. The branch is still dirty, and the May 13 changes-requested review still needs author work, so this is not a current review target. Issue #6360 remains open for the Telegram/channel prompt-caching problem. If you or someone else wants to continue this fix, please open a fresh narrow PR against current |
|
Hi @yijunyu β wanted to flag that I just opened a follow-up PR (#8174) on issue #6360, building on the design you pioneered in this PR. I'm sorry this one got closed before you could land the review feedback; the close-as-stale was on the maintainers, not on your design. Quick summary of where I landed vs. where you stopped: Reused from your PR (no credit to me for the shape):
Fixed in #8174 that the three reviews identified (Audacity88, tidux, singlerider):
If you want to rebase your approach on top of #8174 (or just want credit on the commit when it lands), happy to coordinate. Either way, thanks for getting the design right β the three reviews closed the gap on top of what you built. |
Summary
Fixes #6360. On Telegram (and every other channel that flows through
process_channel_message), prompt caching never hits β the upstream LLM logsforcing full prompt re-processing due to lack of cache dataon every turn, reprocessing ~12k system-prompt tokens for each message. CLI works fine. The reporter [@edgarkech] localized the root cause precisely in their comment:build_channel_system_promptincrates/zeroclaw-channels/src/orchestrator/mod.rs:660-710injected two per-turn pieces into the cached system prompt:## Current Date & Timesection refreshed to second precision on every call.Channel context: β¦reply_target=β¦sender=β¦paragraph that varies per turn (and per sender in groups).Either alone is enough to invalidate every provider-side cached prefix (llama.cpp
slot.prompt_match, Anthropic/OpenAI cache breakpoints). Both together guarantee a 0 % cache hit. CLI does not append these fields to the system prompt, which is why it never showed the regression.Changes (single file:
crates/zeroclaw-channels/src/orchestrator/mod.rs)build_channel_system_promptis now byte-stable. Signature reduced from(base, channel, reply_target, sender)to(base, channel). The function now embeds onlybase_promptandchannel_delivery_instructions(channel_name)β no datetime refresh, no reply_target/sender block. A docstring pins the contract for future contributors.build_channel_turn_context_preamble(channel, reply_target, sender)carrying the same dynamic information that used to live on the system prompt β datetime to second precision,channel=,reply_target=,sender=, plus the existingcron_add delivery=guidance. Returns an empty string whenreply_targetis empty (mirrors the original guard).process_channel_message(line ~3024): system prompt is built once with the simpler signature; the preamble is prepended to the lastuserturn of the outgoinghistoryonly. The cached conversation copy stored a few lines earlier (append_sender_turn(β¦, ChatMessage::user(&msg.content))) is not mutated, so subsequent turns don't accumulate stacked preambles. Astarts_with("[turn-context]")guard makes the patch idempotent against any future replay scenarios.The user message that hits the LLM looks like:
The system prefix above it is now identical across every turn of the session, so the LLM's prefix cache hits.
Validation Evidence
build_channel_system_prompt_is_byte_stable_across_callsβ two back-to-back calls 5 ms apart must produce byte-identical output. Under the pre-fix code, the seconds field of## Current Date & Timewould differ.build_channel_system_prompt_excludes_per_turn_contextβ pins thatChannel context:/sender=/reply_target=never reach the system prompt again.build_channel_system_prompt_excludes_per_second_datetimeβ pins that## Current Date & Timeis no longer added at the channel layer.build_channel_turn_context_preamble_carries_dynamic_fieldsβ verifies the preamble carrieschannel=,reply_target=,sender=and starts with the[turn-context]marker used by the idempotency guard.build_channel_turn_context_preamble_empty_when_reply_target_emptyβ CLI-style calls (emptyreply_target) get no preamble at all, mirroring CLI behavior which has always worked with prompt caching.process_channel_message_telegram_system_prompt_is_byte_stable_across_turnsβ drivesprocess_channel_messagetwice for the same sender 1.1 s apart and assertscalls[0][0].1 == calls[1][0].1for the system role. The 1.1 s sleep crosses a second boundary on purpose β under the pre-fix code, the second field of## Current Date & Timealone would have flipped this assertion.process_channel_message_enriches_current_turn_without_persisting_contextto pin the new contract: the user turn carries[turn-context] β¦ reply_target=chat-ctx, ends in\n\nhello, and the cached history entry remains the clean"hello".cargo test -p zeroclaw-channels --libβ 992 passed; 0 failed.cargo fmt --all -- --checkβ clean.Security & Privacy Impact
Compatibility
build_channel_system_promptis private; signature change is internal.build_channel_system_promptare unaffected.msg.content.Rollback
Single commit, single file. Revert
fix(channels/orchestrator): keep system prompt byte-stable for prefix caching (#6360)to restore the prior (uncacheable) behavior.