Skip to content

fix(channels/orchestrator): keep system prompt byte-stable for prefix caching (#6360)#6630

Closed
yijunyu wants to merge 2 commits into
zeroclaw-labs:masterfrom
yijunyu:fix/telegram-prompt-cache-6360
Closed

fix(channels/orchestrator): keep system prompt byte-stable for prefix caching (#6360)#6630
yijunyu wants to merge 2 commits into
zeroclaw-labs:masterfrom
yijunyu:fix/telegram-prompt-cache-6360

Conversation

@yijunyu

@yijunyu yijunyu commented May 13, 2026

Copy link
Copy Markdown
Contributor

Summary

Fixes #6360. On Telegram (and every other channel that flows through process_channel_message), prompt caching never hits β€” the upstream LLM logs forcing full prompt re-processing due to lack of cache data on every turn, reprocessing ~12k system-prompt tokens for each message. CLI works fine. The reporter [@edgarkech] localized the root cause precisely in their comment:

build_channel_system_prompt in crates/zeroclaw-channels/src/orchestrator/mod.rs:660-710 injected two per-turn pieces into the cached system prompt:

  1. A ## Current Date & Time section refreshed to second precision on every call.
  2. A Channel context: …reply_target=…sender=… paragraph that varies per turn (and per sender in groups).

Either alone is enough to invalidate every provider-side cached prefix (llama.cpp slot.prompt_match, Anthropic/OpenAI cache breakpoints). Both together guarantee a 0 % cache hit. CLI does not append these fields to the system prompt, which is why it never showed the regression.

Changes (single file: crates/zeroclaw-channels/src/orchestrator/mod.rs)

  1. build_channel_system_prompt is now byte-stable. Signature reduced from (base, channel, reply_target, sender) to (base, channel). The function now embeds only base_prompt and channel_delivery_instructions(channel_name) β€” no datetime refresh, no reply_target/sender block. A docstring pins the contract for future contributors.
  2. New helper build_channel_turn_context_preamble(channel, reply_target, sender) carrying the same dynamic information that used to live on the system prompt β€” datetime to second precision, channel=, reply_target=, sender=, plus the existing cron_add delivery= guidance. Returns an empty string when reply_target is empty (mirrors the original guard).
  3. Call site at process_channel_message (line ~3024): system prompt is built once with the simpler signature; the preamble is prepended to the last user turn of the outgoing history only. The cached conversation copy stored a few lines earlier (append_sender_turn(…, ChatMessage::user(&msg.content))) is not mutated, so subsequent turns don't accumulate stacked preambles. A starts_with("[turn-context]") guard makes the patch idempotent against any future replay scenarios.

The user message that hits the LLM looks like:

[turn-context] time=2026-05-13 12:34:56 (UTC) channel=telegram reply_target=chat:42 sender=alice. The sender field is the platform-specific user ID of the person who sent this message. Use it to distinguish between different users. When scheduling delayed messages or reminders via cron_add for this conversation, use delivery={"mode":"announce","channel":"telegram","to":"chat:42"} so the message reaches the user.

<actual user content>

The system prefix above it is now identical across every turn of the session, so the LLM's prefix cache hits.

Validation Evidence

  • New helper-level regression tests:
    • build_channel_system_prompt_is_byte_stable_across_calls β€” two back-to-back calls 5 ms apart must produce byte-identical output. Under the pre-fix code, the seconds field of ## Current Date & Time would differ.
    • build_channel_system_prompt_excludes_per_turn_context β€” pins that Channel context: / sender= / reply_target= never reach the system prompt again.
    • build_channel_system_prompt_excludes_per_second_datetime β€” pins that ## Current Date & Time is no longer added at the channel layer.
    • build_channel_turn_context_preamble_carries_dynamic_fields β€” verifies the preamble carries channel=, reply_target=, sender= and starts with the [turn-context] marker used by the idempotency guard.
    • build_channel_turn_context_preamble_empty_when_reply_target_empty β€” CLI-style calls (empty reply_target) get no preamble at all, mirroring CLI behavior which has always worked with prompt caching.
  • New end-to-end regression test:
    • process_channel_message_telegram_system_prompt_is_byte_stable_across_turns β€” drives process_channel_message twice for the same sender 1.1 s apart and asserts calls[0][0].1 == calls[1][0].1 for the system role. The 1.1 s sleep crosses a second boundary on purpose β€” under the pre-fix code, the second field of ## Current Date & Time alone would have flipped this assertion.
  • Updated process_channel_message_enriches_current_turn_without_persisting_context to pin the new contract: the user turn carries [turn-context] … reply_target=chat-ctx, ends in \n\nhello, and the cached history entry remains the clean "hello".
  • cargo test -p zeroclaw-channels --lib β€” 992 passed; 0 failed.
  • cargo fmt --all -- --check β€” clean.

Security & Privacy Impact

  • No change in what data is sent on the wire β€” same fields, same values. The change is purely where in the conversation envelope those fields appear (system β†’ user).
  • No new network paths, headers, or persistence.
  • The dynamic preamble is no longer cached on the provider side either (which is the entire point); it is fully re-tokenized each turn, identical to the prior behavior at a tokens-per-turn level for that specific paragraph.

Compatibility

  • No API change: build_channel_system_prompt is private; signature change is internal.
  • No format change observable to the LLM beyond the relocation: every byte of the prior system-prompt suffix now appears at the head of the user message instead. Models trained on pre-2025 chat conventions handle both orientations.
  • Channels that did not previously hit build_channel_system_prompt are unaffected.
  • Conversation cache persistence is unchanged β€” the stored user turn is still the raw msg.content.

Rollback

Single commit, single file. Revert fix(channels/orchestrator): keep system prompt byte-stable for prefix caching (#6360) to restore the prior (uncacheable) behavior.

… caching (zeroclaw-labs#6360)

build_channel_system_prompt embedded two per-turn pieces into the cached
system prompt: a ## Current Date & Time section refreshed to second precision
and a Channel context: paragraph carrying reply_target / sender. Every reply
mutated the system prefix, so provider prompt caches (llama.cpp prefix cache,
Anthropic/OpenAI cached prefixes) missed unconditionally β€” the reporter saw
'forcing full prompt re-processing due to lack of cache data' and ~12k tokens
reprocessed every turn. CLI did not exhibit the bug because it never appended
these fields to the system prompt.

Move both pieces to a per-turn preamble that rides on the **user message** of
the current turn instead. The system prompt now stays byte-stable for the
lifetime of a session, so prefix caches hit. The cached conversation copy
stored at append_sender_turn time is left untouched β€” only the outgoing wire
copy carries the [turn-context] marker.

New tests:
- build_channel_system_prompt_is_byte_stable_across_calls (helper-level)
- build_channel_system_prompt_excludes_per_turn_context / _per_second_datetime
- build_channel_turn_context_preamble_carries_dynamic_fields / _empty_when_reply_target_empty
- process_channel_message_telegram_system_prompt_is_byte_stable_across_turns
  (end-to-end: two consecutive process_channel_message calls 1.1 s apart must
  produce byte-identical system prompts β€” under the pre-fix behavior the
  ## Current Date & Time second field alone would have mutated)

Inverted the existing process_channel_message_enriches_current_turn_without_persisting_context
assertion to pin the new contract (user message carries the turn-context
preamble; cached history copy stays clean).
@github-actions github-actions Bot added the channel Auto scope: src/channels/** changed. label May 13, 2026
@singlerider singlerider requested a review from tidux May 13, 2026 13:46
@Audacity88 Audacity88 added bug Something isn't working size: M Auto size: 251-500 non-doc changed lines. risk: medium Auto risk: src/** or dependency/config changes. labels May 13, 2026
…:collapsible_if (zeroclaw-labs#6360)

CI Lint failed with clippy::collapsible_if on the turn-context preamble
injection. Fold the !turn_preamble.is_empty() guard into the .filter()
predicate of the existing find chain so a single if let suffices.

Behavior unchanged; all 992 zeroclaw-channels tests still pass.

@Audacity88 Audacity88 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked the current public PR state, linked issue #6360, and the current diff at 1e0ae6f. CI is green, and the cache-stability direction looks right. I did not run local cargo tests for this review.

🟒 What looks good β€” Channel context no longer churns the system prompt

The main shape is the right one for #6360. build_channel_system_prompt now keeps only the base prompt and static channel delivery instructions in the system prompt, while the per-turn channel data moves out of that stable prefix. The new tests also cover the important normal case: the outgoing provider turn gets the context, but the cached conversation history stays clean.

πŸ”΄ Blocking β€” User text can suppress the trusted turn context

The new preamble is now the authoritative source for the current reply_target, sender, and the cron_add delivery hint. But the injection path skips adding that runtime-built preamble whenever the last user message already starts with [turn-context]:

.filter(|m| !turn_preamble.is_empty() && !m.content.starts_with("[turn-context]"))

That makes the trusted context user-suppressible. A user can send a message that begins with [turn-context] ..., and then the outgoing provider payload only carries the user-controlled marker instead of the runtime-generated channel context. That is a regression from the old system-prompt placement, where user text could not remove the authoritative sender/reply-target context.

This matters most for reminders because this preamble is where the PR now gives the model the current cron_add delivery hint. If the runtime preamble is skipped, the model gets no trusted delivery guidance for that turn.

The smallest fix is probably to stop treating raw user content as proof that the trusted preamble is already present. Since cached history is intentionally stored without [turn-context], the current outgoing turn can always receive the runtime preamble. Please also add a regression test where the inbound user message starts with [turn-context] and the provider payload still receives the real runtime preamble before the user text.

@Audacity88 Audacity88 added the needs-author-action Author action required before merge label May 20, 2026
@Audacity88 Audacity88 added the stale-candidate Dormant PR or issue; candidate for closing label Jun 8, 2026
@Audacity88

Copy link
Copy Markdown
Collaborator

@yijunyu this PR is still waiting on the May 13 requested changes for the prefix-caching path, and I don't see an author update after that review. GitHub is not currently reporting a clean mergeability result for the branch, so I'm moving it out of active review.

If you want to continue, please push a current branch addressing the existing review feedback, or reply here by June 16, 2026. If there is no update, we'll close this PR as stale and ask for a fresh narrow PR against current master.

@tidux tidux left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reviewed the current head 1e0ae6f, linked issue #6360, @Audacity88's existing review, and the current master context around crates/zeroclaw-channels/src/orchestrator/mod.rs. I also reran the local checks after the first test attempt got interrupted: cargo fmt --all -- --check, the targeted helper/process tests, cargo test -p zeroclaw-channels --lib --quiet, and cargo clippy -p zeroclaw-channels --lib -- -D warnings all pass on this head.

The direction is right β€” moving volatile channel metadata out of the cached system prefix is the right fix for #6360 β€” but I can't sign off on this version yet. @Audacity88's trusted-context blocker is still present, and there is one more cache-stability gap that means the system prompt can still vary across turns when memory recall is enabled.

πŸ”΄ Blocking β€” Memory recall still mutates the cached system prompt

The PR removes the timestamp and channel tuple from build_channel_system_prompt, but process_channel_message still builds per-turn memory context from the current inbound message and appends it to the same system message:

  • crates/zeroclaw-channels/src/orchestrator/mod.rs:2984-3018 recalls sender/group memory using the current msg.content.
  • crates/zeroclaw-channels/src/orchestrator/mod.rs:3029-3030 appends the resulting memory_context directly onto system_prompt.

That means the provider-visible system message is still not byte-stable whenever memory recall returns different entries for different turns. This is exactly the kind of dynamic per-turn data that the cacheable prefix must not contain. The current stability test does not catch it because process_channel_message_telegram_system_prompt_is_byte_stable_across_turns uses NoopMemory; meanwhile process_channel_message_enriches_current_turn_without_persisting_context explicitly pins memory context in the system prompt.

The CLI path already treats recalled memory as per-turn user context (crates/zeroclaw-runtime/src/agent/loop_.rs:2620-2643), so I think the channel path should follow the same shape: keep memory out of the system prompt and put it in the non-cached current user turn/preamble. Please add an end-to-end regression test where memory recall returns different relevant entries across two channel turns and assert that the system message stays byte-identical while the current user turn still receives the recalled memory/context.

πŸ”΄ Blocking β€” User text can suppress the trusted turn context

Carrying forward @Audacity88's active block: the injection path still treats raw user text that starts with [turn-context] as proof that the trusted runtime preamble is already present:

crates/zeroclaw-channels/src/orchestrator/mod.rs:3042-3048

Because the cached history is intentionally stored without the runtime-generated preamble, the current outgoing turn can always receive the runtime preamble. With the current guard, a user can start their message with [turn-context] and suppress the authoritative reply_target, sender, and cron_add delivery hint for that turn. That is a trust-boundary regression from the old system-prompt placement.

Please remove the raw-content guard (or otherwise make it impossible for user-controlled text to masquerade as the runtime preamble) and add the regression test @Audacity88 requested: inbound content beginning with [turn-context] should still produce a provider payload whose last user turn begins with the runtime-generated preamble, followed by the user's original text.

🟑 Warning β€” Rebase onto current master before the next review pass

This branch is now 412 commits behind current upstream/master and GitHub reports it as CONFLICTING / DIRTY; git merge-tree --write-tree HEAD upstream/master shows a content conflict in crates/zeroclaw-channels/src/orchestrator/mod.rs.

That rebase is not just mechanical context churn. Current master has added channel-context contracts this May 13 branch does not know about, including message_id / reaction-tool guidance, the webhook delivery.thread_id special case, self-addressed mention handling for aliased channels, and calibration/date-section changes around the channel prompt. When you port the cache-stability fix forward, please preserve those current-master contracts while moving only the volatile per-turn pieces out of the cached prefix.

🟒 What looks good β€” The core cache-fix direction and local checks are solid

The main architectural direction is the right one for #6360: the stable channel delivery instructions belong in the system prompt, while reply_target, sender, delivery hints, and wall-clock time belong in the current turn rather than in the provider-cached prefix. The cached conversation copy staying clean is also the right invariant β€” no stacked preambles across turns.

Local verification on this head:

  • cargo fmt --all -- --check β€” passed.
  • cargo test -p zeroclaw-channels --lib build_channel_system_prompt -- --nocapture β€” 3 passed.
  • cargo test -p zeroclaw-channels --lib process_channel_message_telegram_system_prompt_is_byte_stable_across_turns -- --nocapture β€” passed.
  • cargo test -p zeroclaw-channels --lib process_channel_message_enriches_current_turn_without_persisting_context -- --nocapture β€” passed.
  • cargo test -p zeroclaw-channels --lib --quiet β€” 992 passed in 5.39s.
  • cargo clippy -p zeroclaw-channels --lib -- -D warnings β€” passed.

Once the two blockers are fixed and the branch is rebased onto current master, this should be a much easier re-review: the desired shape is clear, and the existing tests already cover a lot of the non-persistence/cache-prefix behavior.

@singlerider singlerider left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed at head 1e0ae6f against current master. @tidux's June 11 review already covers both blockers thoroughly and they are correct; I am confirming the trust-boundary one in source and not adding redundant noise. Not overriding @Audacity88's active changes-requested.

🟒 Cache-fix direction is right

Moving reply_target, sender, delivery hints, and wall-clock time out of the cached system prefix and into the current turn β€” while keeping the stable channel delivery instructions in the system prompt and the cached history copy clean β€” is the correct fix for #6360.

πŸ”΄ Blocking β€” confirmed the trust-boundary regression (user text can suppress the trusted preamble)

Verified at line 3046:

.filter(|m| !turn_preamble.is_empty() && !m.content.starts_with("[turn-context]"))

Raw user content starting with [turn-context] is treated as proof the trusted runtime preamble is already present, so the outgoing provider turn then carries only the user-controlled marker instead of the runtime-built reply_target / sender / cron_add delivery hint. Since cached history is intentionally stored without [turn-context], there is no reason to gate on user content at all β€” the current outgoing turn can always receive the runtime preamble. Drop the raw-content guard and add the regression test @Audacity88 and @tidux asked for: inbound content beginning with [turn-context] must still yield a provider payload whose last user turn starts with the runtime preamble, then the user's text.

πŸ”΄ Blocking β€” agree: memory recall still mutates the cached system prompt

Confirmed the recall path (build_memory_context_for_sessions at line 2984, folded into memory_context at 3012) feeds back onto the system message, so the provider-visible system prompt is not byte-stable when recall differs across turns β€” defeating the prefix-cache goal the PR is for. Follow the CLI shape (loop_.rs): keep recalled memory in the non-cached current user turn, not the system prefix. Add the end-to-end test @tidux described (different recall across two turns, system message byte-identical, current turn still carries the memory).

🟑 Rebase required

This branch is hundreds of commits behind and CONFLICTING on crates/zeroclaw-channels/src/orchestrator/mod.rs (I can confirm that file has moved substantially on current master). The rebase must preserve current-master channel contracts β€” message_id/reaction guidance, the webhook delivery.thread_id case, self-addressed mention handling, and the date/calibration sections β€” while moving only the volatile per-turn pieces out of the cached prefix.

The shape is right; the two trust/stability blockers plus the rebase are the path to green. Leaving @Audacity88's changes-requested in force.

.iter_mut()
.rev()
.find(|m| m.role == "user")
.filter(|m| !turn_preamble.is_empty() && !m.content.starts_with("[turn-context]"))

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

πŸ”΄ Trust-boundary regression: a user message starting with [turn-context] makes this filter skip the runtime-built preamble, so the outgoing turn carries only user-controlled text instead of the authoritative reply_target/sender/cron_add hint. Cached history never contains [turn-context], so drop the raw-content guard entirely β€” the current outgoing turn can always receive the runtime preamble.

@Audacity88 Audacity88 removed this from the v0.8.1 milestone Jun 15, 2026
@Audacity88

Copy link
Copy Markdown
Collaborator

Hi @yijunyu, closing this PR as stale.

We asked for an updated branch or reply by June 16, 2026, and there has not been a follow-up. The branch is still dirty, and the May 13 changes-requested review still needs author work, so this is not a current review target.

Issue #6360 remains open for the Telegram/channel prompt-caching problem. If you or someone else wants to continue this fix, please open a fresh narrow PR against current master and link back here for context.

@wangmiao0668000666

Copy link
Copy Markdown
Contributor

Hi @yijunyu β€” wanted to flag that I just opened a follow-up PR (#8174) on issue #6360, building on the design you pioneered in this PR. I'm sorry this one got closed before you could land the review feedback; the close-as-stale was on the maintainers, not on your design.

Quick summary of where I landed vs. where you stopped:

Reused from your PR (no credit to me for the shape):

  • The build_channel_turn_context_preamble helper carrying volatile per-turn context into a [turn-context] block on the outgoing user turn.
  • The webhook delivery.thread_id special case preserved verbatim.
  • The "cached conversation history copy stays clean" invariant β€” the preamble only rides on the outgoing LLM call, not on the persisted session log.
  • The end-to-end byte-stability test pattern (1.1s sleep crosses the second boundary).

Fixed in #8174 that the three reviews identified (Audacity88, tidux, singlerider):

  1. The trust-boundary regression you flagged by leaving the m.content.starts_with("[turn-context]") guard in place. fix(channels): keep channel system prompt byte-stable for prefix caching (#6360)Β #8174 drops the guard entirely β€” the preamble is unconditionally prepended whenever reply_target is non-empty. New regression test: process_channel_message_user_text_starting_with_turn_context_still_gets_runtime_preamble.
  2. The memory-recall system-prompt churn that tidux and singlerider pointed out β€” memory_context now rides in the same outgoing user turn preamble instead of getting appended to the cached system prefix. New regression test: process_channel_message_memory_recall_difference_keeps_system_byte_identical.
  3. The 6 master-side contracts that drifted between your PR and current master (message_id + reaction-tool guidance, webhook delivery.thread_id, bot_mention, date refresh, calibration note, channel_delivery_instructions) β€” all preserved in fix(channels): keep channel system prompt byte-stable for prefix caching (#6360)Β #8174, each pinned by a helper-level test.

If you want to rebase your approach on top of #8174 (or just want credit on the commit when it lands), happy to coordinate. Either way, thanks for getting the design right β€” the three reviews closed the gap on top of what you built.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working channel Auto scope: src/channels/** changed. needs-author-action Author action required before merge risk: medium Auto risk: src/** or dependency/config changes. size: M Auto size: 251-500 non-doc changed lines. stale-candidate Dormant PR or issue; candidate for closing

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Prompt Caching does not work with telegram

6 participants