Skip to content

fix(loop): gate path-listing tool results from vision routing#7345

Merged
Nillth merged 4 commits into
zeroclaw-labs:masterfrom
NNet-Dev:fix/provider_capability_error
Jun 23, 2026
Merged

fix(loop): gate path-listing tool results from vision routing#7345
Nillth merged 4 commits into
zeroclaw-labs:masterfrom
NNet-Dev:fix/provider_capability_error

Conversation

@Nillth

@Nillth Nillth commented Jun 7, 2026

Copy link
Copy Markdown
Collaborator

Summary

  • Base branch: master
  • What changed and why:
    • The agent loop decides vision-provider routing by counting [IMAGE:...]
      markers in tool results. Path-listing tools (content_search, glob_search)
      print local image paths, which canonicalize_tool_result_media_markers
      wrapped as [IMAGE:...]. Those are not images presented as visual content,
      but they inflated the count and falsely triggered vision routing, producing
      provider_capability_error capability=vision on a text-only provider.
    • Write-side gate. A provenance-aware
      canonicalize_tool_result_media_markers_for(tool_name, output) is a no-op for
      path-listing tools and canonicalizes everything else (so image_gen /
      file_download paths still become routable markers). It is applied at every
      write site that has the producing tool name: turn::results_collect::collect_tool_results
      (the turn loop's single chokepoint) and both dispatchers' format_results.
    • Read-side gate (this revision, addressing the latest review). Agent::turn
      serializes self.history via NativeToolDispatcher::to_provider_messages
      before the loop runs, and that path re-canonicalized stored
      ConversationMessage::ToolResults provenance-blind (the tool name had been
      dropped), so a search path stored as found: /hit.png was re-promoted to
      [IMAGE:/hit.png]. Fixed by carrying provenance through the persisted shape:
      a #[serde(default)] tool_name on ToolResultMessage, populated in native
      format_results and on ACP session resume, with both read paths now routed
      through the provenance-aware helper. Every other constructor leaves tool_name
      empty and falls through to the blind canonicalizer, preserving the fix(multimodal): normalize image markers across agent and tool historyΒ #6183
      read-side promotion of genuine image paths that carry no recorded provenance.
  • Scope boundary: No change to marker parsing, provider normalization, the
    user-attached-image path, or the channel orchestrator. image_gen/file_download
    still route real images to a vision provider. The added field is additive and
    serde-defaulted (existing persisted sessions deserialize unchanged).
  • Blast radius: Vision-routing decision in the agent loop, reached by every
    transport (Agent::turn / turn_streamed / ACP / gateway WS + RPC) through
    run_tool_call_loop. Default-allow: only pure path-listing tools are suppressed.
  • Linked issue(s): None filed (reviewer-driven). Related fix(multimodal): normalize image markers across agent and tool historyΒ #6183 (the read-side
    canonicalization rationale this preserves). Related fix(vision): scope the no-vision capability error to the latest user imageΒ #8180 (the complementary
    user-image-marker fix: a stale [IMAGE:] from a user message poisoning the
    no-vision capability check; disjoint files, no overlap).
  • Known residual (disclosed): the flat provider-wire replay seed
    (Agent::turn's msg.role == "tool" reconstruction) rebuilds results from a
    {tool_call_id, content} payload with no tool name, so those land with empty
    provenance. Narrow (not the per-turn loop), pre-existing; closing it would mean
    adding tool_name to the JSON sent to the model, which is declined here.
  • Labels: auto-applied by the repo labeler (bug, agent, runtime; risk:
    / size: are automation/maintainer-owned). Snapshot after labeling.

Validation Evidence (required)

Toolchain pinned to CI's 1.93.0. Branch merged current with master (the
validation below is against the merged tree). apps/tauri excluded from the
workspace builds (its gobject-2.0 system lib is absent in this environment;
unrelated to the diff).

  • Commands run and tail output:
    $ cargo fmt --all -- --check
    (clean)
    
    $ cargo check  --workspace --exclude <tauri> --all-targets
    Finished `dev` profile in 2m21s   (exit 0)
    
    $ cargo clippy --workspace --exclude <tauri> --all-targets -- -D warnings
    (exit 0, no warnings)
    
    $ cargo test -p zeroclaw-api
    test result: ok. 58 passed; 0 failed   (+ doctest: 1 passed)
    
    $ cargo test -p zeroclaw-infra acp
    test result: ok. 22 passed; 0 failed
    
    $ cargo test -p zeroclaw-runtime --lib agent::dispatcher
    test result: ok. 15 passed; 0 failed   (incl. 3 new round-trip tests)
    
    $ cargo test -p zeroclaw-runtime --lib agent:: -- --test-threads=1
    test result: ok. 592 passed; 0 failed; 1 ignored
    
  • Beyond CI β€” what did you manually verify? New regression tests round-trip
    the native shape format_results -> to_provider_messages with real on-disk
    PNGs:
    • native_search_path_survives_format_then_to_provider_messages β€” a
      content_search/glob_search result keeps its literal path through the read
      path; no [IMAGE:.
    • native_image_gen_path_still_promotes_through_to_provider_messages β€” a real
      generated image still canonicalizes across the round trip.
    • native_unknown_provenance_still_promotes_on_read β€” the fix(multimodal): normalize image markers across agent and tool historyΒ #6183 contract
      (empty provenance still promotes a genuine image path).
      The existing native_dispatcher_converts_tool_results_to_tool_messages stays
      green. NOT verified: a live end-to-end against a deployed text-only provider
      (covered structurally by the unit tests).
  • If any command was intentionally skipped, why: Web/Python/Go/shell gates
    are N/A (no such files changed). The agent module is run single-threaded
    because turn_cache_hit_emits_agent_end_with_none_tokens flakes on shared
    response-cache contention under full parallelism (passes isolated; text-only
    cache key, unrelated to this change).

Security & Privacy Impact (required)

  • New permissions, capabilities, or file system access scope? No
  • New external network calls? No
  • Secrets / tokens / credentials handling changed? No
  • PII, real identities, or personal data in diff, tests, fixtures, or docs? No
  • If any Yes, describe the risk and mitigation: n/a

Compatibility (required)

  • Backward compatible? Yes β€” behavior fix plus an additive, serde-defaulted
    field; existing persisted sessions deserialize unchanged.
  • Config / env / CLI surface changed? No
  • If No or Yes to either: no upgrade steps. (ToolResultMessage gains a pub
    tool_name field; all in-workspace constructors are updated, and the field
    defaults on deserialize.)

Rollback (required for risk: medium and risk: high)

  • Fast rollback command/path: git revert <merge/commit shas> β€” confined to
    the agent vision-routing path (history.rs, dispatcher.rs, results_collect.rs,
    model_provider.rs, acp_session_store.rs); no schema migration, no config/API
    surface to unwind. The serde-default field is inert if the read-side gate is
    reverted.
  • Feature flags or config toggles: None.
  • Observable failure symptoms: if vision routing regresses, grep agent logs
    for provider_capability_error with capability=vision on text-only providers
    after a content_search/glob_search that listed an image path; or a real
    image_gen result that fails to route to a configured vision provider.

@Nillth Nillth requested a review from Audacity88 as a code owner June 7, 2026 11:04
@github-actions github-actions Bot added agent Auto scope: src/agent/** changed. runtime Auto scope: src/runtime/** changed. labels Jun 7, 2026
@Audacity88 Audacity88 added bug Something isn't working risk: high Auto risk: security/runtime/gateway/tools/workflows. size: S Auto size: 81-250 non-doc changed lines. labels Jun 7, 2026

@Audacity88 Audacity88 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reviewed the one-file runtime diff, the live PR metadata, the empty top-level/inline/formal review history, current CI, the agent-loop vision-routing path, multimodal latest-tool-result handling, and provider-routing source excerpts. I am requesting changes because the role-only counter both misses one prompt-mode false-positive path and drops the route that legitimate latest tool-generated images need when the default provider is text-only.

🟒 What looks good β€” The user-image path stays narrow

The user-authored image tests are a useful start, and keeping the actual marker parsing/normalization behavior out of this PR is the right scope. Once the tool-result distinction is made precise, this can stay a small targeted fix.

πŸ”΄ Blocking β€” Prompt-mode tool results still count as user images

run_tool_call_loop still stores fallback tool output as a user-role message when native tool IDs are unavailable:

history.push(ChatMessage::user(format!("[Tool results]\n{tool_results}")));

The new count_user_image_markers() helper filters only on m.role == "user", so prompt-mode tool results with a [Tool results] prefix still count as user images. That means a filesystem tool result that gets wrapped as [IMAGE:/path] can still trigger the same false vision route on providers that are using the prompt tool-result path.

Please exclude prompt-mode tool-result carriers as well, or reuse/share the existing multimodal predicate that already treats role == "user" && content.starts_with("[Tool results]") as tool output. The regression coverage should include a ChatMessage::user("[Tool results]\n... [IMAGE:/path]") case, not only role == "tool".

πŸ”΄ Blocking β€” Latest real tool images no longer trigger vision routing

The existing multimodal contract intentionally treats the latest tool-result image as current multimodal input. prepare_messages_for_provider() normalizes latest role == "tool" and prompt-mode tool-result images, and the existing tests cover that an image_gen-style latest tool result remains image-bearing while stale tool images are stripped.

This patch changes the routing decision to count only user-role messages before provider selection. For a text-only default provider with vision_model_provider configured, a newly generated tool image will no longer cause the loop to switch to the vision provider. The request can then stay on the text provider even though the prepared messages contain image data, which is the route guard this code was meant to prevent.

Please preserve routing for legitimate latest tool-generated images while suppressing filesystem/path echoes. A loop-level regression would make the boundary clear: a latest image_gen result should still attempt the configured vision provider when the default provider lacks vision, while a filesystem/prompt tool-result path echo should not.

Nillth added a commit to NNet-Dev/zeroclaw that referenced this pull request Jun 9, 2026
Addresses the changes-requested on zeroclaw-labs#7345. Replaces the role-only
`count_user_image_markers` approach (which missed prompt-mode `[Tool results]`
carriers and dropped routing for genuinely generated latest images) with a
provenance-gated canonicalization.

Filesystem search/listing tools (content_search, glob_search) print local
image-file paths. `canonicalize_tool_result_media_markers` wrapped those as
`[IMAGE:…]`, and the agent loop counts the current iteration's tool-result
markers (`multimodal::count_image_markers`) when deciding whether to switch
to a vision provider β€” so a path echo falsely triggered vision routing and a
provider-capability error on a text-only provider.

`canonicalize_tool_result_media_markers_for` is now a no-op for path-listing
tools, so their paths never become routable markers, while
image-producing/fetching tools (image_gen, file_download, …) keep
canonicalization. The existing `count_image_markers` routing contract is
restored, so a genuinely generated latest image still routes, and the
prompt-mode carrier is handled because gating happens at the per-tool
canonicalize site (before results are merged).

Regression tests: search-tool path echoes (native + prompt-mode) do not
trigger routing; an image_gen latest result still does; user-attached
images unaffected.
@Nillth Nillth changed the title fix(loop): count image markers only from user messages to prevent false vision-routing triggers fix(loop): gate path-listing tool results from vision routing Jun 9, 2026
@Nillth

Nillth commented Jun 9, 2026

Copy link
Copy Markdown
Collaborator Author

Thanks for the careful review β€” both blockers were spot on, and they pushed me off the role-only approach entirely. Reworked fix pushed in ad0388e8c (branch now also up to date with master).

Why the role filter couldn't work. After canonicalize_tool_result_media_markers runs, a filesystem path echo (content_search/glob_search surfacing /path/img.png) and a genuinely generated image (image_gen writing a file and printing its path) are byte-identical [IMAGE:/path] markers. There's no marker-text signal to tell them apart β€” the only discriminator is provenance (which tool produced the result). So I moved the decision to where the tool name still exists.

New approach β€” provenance-gated canonicalization. canonicalize_tool_result_media_markers_for(tool_name, output) is a no-op for path-listing tools (content_search, glob_search) and canonicalizes everything else. It's gated at the single production canonicalize site in run_tool_call_loop, where tool_name is in scope β€” before results are merged into the [Tool results] carrier. The loop counter is reverted to multimodal::count_image_markers.

This resolves both blockers:

It's a denylist (default-allow) rather than an allowlist, deliberately: the failure mode of forgetting to list a tool is "an image doesn't auto-route, user attaches manually" rather than "the capability error returns." Only pure path-listing tools are excluded.

Tests (loop + history): search-tool path echo doesn't route (native and prompt-mode carriers); image_gen latest result still routes; user-attached images unaffected; plus unit coverage of the _for skip/wrap behavior.

One open question: I scoped the denylist to content_search + glob_search as the tools that actually emit existing local image paths. If you'd prefer an explicit allowlist of image-producers, or want other tools added, happy to adjust.

@Audacity88 Audacity88 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I re-reviewed current head e0b2797 against my earlier CHANGES_REQUESTED review at fd2081d, the author's follow-up comment, the current PR body, the current diff, the agent-loop vision-routing path, the multimodal marker-count contract, the PR's regression-test evidence, the latest GitHub checks, and the public merge-conflict check against current master.

βœ… Resolved β€” Prompt-mode path echoes no longer become vision markers

The rework moves the decision to the tool-result canonicalization site, while tool_name is still available. That means content_search and glob_search path echoes never become [IMAGE:...] markers before they are carried either as native tool messages or prompt-mode user [Tool results] messages. The new prompt-mode regression covers the carrier that the role-only approach missed.

βœ… Resolved β€” Latest generated images still route to vision

Restoring multimodal::count_image_markers(history) preserves the latest-tool-result contract, and the default-allow wrapper still canonicalizes image_gen, file_download, and future non-listing tools. The new image_gen regression covers the route that the earlier role-only counter would have dropped.

🟒 What looks good β€” Provenance is the right boundary

After canonicalization, a search-result path echo and a genuinely generated image path look the same. Making the decision before canonicalization, while the producing tool is still known, is the right boundary for this fix.

🟑 Warning β€” Refresh the branch before I clear the review

GitHub now reports this branch as CONFLICTING / DIRTY against current master, with the conflict in crates/zeroclaw-runtime/src/agent/loop_.rs. The conflict looks narrow, but it overlaps current runtime changes in the same file, so this comment does not clear my prior CHANGES_REQUESTED review yet. Please refresh the branch against current master, preserve the provenance-gated shape, and let CI rerun. Once the refreshed head is clean, I can convert this to approval quickly if the semantics stay the same.

@Nillth

Nillth commented Jun 11, 2026

Copy link
Copy Markdown
Collaborator Author

Refreshed the branch against current master (merge commit 827ee11). The conflicts were confined to the test modules β€” both sides had added tests at the same locations in agent/history.rs and agent/loop_.rs:

  • history.rs: kept this PR's two provenance tests alongside master's new canonicalize_tool_result_media_markers_dedups_path_already_in_marker test.
  • loop_.rs: union of the test-module imports; kept this PR's vision-routing provenance tests alongside master's new maybe_inject_channel_delivery_defaults tests.

No production-code conflicts. The provenance-gated shape is unchanged: canonicalize_tool_result_media_markers_for still makes the decision at the tool-result canonicalization site while tool_name is in hand, and it composes cleanly with master's new in-marker dedup logic in the base canonicalizer.

Local validation on the merged head: cargo fmt --all -- --check clean, cargo clippy -p zeroclaw-runtime --all-targets -- -D warnings clean, cargo test -p zeroclaw-runtime --lib β†’ 2045 passed, 0 failed.

@Nillth

Nillth commented Jun 11, 2026

Copy link
Copy Markdown
Collaborator Author

@Audacity88 β€” the refreshed head (827ee11) is clean: all checks including the CI Required Gate are green, and the branch is conflict-free against current master. Ready for your re-review.

@Audacity88 Audacity88 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Context checked: current head 827ee1109, live PR state and checks, my earlier review, the author's follow-up comments, the current diff, the run_tool_call_loop changes, and the related public source outside the diff in Agent::turn, Agent::turn_streamed, ToolDispatcher, ACP, gateway WebSocket, and RPC turn call sites. The author reported clean local fmt, clippy, and zeroclaw-runtime --lib test validation after the refresh; I cross-checked the PR head's green CI but did not rerun local cargo validation in this pass.

βœ… Resolved β€” The earlier run_tool_call_loop blockers are fixed

For run_tool_call_loop, the new wrapper is at the right boundary: before a tool result is turned into a marker, while the producing tool name is still known. That fixes the prompt-mode [Tool results] carrier case for search path echoes, and the default-allow behavior still lets genuine image-producing tools such as image_gen trigger vision routing.

🟒 What looks good β€” Clean refresh and validation signal

The branch refresh looks clean from the public PR state: CI is green on the refreshed head, including the required gate, and the author reports matching focused local validation. The provenance-gated shape also survived the refresh in the run_tool_call_loop path.

πŸ”΄ Blocking β€” The direct Agent::turn dispatcher path still canonicalizes path listings

The new canonicalize_tool_result_media_markers_for(tool_name, output) wrapper is only used in run_tool_call_loop. The direct Agent path still stores tool results through ToolDispatcher::format_results, and both XmlToolDispatcher and NativeToolDispatcher still call the provenance-blind canonicalize_tool_result_media_markers on every result before the next model call.

That path is reachable from public runtime entry points: ACP calls Agent::turn_streamed, and both gateway WebSocket and the RPC turn path call Agent::turn_streamed_with_steering_state. Those methods format tool results, push them into history, then on the next iteration call to_provider_messages and prepare_provider_messages. This is not the exact same counter site as run_tool_call_loop; the remaining failure is that a content_search or glob_search result that merely lists /path/to/image.png can still be rewritten to [IMAGE:/path/to/image.png], then parsed and loaded as image input for the next provider request in those streamed Agent sessions.

Please carry the same tool-name-aware gating into the dispatcher/Agent result-formatting path, or move tool-result canonicalization behind one shared provenance-aware helper used by both loop implementations. Please also add a regression for NativeToolDispatcher/XmlToolDispatcher with content_search or glob_search output containing a real local image path, so this path cannot load listed images as image input again.

@Nillth Nillth marked this pull request as draft June 11, 2026 14:46
@singlerider singlerider modified the milestones: v0.8.0, v0.8.1 Jun 11, 2026

@singlerider singlerider left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed at head 17a3aa9 against current master. The provenance-gating is correctly placed in run_tool_call_loop, but I verified @Audacity88's finding: two other call sites still run the blind canonicalizer, so the vision-routing hole stays open on the streamed Agent paths. Agreeing with his changes-requested.

🟒 The gating is right where it is applied

The new canonicalize_tool_result_media_markers_for(tool_name, output) wrapper sits at the right boundary in run_tool_call_loop β€” before a tool result becomes a marker, while the producing tool name is known β€” so a content_search/glob_search result that merely lists /path/to/image.png is no longer rewritten into a vision marker there, while genuine image producers like image_gen still trigger routing. That is the correct shape.

πŸ”΄ Blocking β€” confirmed: two other call sites still call the provenance-blind canonicalizer

The gated wrapper is only wired into the loop path. The blind canonicalize_tool_result_media_markers is still invoked at:

  • crates/zeroclaw-runtime/src/agent/dispatcher.rs:132 (the dispatcher result path), and
  • crates/zeroclaw-runtime/src/agent/turn/results_collect.rs:115 (the Agent::turn results collection).

The Agent::turn/turn_streamed path is reachable from public runtime entry points β€” ACP (turn_streamed), gateway WebSocket and RPC (turn_streamed_with_steering_state). On those sessions, a content_search/glob_search output containing a local image path is still rewritten to `` and then parsed and loaded as image input for the next provider request. So the bug this PR fixes persists on the streamed Agent paths.

Route all three sites through one shared provenance-aware helper (so the loop and the dispatcher/turn paths can't diverge), and add regressions for NativeToolDispatcher/XmlToolDispatcher (and the results_collect path) with content_search/glob_search output containing a real local image path, asserting the listed image is not promoted to image input.

🟑 Rebase

The branch is CONFLICTING. It needs a refresh against current master regardless, and doing so is a good moment to land the single-shared-helper structure so the gating can't drift between the two loop implementations again.

Carry the gating into the dispatcher/turn path (ideally via one shared helper) plus the regressions, and this closes the hole completely. Leaving @Audacity88's changes-requested in force.

Nillth added 2 commits June 20, 2026 09:59
Addresses the changes-requested on zeroclaw-labs#7345. Replaces the role-only
`count_user_image_markers` approach (which missed prompt-mode `[Tool results]`
carriers and dropped routing for genuinely generated latest images) with a
provenance-gated canonicalization.

Filesystem search/listing tools (content_search, glob_search) print local
image-file paths. `canonicalize_tool_result_media_markers` wrapped those as
`[IMAGE:…]`, and the agent loop counts the current iteration's tool-result
markers (`multimodal::count_image_markers`) when deciding whether to switch
to a vision provider β€” so a path echo falsely triggered vision routing and a
provider-capability error on a text-only provider.

`canonicalize_tool_result_media_markers_for` is now a no-op for path-listing
tools, so their paths never become routable markers, while
image-producing/fetching tools (image_gen, file_download, …) keep
canonicalization. The existing `count_image_markers` routing contract is
restored, so a genuinely generated latest image still routes, and the
prompt-mode carrier is handled because gating happens at the per-tool
canonicalize site (before results are merged).

Regression tests: search-tool path echoes (native + prompt-mode) do not
trigger routing; an image_gen latest result still does; user-attached
images unaffected.
…loop result path

The provenance-aware canonicalization helper
(`canonicalize_tool_result_media_markers_for`) was previously applied only in
the old monolithic `run_tool_call_loop`. After the turn-engine refactor (zeroclaw-labs#7969)
the loop's canonicalization moved to `turn::results_collect`, and the
`ToolDispatcher::format_results` path (XmlToolDispatcher / NativeToolDispatcher)
still used the provenance-blind `canonicalize_tool_result_media_markers`.

Route every tool-name-aware canonicalization site through the single shared
helper so a search/listing tool (content_search, glob_search) that merely lists
a local image path is never rewritten into a routable `[IMAGE:…]` marker β€” which
falsely triggers vision routing and a provider-capability error on a text-only
provider β€” while a genuine image-producing tool (image_gen) is still
canonicalized.

- turn::results_collect::collect_tool_results: canonicalize via `_for(tool_name)`
- XmlToolDispatcher::format_results: canonicalize via `_for(result.name)`
- NativeToolDispatcher::format_results: canonicalize via `_for(result.name)`

Adds dispatcher regression tests asserting content_search/glob_search image
paths are not promoted in either dispatcher, and that image_gen output still is.

See PR zeroclaw-labs#7345.
@Nillth Nillth force-pushed the fix/provider_capability_error branch from 17a3aa9 to 2f2b49f Compare June 20, 2026 04:39
@Nillth

Nillth commented Jun 20, 2026

Copy link
Copy Markdown
Collaborator Author

Good catch - you're right that gating only run_tool_call_loop left the dispatcher result-formatting path provenance-blind. Fixed it by routing every tool-name-aware canonicalization site through one shared helper.

Shared helper: agent::history::canonicalize_tool_result_media_markers_for(tool_name, output) - same denylist (content_search, glob_search) wrapper as before; default-allow for everything else (so image_gen/file_download still canonicalize).

Call sites now all gated on the producing tool name:

ToolExecutionResult already carries name, so no signature threading was needed for format_results.

Worth noting after the turn-engine unification: Agent::turn / Agent::turn_streamed / turn_streamed_with_steering_state (the ACP and gateway WebSocket + RPC entry points) all delegate to run_tool_call_loop, so the canonicalization they actually hit is results_collect - that's the primary site closing the reported leak. format_results has no production caller today (trait API exercised in tests), but I gated it anyway so the dispatcher API can't silently reintroduce the gap.

Regression tests (agent::dispatcher::tests):

  • xml_format_results_does_not_promote_search_tool_image_paths
  • native_format_results_does_not_promote_search_tool_image_paths - a content_search/glob_search result listing a real local *.png keeps the literal path and is not rewritten to [IMAGE:...], in both dispatchers.
  • format_results_still_promotes_image_producing_tool_paths - image_gen output is canonicalized into a marker in both dispatchers (default-allow preserved).

Existing vision-routing/multimodal tests still green, incl. native_dispatcher_converts_tool_results_to_tool_messages (read-side promotion of genuine image paths preserved) and the run_tool_call_loop_* vision tests.

Rebased onto current master. Validation (cargo +1.93.0): clippy -p zeroclaw-runtime --all-targets clean; fmt --all --check clean; test -p zeroclaw-runtime 2307 passed (the lone red is the pre-existing cron::store::remove_job_emits_structured_cron_delete_event parallel log-capture flake - passes in isolation, unrelated to this change). CI is green.

@Nillth Nillth marked this pull request as ready for review June 20, 2026 05:24
@Audacity88 Audacity88 added this to the v0.8.3 milestone Jun 20, 2026

@Audacity88 Audacity88 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed current head 2f2b49f4ffa776aff0d86f95b303ff771df71bfe against the live PR state, the prior Audacity88/singlerider feedback, the author's latest update, current checks, and the updated dispatcher/history/turn diff. I did not run local Cargo validation.

βœ… Resolved β€” The active tool-result collection path now keeps provenance

The current diff fixes the call sites that still have the producing tool name in hand: turn::results_collect::collect_tool_results, XmlToolDispatcher::format_results, and NativeToolDispatcher::format_results now all route through canonicalize_tool_result_media_markers_for. That resolves the previous blind format_results blocker for the direct dispatcher API and preserves the intended default-allow behavior for real image-producing tools.

πŸ”΄ Blocking β€” Native history serialization can still re-promote search paths

One provenance-blind serializer remains in the native history path. Agent::turn and the streamed turn path first convert self.history with self.tool_dispatcher.to_provider_messages(&self.history) and then hand that provider-visible transcript to run_tool_call_loop. For NativeToolDispatcher, serializing a stored ConversationMessage::ToolResults still calls canonicalize_tool_result_media_markers(&result.content) on each result. At that point the tool name has already been dropped, so a content_search/glob_search result that was correctly stored as found: /tmp/hit.png can still become found: [IMAGE:/tmp/hit.png] before the loop sees the transcript.

This is separate from whether format_results has a production caller today: the unsafe behavior is now in the native ConversationMessage::ToolResults β†’ provider-message conversion, where no provenance is available. The new native dispatcher test misses that because it stops at format_results output rather than round-tripping a native tool result through to_provider_messages. Please either keep tool provenance through the native tool-result history shape or avoid blind re-canonicalization once the output has already been formatted with the provenance-aware helper, and add a regression that fails on:

  1. NativeToolDispatcher::format_results for content_search/glob_search returning ConversationMessage::ToolResults with a real local PNG path.
  2. Passing that message through NativeToolDispatcher::to_provider_messages.
  3. Asserting the provider-visible tool content still contains the literal path and not [IMAGE:...].

The XML blind serializer is less directly exposed because XmlToolDispatcher::format_results stores formatted results as a chat/user message, not ConversationMessage::ToolResults, so I would not treat XML as the blocker here. The native history serializer is the remaining unsafe path.

Nillth added 2 commits June 23, 2026 12:18
…ide vision gating

The native history serializer (NativeToolDispatcher::to_provider_messages,
reached by Agent::turn before the loop runs) re-canonicalized stored
ConversationMessage::ToolResults provenance-blind. The producing tool name was
already dropped by then, so a content_search/glob_search result stored as
'found: /hit.png' was re-promoted to '[IMAGE:/hit.png]', falsely triggering
vision routing and a provider-capability error on a text-only provider. The
write-side gate did not cover this read path.

Carry the producing tool name through the persisted shape:

- Add tool_name (serde(default)) to ToolResultMessage in zeroclaw-api; old
  serialized sessions still deserialize and default to empty.
- NativeToolDispatcher::format_results records the producing tool name.
- Both to_provider_messages read paths (native and XML) now go through the
  provenance-aware canonicalize_tool_result_media_markers_for, so a stored
  search/listing path stays literal while genuine image-producing tools still
  canonicalize.
- ACP session resume populates tool_name from the row it already stores.
- Every other ToolResultMessage constructor sets tool_name empty, falling
  through to the blind canonicalizer exactly as before, preserving the zeroclaw-labs#6183
  read-side promotion of genuine image paths with no recorded provenance.

Regression tests round-trip format_results -> to_provider_messages: a search
path survives as a literal, an image_gen path still promotes, and an
empty-provenance result still promotes (the zeroclaw-labs#6183 contract).

@singlerider singlerider left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-reviewed head 2f2b49f4 against master, @Audacity88's latest changes-requested, my prior comment, and the dispatcher/turn/history diff. Read the source at head; did not run local Cargo.

🟒 RESOLVED: my prior two-call-site blocker is closed

The blind canonicalizer is now gone from both sites I flagged. turn::results_collect (results_collect.rs:122) and the dispatcher result path (dispatcher.rs:139, :223) all route through canonicalize_tool_result_media_markers_for(tool_name, ...), so a content_search/glob_search result that merely lists /path/to/image.png is no longer promoted to a vision marker on the loop or turn-collection paths, while genuine producers like image_gen still route. That is the shape I asked for.

πŸ”΄ Blocking: concur with @Audacity88 on the native history serializer

I verified the remaining hole independently and it is real. NativeToolDispatcher::format_results (dispatcher.rs:223) correctly stores provenance-aware output, but the stored shape is ToolResultMessage { tool_call_id, content } (crates/zeroclaw-api/src/model_provider.rs:166), which carries no tool name. NativeToolDispatcher::to_provider_messages then blind-canonicalizes that stored content at dispatcher.rs:258:

"content": canonicalize_tool_result_media_markers(&result.content),

So found: /tmp/hit.png (correctly left alone at storage time) gets re-promoted to [IMAGE:...] on serialization. This is reachable in production: both agent.rs:1915 (turn) and agent.rs:2258 (streamed turn) call to_provider_messages(&self.history) before handing the transcript to the loop. The new native dispatcher test stops at format_results output and never round-trips through to_provider_messages, so it misses this.

Fix is as @Audacity88 described: either keep provenance through the native ConversationMessage::ToolResults shape, or stop blind re-canonicalizing content that was already produced by the provenance-aware helper. Add the regression that asserts a content_search/glob_search PNG-path result survives format_results -> to_provider_messages with the literal path intact and no [IMAGE:...].

The XML serializer at dispatcher.rs:176 is the same blind call but stores as a user chat message, not ConversationMessage::ToolResults, so I agree it is not the blocker here.

Branch is also CONFLICTING and needs a merge from master.

Leaving @Audacity88's changes-requested in force; not posting a competing verdict. Close the native-serializer path plus the round-trip regression and this is done.

@github-actions github-actions Bot added the provider Auto scope: src/providers/** changed. label Jun 23, 2026
@Nillth

Nillth commented Jun 23, 2026

Copy link
Copy Markdown
Collaborator Author

You're right, and thanks for pinning it down precisely. The remaining unsafe path
was the native history serializer: Agent::turn / the streamed turn hand
self.tool_dispatcher.to_provider_messages(&self.history) to the loop, and for
NativeToolDispatcher that read-side conversion ran the provenance-blind
canonicalize_tool_result_media_markers on each stored ToolResultMessage. The
tool name was already gone by then, so a content_search/glob_search result
stored correctly as found: /hit.png could be re-promoted to
found: [IMAGE:/hit.png] before the loop ever saw the transcript. The write-side
gate didn't cover that.

I took the first option you offered β€” keep tool provenance through the native
tool-result history shape
:

  • Added tool_name: String to ToolResultMessage in zeroclaw-api
    (#[serde(default)], so existing persisted sessions still deserialize and the
    field defaults to empty).
  • NativeToolDispatcher::format_results now records tool_name on the stored
    result.
  • Both to_provider_messages read paths (native and XML) now go through the
    provenance-aware canonicalize_tool_result_media_markers_for(&result.tool_name, …)
    instead of the blind helper. A stored search/listing path stays literal; a
    genuine image_gen/file_download path still canonicalizes.
  • The ACP resume path (acp_session_store) already has the producing tool name
    in scope (the out row carries it, looked up from the matching in row on
    write), so resumed sessions are provenance-aware too.

Backstop for the #6183 contract: every other ToolResultMessage constructor
(provider-wire reconstruction, synthetic/error results) sets tool_name empty,
and _for("", …) falls straight through to the blind canonicalizer β€” so a
genuine image path with no recorded provenance is still promoted on read, exactly
as before. The compiler enforces that every literal is updated, so nothing slips
through silently.

Regression test β€” your exact three steps (agent::dispatcher::tests):
native_search_path_survives_format_then_to_provider_messages runs a
content_search/glob_search result with a real on-disk PNG through
format_results and then to_provider_messages, and asserts the
provider-visible tool content keeps the literal path and contains no [IMAGE:.
Two companions guard the other directions:
native_image_gen_path_still_promotes_through_to_provider_messages (real image
still routes across the round trip) and native_unknown_provenance_still_promotes_on_read
(the #6183 blind-canon path for empty provenance). The existing
native_dispatcher_converts_tool_results_to_tool_messages stays green.

Validation (cargo +1.93.0): clippy --all-targets -- -D warnings clean,
fmt --all --check clean, check --workspace --all-targets clean; the
agent:: module is 577/0 single-threaded (the lone red under full parallelism is
the pre-existing shared-response-cache contention flake, text-only and unrelated).

Honest residual: the flat provider-wire replay seed (Agent::turn's
msg.role == "tool" reconstruction) rebuilds results from a {tool_call_id, content} payload that never carried the name, so those land with empty
provenance. It's a narrow pre-existing seam (not the per-turn loop), and closing
it would mean putting tool_name into the JSON we actually send the model β€” I'd
rather not change provider-facing content for it. Happy to track it separately if
you'd prefer.

(Adjacent, for context: #8180 closes the complementary user-side case β€” a stale
[IMAGE:] from a user message poisoning the no-vision capability check. Disjoint
files, no overlap with this one.)

@singlerider singlerider left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-reviewed at head 4b28224e0 (a clean merge from master, so the CONFLICTING state @Audacity88 and I both flagged is gone). My prior verdict was a COMMENT at 2f2b49f4 concurring with @Audacity88's changes-requested on the native history serializer. Ran the battery on this head (checked out in place, restored after): cargo test -p zeroclaw-api 58 passed, cargo test -p zeroclaw-infra acp 22 passed, cargo test -p zeroclaw-runtime --lib agent::dispatcher 15 passed including all five new gating/round-trip tests, cargo clippy -p zeroclaw-runtime -p zeroclaw-api -p zeroclaw-infra --all-targets -- -D warnings clean.

🟒 RESOLVED: the native history serializer no longer re-promotes search paths

The exact hole @Audacity88 and I held the block on is closed. ToolResultMessage now carries a #[serde(default)] tool_name (model_provider.rs:166), populated in NativeToolDispatcher::format_results (tool_name: result.name.clone()), and the previously-blind to_provider_messages serialization now routes through canonicalize_tool_result_media_markers_for(&result.tool_name, ...) instead of the blind canonicalizer. So a content_search/glob_search result stored as found: /tmp/hit.png keeps its literal path when the stored history is serialized for the next provider call, rather than getting re-promoted to [IMAGE:...]. ACP resume also carries the producing tool name across a session reload by looking it up from the matching in row (acp_session_store.rs), so provenance survives restore.

I verified the regression closes the precise gap the prior native test missed: native_search_path_survives_format_then_to_provider_messages round-trips format_results then to_provider_messages with a real on-disk PNG for both search tools and asserts no [IMAGE: and literal-path survival. native_image_gen_path_still_promotes_through_to_provider_messages proves genuine producers still route, and native_unknown_provenance_still_promotes_on_read preserves the #6183 contract for empty-provenance image paths. All three pass.

🟒 The disclosed flat-replay residual is acceptable

The provider-wire replay seed in Agent::turn (agent.rs:1776/:1800) reconstructs results from {tool_call_id, content} with no tool name, so those land with empty tool_name and fall back to blind canonicalization. That is disclosed in the PR body, is genuinely narrow (off the per-turn loop, pre-existing), and closing it would mean adding tool_name to the JSON sent to the model. Declining that here is the right call; it is out of scope for this fix and the empty-provenance fallthrough is the documented #6183 behavior, not a new gap.

Verdict

On the merits this is done: every blocker @Audacity88 and I raised across the thread is resolved on this head, the read-side gate is the right shape, and the round-trip regression locks the invariant so the two serializer paths cannot drift again. I am posting this as a COMMENT, not an approval, because @Audacity88 holds an active CHANGES_REQUESTED at the earlier head 2f2b49f4 and I will not override another reviewer's block. @Audacity88, the native-serializer blocker you described is closed here with the round-trip regression you asked for; please convert or dismiss when you have a moment and this is clear to land.

@Audacity88 Audacity88 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed current head 4b28224 against my earlier changes-requested review at 2f2b49f, singlerider's latest approval, the author's latest update, the live PR state and checks, and the current dispatcher/history/API/session-store diff. I did not run local Cargo in this pass; the current CI Required Gate and quality checks are green, and the thread has focused validation from both the author and singlerider on this head.

βœ… Resolved β€” Native history serialization now keeps tool provenance

The remaining blocker from my prior review is closed. ToolResultMessage now carries a serde-defaulted tool_name, NativeToolDispatcher::format_results records the producing tool name when it still has that provenance, and NativeToolDispatcher::to_provider_messages now serializes stored tool results through canonicalize_tool_result_media_markers_for(&result.tool_name, ...) instead of the provenance-blind canonicalizer. That means a stored content_search or glob_search result such as found: /tmp/hit.png stays a literal path when native history is replayed to the provider, instead of being re-promoted to [IMAGE:...] before the next loop iteration.

The ACP resume path also carries the recovered tool name into the stored ToolResultMessage, so restored native tool results keep the same provenance-aware behavior. The earlier write-side fixes in turn::results_collect, XmlToolDispatcher::format_results, and NativeToolDispatcher::format_results are still in place, so the write and read paths now agree on the same helper.

🟒 What looks good β€” The empty-provenance fallback preserves the existing image contract

The provider-wire replay path still has no producing tool name, so the new constructors deliberately set tool_name: String::new() there and let empty provenance fall back to the blind canonicalizer. That preserves the #6183 behavior for genuine image paths that arrive without recorded provenance, and it avoids changing the JSON content sent to providers just to close this PR's narrower history-serialization hole. The residual is clearly disclosed in the PR body and is the right scope boundary for this fix.

🟒 What looks good β€” The regression covers the exact missed round trip

The new native_search_path_survives_format_then_to_provider_messages test exercises the precise sequence my prior review asked for: native format_results with a real on-disk PNG path from content_search or glob_search, followed by to_provider_messages, then an assertion that provider-visible tool content still contains the literal path and no [IMAGE: marker. The companion tests keep the other two sides honest: image_gen still promotes through the same round trip, and empty-provenance history still promotes a real image path. That is the right coverage for the bug class.

This clears my prior blocker. Approved.

@Nillth Nillth merged commit 381a3e5 into zeroclaw-labs:master Jun 23, 2026
19 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agent Auto scope: src/agent/** changed. bug Something isn't working provider Auto scope: src/providers/** changed. risk: high Auto risk: security/runtime/gateway/tools/workflows. runtime Auto scope: src/runtime/** changed. size: S Auto size: 81-250 non-doc changed lines.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants