fix(vision): scope the no-vision capability error to the latest user image #8180
Open
Nillth wants to merge 1 commit into
…image

Sending an image to a model_provider without vision support (and with no vision_model_provider configured) raised a provider_capability_error AND left the [IMAGE:] marker in the long-lived session history. The capability error was triggered by a history-wide marker count, so every later turn, even plain text, re-counted the stale marker and re-failed forever. The RPC/streaming path makes this permanent: it persists the user message into the session history before the loop runs, so a failed image turn leaves its marker behind. A single image to a non-vision provider made the session unusable until restart.

Scope the capability error to the most recent genuine user message:
- providers/multimodal: add count_latest_user_image_markers(), the turn-scoped counterpart to count_user_image_markers (skips tool-result carriers and older user messages).
- runtime/turn/vision_route: error only when the latest user message carries an image (the user just sent something we cannot see); a carried-over marker, from an earlier failed turn or a vision to non-vision model switch, degrades to text-only (markers stripped) so the turn continues. This also covers the model-switch case.

The user is still told once, on the turn they send the image; subsequent text turns recover instead of re-failing.

Tests:
- providers: count_latest_user_image_markers_scopes_to_newest_user_message
- runtime: run_tool_call_loop_degrades_carried_over_image_on_non_vision_provider (end-to-end through the shared engine; asserts the carried-over marker is stripped and the plain-text turn succeeds)
Summary
- `master`
- Sending an image to a `model_provider` without vision support (and with no `vision_model_provider` configured) raised `provider_capability_error capability=vision` AND left the `[IMAGE:]` marker in the long-lived session history. Because the error was keyed off a history-wide marker count, every later turn (even plain text) re-counted the stale marker and re-failed forever. The RPC/streaming path makes it permanent: it persists the inbound user message into the session history before the turn loop runs, so a failed image turn leaves its marker behind. A single image to a non-vision provider made the session unusable until restart.
- `resolve_vision_provider` now errors only when the user just sent an image we cannot see; a carried-over marker (a prior failed image turn, or a vision -> non-vision model switch mid-session) degrades to text-only (markers stripped, surrounding text preserved) so the conversation continues.
- Adds `count_latest_user_image_markers()` in `zeroclaw-providers`, the turn-scoped counterpart to `count_user_image_markers()`, reusing the same genuine-user-message predicate.
- No change when a `vision_model_provider` is configured, no change to tool-result image degradation (already degraded), no change to the channel orchestrator path (which never persisted failed-turn images). No config / CLI / API / env surface change.
- `resolve_vision_provider` is the single shared chokepoint inside `run_tool_call_loop` (one call site, `turn/mod.rs`), so the behaviour change reaches every transport (RPC/streaming, non-streaming, channels). The new function is purely additive.
- `bug` applied. `risk:` and `size:` are auto-applied by the repo labeler (the path scope labels `agent` / `provider` / `runtime` are already on); the auto-risk rule classifies runtime-path changes as higher risk, so risk/size are left to the automation.
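The difference between the history-wide count and the turn-scoped count described in the summary can be illustrated with a minimal sketch. The two public function names come from this PR; everything else (the `Message` and `Role` types, the `is_tool_result` flag, and the assumption that a marker is detected by its `[IMAGE:` prefix) is hypothetical and is not taken from the `zeroclaw-providers` source.

```rust
/// Hypothetical message role; the real crate's type may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Role {
    System,
    User,
    Assistant,
}

/// Hypothetical history entry used only for this sketch.
pub struct Message {
    pub role: Role,
    pub content: String,
    /// Assumption: tool results ride in user-role messages and are flagged.
    pub is_tool_result: bool,
}

/// Shared predicate: a user message that is not a tool-result carrier.
fn is_genuine_user_message(m: &Message) -> bool {
    m.role == Role::User && !m.is_tool_result
}

/// Assumption: every image marker starts with the literal `[IMAGE:`.
fn marker_count(text: &str) -> usize {
    text.matches("[IMAGE:").count()
}

/// History-wide count: sums markers over every genuine user message.
pub fn count_user_image_markers(history: &[Message]) -> usize {
    history
        .iter()
        .filter(|m| is_genuine_user_message(m))
        .map(|m| marker_count(&m.content))
        .sum()
}

/// Turn-scoped count: looks only at the newest genuine user message.
pub fn count_latest_user_image_markers(history: &[Message]) -> usize {
    history
        .iter()
        .rev()
        .find(|m| is_genuine_user_message(m))
        .map_or(0, |m| marker_count(&m.content))
}
```

With a history of "image turn, assistant reply, plain-text turn", the history-wide count stays at 1 while the turn-scoped count drops to 0, which is the distinction that lets later text turns recover.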
Validation Evidence (required)
- Toolchain pinned to CI's `1.93.0`. Tail output:
- … persists the user message to the long-lived session history before the loop). Confirmed the single call site of `resolve_vision_provider`. Reproduced the fix end-to-end through the shared engine (carried-over image -> stripped to `[media attachment]`, plain-text turn succeeds). Confirmed the first-turn capability error is preserved.
- `cron::store::tests::remove_job_emits_structured_cron_delete_event` is a pre-existing test-isolation flake (a UUID assert against a process-global log broadcast; a sibling `remove_job` caller races its event in under in-process parallelism). The diff touches no cron code, the test passes in isolation, and CI's nextest (process-per-test) isolates it.
- … `--features ci-all` clippy combo (needs `glib-2.0` / `libudev` system libs unavailable on this host). The change touches none of the voice/desktop features `ci-all` adds; deferred to CI. A docs-coverage heuristic WARN fired on the new `pub fn`, but it is an internal cross-crate helper, not a user-facing surface, so no docs are needed.
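The two behaviours validated above (first-turn error preserved, carried-over marker degraded to text) might be sketched as follows. This is not the body of `resolve_vision_provider`: the `VisionDecision` enum, both function names, and their parameters are hypothetical, and the marker grammar is assumed to be `[IMAGE:` followed by anything up to the next `]`. Only the `[media attachment]` placeholder is taken from the PR text.

```rust
/// Hypothetical outcome of the vision-route decision (illustrative names).
#[derive(Debug, PartialEq, Eq)]
pub enum VisionDecision {
    /// No markers at all, or some provider can see images: send as-is.
    PassThrough,
    /// Only stale markers remain: strip them and continue as text-only.
    DegradeToText,
    /// The user just sent an image nobody can see: raise the capability error.
    CapabilityError,
}

/// Decide what to do with image markers on this turn.
/// `can_see_images` stands for "the provider has vision, or a vision
/// provider is configured" (an assumption made for this sketch).
pub fn decide_vision_route(
    latest_user_markers: usize,
    history_markers: usize,
    can_see_images: bool,
) -> VisionDecision {
    if can_see_images || history_markers == 0 {
        VisionDecision::PassThrough
    } else if latest_user_markers > 0 {
        VisionDecision::CapabilityError
    } else {
        VisionDecision::DegradeToText
    }
}

/// Replace each `[IMAGE:...]` marker with a placeholder, keeping the
/// surrounding text intact.
pub fn strip_image_markers(text: &str) -> String {
    const OPEN: &str = "[IMAGE:";
    const PLACEHOLDER: &str = "[media attachment]";
    let mut out = String::with_capacity(text.len());
    let mut rest = text;
    while let Some(start) = rest.find(OPEN) {
        out.push_str(&rest[..start]);
        match rest[start..].find(']') {
            Some(end) => {
                out.push_str(PLACEHOLDER);
                rest = &rest[start + end + 1..];
            }
            None => {
                // Unterminated marker: keep the remainder untouched.
                out.push_str(&rest[start..]);
                rest = "";
            }
        }
    }
    out.push_str(rest);
    out
}
```

Under these assumptions, a turn whose newest user message has no marker but whose history still holds one would take the `DegradeToText` branch, and the stale message text would be rewritten by `strip_image_markers` before being sent to the provider.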
Security & Privacy Impact (required)
the image bytes never reach the model in either branch, and no trust boundary or policy check
is affected.
Compatibility (required)
now recovers on the next turn instead of failing permanently.
Rollback (required for `risk: medium` and `risk: high`)
- `git revert <sha>` (single, self-contained commit; the new function is additive, so reverting cleanly restores the prior history-wide-count behaviour).
- … a `vision_model_provider`, which is unaffected by this change.)
- `provider_capability_error` with `capability=vision`, or the degrade WARN `no vision route for carried-over/tool-result image marker(s); degrading to text-only`. A regression would show the capability error re-firing on plain-text turns after an image, or an image the user just sent being silently dropped.