Skip to content

fix(streaming): cancel scheduler when a loop guard fires, not just suppress output#89

Open
tbraun96 wants to merge 2 commits into
mainfrom
fix/scheduler-cancel-flag
Open

fix(streaming): cancel scheduler when a loop guard fires, not just suppress output#89
tbraun96 wants to merge 2 commits into
mainfrom
fix/scheduler-cancel-flag

Conversation

@tbraun96
Copy link
Copy Markdown
Contributor

Background. PR #87 made finish_reason="length" flow on tool-loop cap trips so agent clients can break their outer retry loop — but only if the scheduler actually finalises. Live opencode session revealed the deeper bug: stop_string_triggered = true only suppresses output; the scheduler keeps generating tokens until natural EOS or max_tokens. On a degenerate-loop response where the model doesn't EOS, this manifests as a hang — channel can fill, blocking_send blocks, GPU goes 0%, no Done ever fires, the client sits forever on the SSE stream.

Fix. A cooperative Arc<AtomicBool> cancel_flag plumbed from chat_stream through InferenceRequest::StreamingPrefillInProgressActiveSeq. emit_step::emit_token checks it at the top and sets a.finished = true when flipped — equivalent to EOS, so the existing finalize path runs and handle_done emits the proper tool_loop_capped / finish_reason="length" chunks + [DONE].

Stream-side, the flag is flipped on every forced-stop condition:

  • Bug-2 name-run cap trip (both handle_complete_tool_call and handle_tool_call_end)
  • F11 within-response dedup
  • F44 permanent-failure circuit-breaker
  • cross-flush tool_arg_dedup trip
  • loop-watchdog fire (SimHash + substring repeat)

Edge cases. Spill-restored ActiveSeq carries cancel_flag: None (the original stream is long gone by the time a swapped seq resumes). /v1/completions passes a fresh never-flipped flag so the scheduler type-checks cleanly; the guard pipeline isn't wired to that legacy path yet.

Verification. Local: cargo check, cargo clippy --tests, cargo fmt --check, cargo test -p spark-server (484 passed), cargo build --release all clean.

Held off on Docker Hub push until local validation.

tbraun96 and others added 2 commits May 22, 2026 21:12
…ps the response

When the Bug-2 name-run cap (or F11 within-dedup / F5 cross-flush dedup /
F44 perm-fail circuit-breaker) forcibly ends a streaming response,
`finish_reason` was previously `"tool_calls"` — because tool calls *were*
emitted, just truncated mid-loop. Agent clients (opencode and friends)
see a normal-looking tool-call completion, dutifully run the tools, send
the next request, and the model loops again — Atlas was breaking the
loop one round at a time without ever telling the client.

Add a `tool_loop_capped: bool` on `StreamState`, flipped true alongside
`stop_string_triggered` at every tool-call loop guard (4 sites in
`tool_handlers.rs`). `handle_done` reads it and overrides `fr` to
`"length"` — OpenAI's spec slot for "response was forcibly truncated" —
ahead of the existing `"tool_calls"` / `finish_reason` fall-throughs.
This gives every agent client a clean, spec-compliant hook to break its
outer retry loop without needing Atlas-specific headers.

Also dumped to the `--dump` synthesized-response body for observability.

Verified: `cargo check`, `cargo clippy --tests`, `cargo fmt --check` all
clean. Live repro will follow once the image is rebuilt.

Co-Authored-By: Azeez Ishaqui <debaterishaqui@gmail.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ppress output

The PR #87 fix changed finish_reason to "length" when a tool-loop guard
trips, so agent clients can break their outer retry loop — but only when
the scheduler actually finalises and emits Done. Live repro on opencode
revealed the deeper bug: setting `stop_string_triggered = true` in
chat_stream only suppresses *output*; the scheduler keeps generating
tokens until natural EOS or `max_tokens`. On a degenerate-loop response
(model not EOS-ing), this manifests as a hang — the stream silently
consumes tokens, the channel can fill, the scheduler can block on
`blocking_send`, GPU goes 0%, no Done event ever fires, opencode sits
forever waiting on the SSE stream.

Add a cooperative cancellation flag plumbed from chat_stream into the
scheduler:

  Arc<AtomicBool> cancel_flag
    │
    ├── created in chat_stream/mod.rs
    ├── passed into InferenceRequest::Streaming { cancel_flag, .. }
    ├── stashed on StreamState (cancel_flag) — chat_stream flips true on:
    │     • Bug-2 name-run cap trip (handle_complete_tool_call,
    │       handle_tool_call_end)
    │     • F11 within-response dedup
    │     • F44 perm-fail circuit-breaker
    │     • cross-flush tool_arg_dedup trip
    │     • loop-watchdog fire (SimHash + substring repeat)
    └── carried through PrefillInProgress → ActiveSeq on the scheduler
        side; `emit_step::emit_token` reads it at the top of every
        token-emit and sets `a.finished = true` if flipped — equivalent
        to an EOS, so the existing finalize path runs and `handle_done`
        emits the proper `tool_loop_capped` / `finish_reason="length"`
        chunks + `[DONE]`.

Spill-restored ActiveSeq carries `cancel_flag: None` — the original
streaming connection is long gone by the time a swapped-out seq resumes
from disk. /v1/completions also passes a fresh never-flipped flag so
the scheduler's type-check is satisfied; the guard pipeline doesn't run
on that legacy path yet.

Verified: `cargo check`, `cargo clippy --tests`, `cargo fmt --check`,
`cargo test -p spark-server` (484 passed), `cargo build --release` all
clean.

Co-Authored-By: Azeez Ishaqui <debaterishaqui@gmail.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant