fix(streaming): cancel scheduler when a loop guard fires, not just suppress output#89
Open
tbraun96 wants to merge 2 commits into
Open
fix(streaming): cancel scheduler when a loop guard fires, not just suppress output#89tbraun96 wants to merge 2 commits into
tbraun96 wants to merge 2 commits into
Conversation
…ps the response When the Bug-2 name-run cap (or F11 within-dedup / F5 cross-flush dedup / F44 perm-fail circuit-breaker) forcibly ends a streaming response, `finish_reason` was previously `"tool_calls"` — because tool calls *were* emitted, just truncated mid-loop. Agent clients (opencode and friends) see a normal-looking tool-call completion, dutifully run the tools, send the next request, and the model loops again — Atlas was breaking the loop one round at a time without ever telling the client. Add a `tool_loop_capped: bool` on `StreamState`, flipped true alongside `stop_string_triggered` at every tool-call loop guard (4 sites in `tool_handlers.rs`). `handle_done` reads it and overrides `fr` to `"length"` — OpenAI's spec slot for "response was forcibly truncated" — ahead of the existing `"tool_calls"` / `finish_reason` fall-throughs. This gives every agent client a clean, spec-compliant hook to break its outer retry loop without needing Atlas-specific headers. Also dumped to the `--dump` synthesized-response body for observability. Verified: `cargo check`, `cargo clippy --tests`, `cargo fmt --check` all clean. Live repro will follow once the image is rebuilt. Co-Authored-By: Azeez Ishaqui <debaterishaqui@gmail.com> Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ppress output The PR #87 fix changed finish_reason to "length" when a tool-loop guard trips, so agent clients can break their outer retry loop — but only when the scheduler actually finalises and emits Done. Live repro on opencode revealed the deeper bug: setting `stop_string_triggered = true` in chat_stream only suppresses *output*; the scheduler keeps generating tokens until natural EOS or `max_tokens`. On a degenerate-loop response (model not EOS-ing), this manifests as a hang — the stream silently consumes tokens, the channel can fill, the scheduler can block on `blocking_send`, GPU goes 0%, no Done event ever fires, opencode sits forever waiting on the SSE stream. Add a cooperative cancellation flag plumbed from chat_stream into the scheduler: Arc<AtomicBool> cancel_flag │ ├── created in chat_stream/mod.rs ├── passed into InferenceRequest::Streaming { cancel_flag, .. } ├── stashed on StreamState (cancel_flag) — chat_stream flips true on: │ • Bug-2 name-run cap trip (handle_complete_tool_call, │ handle_tool_call_end) │ • F11 within-response dedup │ • F44 perm-fail circuit-breaker │ • cross-flush tool_arg_dedup trip │ • loop-watchdog fire (SimHash + substring repeat) └── carried through PrefillInProgress → ActiveSeq on the scheduler side; `emit_step::emit_token` reads it at the top of every token-emit and sets `a.finished = true` if flipped — equivalent to an EOS, so the existing finalize path runs and `handle_done` emits the proper `tool_loop_capped` / `finish_reason="length"` chunks + `[DONE]`. Spill-restored ActiveSeq carries `cancel_flag: None` — the original streaming connection is long gone by the time a swapped-out seq resumes from disk. /v1/completions also passes a fresh never-flipped flag so the scheduler's type-check is satisfied; the guard pipeline doesn't run on that legacy path yet. Verified: `cargo check`, `cargo clippy --tests`, `cargo fmt --check`, `cargo test -p spark-server` (484 passed), `cargo build --release` all clean. Co-Authored-By: Azeez Ishaqui <debaterishaqui@gmail.com> Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Background. PR #87 made
finish_reason="length"flow on tool-loop cap trips so agent clients can break their outer retry loop — but only if the scheduler actually finalises. Live opencode session revealed the deeper bug:stop_string_triggered = trueonly suppresses output; the scheduler keeps generating tokens until natural EOS ormax_tokens. On a degenerate-loop response where the model doesn't EOS, this manifests as a hang — channel can fill,blocking_sendblocks, GPU goes 0%, noDoneever fires, the client sits forever on the SSE stream.Fix. A cooperative
Arc<AtomicBool> cancel_flagplumbed fromchat_streamthroughInferenceRequest::Streaming→PrefillInProgress→ActiveSeq.emit_step::emit_tokenchecks it at the top and setsa.finished = truewhen flipped — equivalent to EOS, so the existing finalize path runs andhandle_doneemits the propertool_loop_capped/finish_reason="length"chunks +[DONE].Stream-side, the flag is flipped on every forced-stop condition:
handle_complete_tool_callandhandle_tool_call_end)tool_arg_deduptripEdge cases. Spill-restored
ActiveSeqcarriescancel_flag: None(the original stream is long gone by the time a swapped seq resumes)./v1/completionspasses a fresh never-flipped flag so the scheduler type-checks cleanly; the guard pipeline isn't wired to that legacy path yet.Verification. Local:
cargo check,cargo clippy --tests,cargo fmt --check,cargo test -p spark-server(484 passed),cargo build --releaseall clean.Held off on Docker Hub push until local validation.