fix(streaming): cancel scheduler when a loop guard fires, not just suppress output by tbraun96 · Pull Request #89 · Avarok-Cybersecurity/atlas

tbraun96 · 2026-05-23T11:39:15Z

Background. PR #87 made finish_reason="length" flow on tool-loop cap trips so agent clients can break their outer retry loop, but that only helps if the scheduler actually finalises. A live opencode session revealed the deeper bug: stop_string_triggered = true only suppresses output; the scheduler keeps generating tokens until natural EOS or max_tokens. On a degenerate-loop response where the model never emits EOS, this manifests as a hang: the channel can fill, blocking_send blocks, GPU utilisation drops to 0%, no Done ever fires, and the client waits forever on the SSE stream.

Fix. A cooperative Arc<AtomicBool> cancel_flag is plumbed from chat_stream through InferenceRequest::Streaming → PrefillInProgress → ActiveSeq. emit_step::emit_token checks it at the top of every token emit and sets a.finished = true when it has been flipped. That is equivalent to EOS, so the existing finalize path runs and handle_done emits the proper tool_loop_capped / finish_reason="length" chunks + [DONE].
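The scheduler-side check can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `Seq`, `run_decode`, and the `flip_at` parameter are hypothetical, the flag is simplified to a non-optional field, and `Ordering::Relaxed` is an assumption about the memory ordering used.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for the scheduler's per-sequence state.
struct Seq {
    cancel_flag: Arc<AtomicBool>,
    finished: bool,
    emitted: usize,
}

/// Check the flag before emitting anything. A flipped flag is treated
/// like EOS: mark the sequence finished and emit no further tokens.
fn emit_token(a: &mut Seq, _token: u32) {
    if a.cancel_flag.load(Ordering::Relaxed) {
        a.finished = true;
        return;
    }
    a.emitted += 1;
}

/// Runs a fake decode loop of up to `max_tokens` steps. `flip_at`
/// simulates a stream-side guard flipping the flag just before that step.
/// Returns how many tokens were emitted.
fn run_decode(max_tokens: usize, flip_at: Option<usize>) -> usize {
    let flag = Arc::new(AtomicBool::new(false));
    let mut seq = Seq {
        cancel_flag: Arc::clone(&flag),
        finished: false,
        emitted: 0,
    };
    for step in 0..max_tokens {
        if flip_at == Some(step) {
            flag.store(true, Ordering::Relaxed);
        }
        emit_token(&mut seq, step as u32);
        if seq.finished {
            break; // the normal finalize path would run here
        }
    }
    seq.emitted
}
```

The point of the cooperative design is that the scheduler stops at its own next emit boundary, so no thread has to be interrupted and the existing finalize logic can be reused unchanged.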

Stream-side, the flag is flipped on every forced-stop condition:

- Bug-2 name-run cap trip (both handle_complete_tool_call and handle_tool_call_end)
- F11 within-response dedup
- F44 permanent-failure circuit-breaker
- cross-flush tool_arg_dedup trip
- loop-watchdog fire (SimHash + substring repeat)
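What each of the sites listed above does on the stream side can be sketched as a single helper. This is an illustration under assumptions: the `StreamState` here is a trimmed-down, hypothetical version with only two fields, and `force_stop` and `new_stream` are invented names, not functions from the PR.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical, trimmed-down StreamState with only the fields this sketch needs.
struct StreamState {
    stop_string_triggered: bool,
    cancel_flag: Arc<AtomicBool>,
}

/// Hypothetical helper a guard site could call when it trips:
/// suppress further output AND ask the scheduler to stop generating.
fn force_stop(state: &mut StreamState) {
    state.stop_string_triggered = true;
    state.cancel_flag.store(true, Ordering::Relaxed);
}

/// Builds a stream state plus the clone of the flag that would travel
/// to the scheduler inside the streaming request.
fn new_stream() -> (StreamState, Arc<AtomicBool>) {
    let flag = Arc::new(AtomicBool::new(false));
    let state = StreamState {
        stop_string_triggered: false,
        cancel_flag: Arc::clone(&flag),
    };
    (state, flag)
}
```

Because both sides hold clones of the same `Arc`, a store on the stream side is visible to the scheduler's next load without any channel message.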

Edge cases. Spill-restored ActiveSeq carries cancel_flag: None (the original stream is long gone by the time a swapped seq resumes). /v1/completions passes a fresh never-flipped flag so the scheduler type-checks cleanly; the guard pipeline isn't wired to that legacy path yet.
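The three cases (live stream, spill-restored sequence, legacy completions path) can be pictured with an optional flag. This is a sketch under assumptions: `is_cancelled`, `spill_restored_flag`, and `completions_flag` are hypothetical names, and wrapping the completions flag in `Some` is a guess at how the field is populated.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// A missing flag means "nobody can cancel this sequence".
fn is_cancelled(flag: &Option<Arc<AtomicBool>>) -> bool {
    flag.as_ref().map_or(false, |f| f.load(Ordering::Relaxed))
}

/// Spill-restored sequence: the original stream is gone, so no flag.
fn spill_restored_flag() -> Option<Arc<AtomicBool>> {
    None
}

/// Legacy completions path: a fresh flag that nothing ever flips.
fn completions_flag() -> Option<Arc<AtomicBool>> {
    Some(Arc::new(AtomicBool::new(false)))
}
```

In both edge cases the check evaluates to false, so those sequences run to natural EOS or max_tokens exactly as before.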

Verification. Local: cargo check, cargo clippy --tests, cargo fmt --check, cargo test -p spark-server (484 passed), cargo build --release all clean.

Held off on Docker Hub push until local validation.

…ps the response

When the Bug-2 name-run cap (or F11 within-dedup / F5 cross-flush dedup / F44 perm-fail circuit-breaker) forcibly ends a streaming response, `finish_reason` was previously `"tool_calls"`, because tool calls *were* emitted, just truncated mid-loop. Agent clients (opencode and friends) see a normal-looking tool-call completion, dutifully run the tools, send the next request, and the model loops again. Atlas was breaking the loop one round at a time without ever telling the client.

Add a `tool_loop_capped: bool` on `StreamState`, flipped true alongside `stop_string_triggered` at every tool-call loop guard (4 sites in `tool_handlers.rs`). `handle_done` reads it and overrides `fr` to `"length"` (OpenAI's spec slot for "response was forcibly truncated") ahead of the existing `"tool_calls"` / `finish_reason` fall-throughs.

This gives every agent client a clean, spec-compliant hook to break its outer retry loop without needing Atlas-specific headers. Also dumped to the `--dump` synthesized-response body for observability.

Verified: `cargo check`, `cargo clippy --tests`, `cargo fmt --check` all clean. Live repro will follow once the image is rebuilt.

Co-Authored-By: Azeez Ishaqui <debaterishaqui@gmail.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
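The override order described in the commit message above can be sketched as a small pure function. Only `tool_loop_capped` is a name from the source; `pick_finish_reason`, `emitted_tool_calls`, and `scheduler_reason` are hypothetical, and the real `handle_done` may structure this differently.

```rust
/// Precedence sketch: a tripped loop guard wins over "tool_calls",
/// which in turn wins over whatever reason the scheduler reported.
fn pick_finish_reason(
    tool_loop_capped: bool,
    emitted_tool_calls: bool,
    scheduler_reason: &str,
) -> &str {
    if tool_loop_capped {
        "length" // forcibly truncated by a loop guard
    } else if emitted_tool_calls {
        "tool_calls"
    } else {
        scheduler_reason
    }
}
```

A client that already treats "length" as "do not blindly continue" needs no Atlas-specific handling to stop its outer retry loop.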

…ppress output

The PR #87 fix changed finish_reason to "length" when a tool-loop guard trips, so agent clients can break their outer retry loop, but only when the scheduler actually finalises and emits Done. Live repro on opencode revealed the deeper bug: setting `stop_string_triggered = true` in chat_stream only suppresses *output*; the scheduler keeps generating tokens until natural EOS or `max_tokens`. On a degenerate-loop response (model not EOS-ing), this manifests as a hang: the stream silently consumes tokens, the channel can fill, the scheduler can block on `blocking_send`, GPU goes 0%, no Done event ever fires, opencode sits forever waiting on the SSE stream.

Add a cooperative cancellation flag plumbed from chat_stream into the scheduler:

    Arc<AtomicBool> cancel_flag
      │
      ├── created in chat_stream/mod.rs
      ├── passed into InferenceRequest::Streaming { cancel_flag, .. }
      ├── stashed on StreamState (cancel_flag); chat_stream flips true on:
      │     • Bug-2 name-run cap trip (handle_complete_tool_call,
      │       handle_tool_call_end)
      │     • F11 within-response dedup
      │     • F44 perm-fail circuit-breaker
      │     • cross-flush tool_arg_dedup trip
      │     • loop-watchdog fire (SimHash + substring repeat)
      └── carried through PrefillInProgress → ActiveSeq on the scheduler
          side; `emit_step::emit_token` reads it at the top of every
          token-emit and sets `a.finished = true` if flipped. That is
          equivalent to an EOS, so the existing finalize path runs and
          `handle_done` emits the proper `tool_loop_capped` /
          `finish_reason="length"` chunks + `[DONE]`.

Spill-restored ActiveSeq carries `cancel_flag: None`: the original streaming connection is long gone by the time a swapped-out seq resumes from disk. /v1/completions also passes a fresh never-flipped flag so the scheduler's type-check is satisfied; the guard pipeline doesn't run on that legacy path yet.

Verified: `cargo check`, `cargo clippy --tests`, `cargo fmt --check`, `cargo test -p spark-server` (484 passed), `cargo build --release` all clean.

Co-Authored-By: Azeez Ishaqui <debaterishaqui@gmail.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

tbraun96 and others added 2 commits May 22, 2026 21:12

tbraun96 requested a review from AzeezIsh as a code owner May 23, 2026 11:39

tbraun96 mentioned this pull request May 23, 2026

fix(streaming): detect & cancel in-think <tool_call> leak (Qwen3.6 + opencode) #90

Open

tbraun96 wants to merge 2 commits into main from fix/scheduler-cancel-flag
