spark-server: avoid orphan tool messages in F32 #105

Open
lesserevil wants to merge 1 commit into Avarok-Cybersecurity:main from lesserevil:fix/minimax-f32-orphan-tool

@lesserevil

Summary

F32 no longer clones an old failed role: tool message onto the end of the conversation. It now surfaces the failed tool result in a runtime reminder, preserving OpenAI tool-message ordering so MiniMax and other vendor templates do not reject the prompt as an orphan tool result.

Test plan

  • cargo fmt --all -- --check
  • LIBRARY_PATH=/opt/vllm/nccl-blackwell/lib LD_LIBRARY_PATH=/opt/vllm/nccl-blackwell/lib ATLAS_SKIP_BUILD=1 cargo test -p spark-server f32 -- --nocapture
  • LIBRARY_PATH=/opt/vllm/nccl-blackwell/lib LD_LIBRARY_PATH=/opt/vllm/nccl-blackwell/lib ATLAS_SKIP_BUILD=1 CUDARC_CUDA_VERSION=13000 cargo clippy -p spark-server --tests -- -Dwarnings
  • ATLAS_SKIP_BUILD=1 cargo clippy --workspace --tests --all-features -- -Dwarnings fails on this Linux host, and already failed before this patch, because the workspace pulls in objc2, which requires an Apple target.
  • bash scripts/check-license-headers.sh could not run because scripts/check-license-headers.sh is not present in this checkout.
  • typos could not run because typos is not installed on this host.
  • Tested against a real model / hardware: deployed on godspeed + savitar with nvidia/MiniMax-M2.7-NVFP4 EP=2. A synthetic F32 regression request now returns HTTP 200 and logs "F32: surfaced most-recent failed tool_result in a runtime reminder". The Hermes smoke test no longer hits the MiniMax template 400.
  • Added focused unit coverage for stale failed-tool surfacing and the already-fresh no-op case.

Notes for reviewers

The previous F32 behavior attempted to make a stale failure fresh by duplicating the original tool result. That creates invalid OpenAI history whenever messages after the original assistant/tool pair are not assistant tool calls, and MiniMax's template rejects it with "Message has tool role, but there was no previous assistant message with a tool call".
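To make the ordering problem concrete, here is a minimal sketch of an orphan check. It is a simplified approximation of the rule the MiniMax template enforces, not the template's actual logic: the `Role` enum and `first_orphan_tool` function are hypothetical names introduced for illustration, and real messages carry more fields (tool call IDs, content, and so on).

```rust
/// Simplified chat roles (hypothetical; real OpenAI-style messages carry more fields).
#[derive(Debug, Clone, PartialEq)]
enum Role {
    System,
    User,
    Assistant { has_tool_calls: bool },
    Tool,
}

/// Returns the index of the first orphan tool message, if any.
///
/// Assumption: a tool message is valid only when it directly follows an
/// assistant message that issued tool calls, or another tool message in
/// that same run. Any other message in between ends the run.
fn first_orphan_tool(history: &[Role]) -> Option<usize> {
    let mut in_tool_run = false;
    for (i, role) in history.iter().enumerate() {
        match role {
            // An assistant turn opens a tool run only if it made tool calls.
            Role::Assistant { has_tool_calls } => in_tool_run = *has_tool_calls,
            Role::Tool => {
                if !in_tool_run {
                    return Some(i);
                }
            }
            // System and user turns close any open tool run.
            Role::System | Role::User => in_tool_run = false,
        }
    }
    None
}
```

Under this approximation, a history of user, assistant (with tool calls), tool, assistant, user passes, while appending one more tool message to the end, as the old F32 behavior did, is flagged at the appended index.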

This keeps the guard's intent while staying inside the template contract: the stale failure is copied into a system-reminder appended to the latest user/tool message instead of being represented as a new role: tool turn.
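The approach described above can be sketched roughly as follows. This is an illustrative sketch, not the code in this patch: the `Message` struct, its `failed` flag, the `surface_stale_failure` function, the `<system-reminder>` wrapper text, and the definition of "already fresh" (the failed result is itself the latest user/tool message) are all assumptions made for the example.

```rust
/// Simplified message (hypothetical; the real type in spark-server may differ).
#[derive(Debug, Clone, PartialEq)]
struct Message {
    role: String,    // "system" | "user" | "assistant" | "tool"
    content: String,
    failed: bool,    // hypothetical flag: true for a failed tool result
}

impl Message {
    fn new(role: &str, content: &str, failed: bool) -> Self {
        Message { role: role.to_string(), content: content.to_string(), failed }
    }
}

/// Copies the most recent failed tool result into a reminder appended to the
/// latest user/tool message. Returns true if the history was modified.
/// No message is added, so tool-message ordering is left untouched.
fn surface_stale_failure(history: &mut [Message]) -> bool {
    // Most recent failed tool result, if any.
    let Some(failed_idx) = history.iter().rposition(|m| m.role == "tool" && m.failed) else {
        return false;
    };
    // Latest user or tool message; always exists here, since failed_idx qualifies.
    let Some(target_idx) = history.iter().rposition(|m| m.role == "user" || m.role == "tool")
    else {
        return false;
    };
    // Assumed "already fresh" case: the failure is already the latest such message.
    if failed_idx == target_idx {
        return false;
    }
    let reminder = format!(
        "\n<system-reminder>\nAn earlier tool call failed: {}\n</system-reminder>",
        history[failed_idx].content
    );
    history[target_idx].content.push_str(&reminder);
    true
}
```

Because the function takes a mutable slice rather than a `Vec`, it cannot change the number of messages, which makes the "no new role: tool turn" property hold by construction in this sketch.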

Benchmarks: not run; this is a prompt-history correctness fix, not a performance-oriented change.

Authorship: AI-generated by Codex under human operator direction; no human-written code sections are claimed.

CLA
