-
-
Notifications
You must be signed in to change notification settings - Fork 7.7k
Fail request if FSM fails to advance #18780
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Fail request if FSM fails to advance #18780
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add 🚀 |
Signed-off-by: Ibrahim Ahmed <[email protected]>
389a97c
to
49f2024
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one tiny comment.
fwiw I think it is better to raise exception and propagate it accordingly in the engine. but that is probably for another day.
@@ -779,8 +779,15 @@ def update_from_output( | |||
# NOTE: structured_output_request | |||
# should not be None if use_structured_output, we have | |||
# check above, so safe to ignore type warning | |||
request.structured_output_request.grammar.accept_tokens( # type: ignore[union-attr] | |||
req_id, new_token_ids) | |||
if not request.structured_output_request.grammar.accept_tokens( # type: ignore[union-attr] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's also add a note and create a bug for tracking this here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's an issue #18783
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how does that look @aarnphm
Fix streaming requests hanging when structured output FSM fails to advance
Problem
When using structured outputs with the xgrammar backend, streaming requests would hang indefinitely if the FSM (Finite State Machine) failed to advance. This occurred when
accept_tokens()
returnedFalse
in the xgrammar backend, logging an error but not properly terminating the request.Diagnosis
The issue was in the scheduler's
update_from_output()
method. When processing new tokens for structured output requests, the code calledaccept_tokens()
but ignored its return value:When the xgrammar FSM encountered an invalid token sequence, it would:
"Failed to advance FSM for request %s for tokens %s. Please file an issue."
False
fromaccept_tokens()
Since the scheduler didn't check the return value, it continued processing as if nothing was wrong, causing the streaming response to hang indefinitely without sending a completion signal.
Solution
The fix checks the return value of
accept_tokens()
and properly terminates the request when it returnsFalse
:This ensures that:
FINISHED_ABORTED