tjohnson31415
Collaborator

Some fixes from testing handling of request cancellation:

  • In V0, guard against a KeyError in _req_ids2idx
  • In V1, specialize the Scheduler's finish_requests() to handle the holdback_queue
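
As a rough illustration of the V0 fix, here is a minimal sketch of guarding a request-ID-to-index map against a missing key when a cancellation arrives. Only the name `_req_ids2idx` comes from this PR; the `BatchIndex` class and its methods are hypothetical, and the real structure in vllm-spyre may differ.

```python
class BatchIndex:
    """Hypothetical stand-in for the object that owns _req_ids2idx."""

    def __init__(self) -> None:
        self._req_ids2idx: dict[str, int] = {}

    def add(self, req_id: str, idx: int) -> None:
        self._req_ids2idx[req_id] = idx

    def cancel(self, req_id: str) -> bool:
        # pop() with a default avoids a KeyError when the request is already
        # gone (for example, it finished before the abort arrived).
        return self._req_ids2idx.pop(req_id, None) is not None


index = BatchIndex()
index.add("req-0", 0)
first = index.cancel("req-0")   # request was present
second = index.cancel("req-0")  # request already removed, no exception
```

The design choice here is to treat a repeated or late cancellation as a no-op instead of an error, which is one common way to make abort handling idempotent.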

FIX #36

@github-actions

👋 Hi! Thank you for contributing to vLLM support on Spyre.
Just a reminder: Make sure that your code passes all the linting checks, otherwise your PR won't be able to be merged. To do so, first install the linting requirements, then run format.sh and commit the changes:

pip install -r requirements-lint.txt
bash format.sh

Now you are good to go 🚀

else:
    # this try-except is the specialization for Spyre
    try:
        self.holdback_queue.remove(request)
Collaborator

Ah dang, this is unfortunate.

I think maybe we can fix this in a simpler way by removing self.holdback_queue as an instance attribute and making it a local variable inside self.schedule() instead. After we schedule a new batch, we can take all the requests that we held back and put them back in self.waiting. Then we won't need to worry about breaking assumptions that the V1 scheduler makes.
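
The suggestion above can be sketched roughly as follows. This is a toy model, not the vllm-spyre Scheduler: `ToyScheduler`, `max_batch`, and `finish_request` are hypothetical names, and the holdback rule (hold back everything beyond `max_batch`) is an assumption made only to keep the example small.

```python
from collections import deque


class ToyScheduler:
    """Hypothetical simplification; not the vllm-spyre Scheduler."""

    def __init__(self, max_batch: int) -> None:
        self.waiting: deque[str] = deque()
        self.running: list[str] = []
        self.max_batch = max_batch

    def schedule(self) -> list[str]:
        # The holdback queue is local: it exists only for this call.
        holdback: deque[str] = deque()
        while len(self.waiting) > self.max_batch:
            # Take from the tail and prepend, so the original order is kept.
            holdback.appendleft(self.waiting.pop())

        batch = list(self.waiting)
        self.running.extend(batch)
        self.waiting.clear()

        # Held-back requests go straight back into waiting.
        self.waiting.extend(holdback)
        return batch

    def finish_request(self, req_id: str) -> None:
        # Only the two standard queues need checking; there is no third
        # queue that a cancelled request could be hiding in.
        if req_id in self.running:
            self.running.remove(req_id)
        else:
            self.waiting.remove(req_id)


sched = ToyScheduler(max_batch=2)
sched.waiting.extend(["a", "b", "c"])
batch = sched.schedule()    # "c" is held back, then returned to waiting
sched.finish_request("c")   # cancellation finds "c" in waiting
```

Because every request is in either `waiting` or `running` once `schedule()` returns, cancellation logic written for those two queues keeps working without a Spyre-specific branch.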

Collaborator

That would let us get rid of the override on get_num_unfinished_requests as well

Collaborator Author

Done!
I left it as an instance variable so that we don't remake the deque. It is also still used in _handle_rejects, though there probably can't be a rejection during scheduling?
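
For comparison, a minimal sketch of the variant described here: the deque stays an instance attribute so it is allocated once, but it is drained before `schedule()` returns. Everything except the names `holdback_queue`, `waiting`, and `get_num_unfinished_requests` is hypothetical, and the count formula is an assumption about what the un-overridden method would compute.

```python
from collections import deque


class ReusingScheduler:
    """Hypothetical sketch; not the actual vllm-spyre code."""

    def __init__(self, max_batch: int) -> None:
        self.waiting: deque[str] = deque()
        self.running: list[str] = []
        self.holdback_queue: deque[str] = deque()  # allocated once, reused
        self.max_batch = max_batch

    def schedule(self) -> list[str]:
        while len(self.waiting) > self.max_batch:
            self.holdback_queue.appendleft(self.waiting.pop())

        batch = list(self.waiting)
        self.running.extend(batch)
        self.waiting.clear()

        # Drain the holdback queue so it is empty between calls.
        self.waiting.extend(self.holdback_queue)
        self.holdback_queue.clear()
        return batch

    def get_num_unfinished_requests(self) -> int:
        # Assumed default formula: no holdback term is needed because the
        # holdback queue is always empty outside schedule().
        return len(self.waiting) + len(self.running)


s = ReusingScheduler(max_batch=1)
s.waiting.extend(["x", "y", "z"])
first_batch = s.schedule()
unfinished = s.get_num_unfinished_requests()
```

The trade-off is small: reusing the deque saves an allocation per scheduling step, at the cost of an invariant (empty between calls) that any other user of the attribute has to respect.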

Collaborator

Right, I think it should be safe to remove it from _handle_rejects as well, since this should all be synchronous. The output processing that calls _handle_rejects can't be happening concurrently with scheduling a new forward pass.

Collaborator

I'll leave it up to you whether to merge now or pull it out of _handle_rejects as well. I think we'll be getting rid of this rejected request business soon anyway.

@tjohnson31415 tjohnson31415 merged commit 84e01fa into main Apr 8, 2025
7 checks passed
@tjohnson31415 tjohnson31415 deleted the fix-keyerror branch April 8, 2025 19:36
rafvasq pushed a commit to rafvasq/vllm-spyre that referenced this pull request Apr 8, 2025
* fix: add optional arg to abort_seq_group for compat with v0.8
* fix: guard against KeyError with _req_ids2idx
* fix: specialize finish_requests in V1 scheduler
* fix: check against None...
* refactor: make holdback queue use more temporary

Signed-off-by: Travis Johnson <[email protected]>
Development

Successfully merging this pull request may close these issues.

Request cancellation can cause the server to crash

2 participants