# Description
We found a couple of ways to crash the server with edge cases in top_k:
- setting `top_k > vocab_size` in the request
- mixing greedy requests and sampling requests with `top_k > 0` in the
same batch
See #542 for details on
the crashes.
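For context, here is a rough reproduction sketch of the kind of mixed batch involved, written against the standard vLLM offline API (the model name is a placeholder and the exact entry point that crashed is documented in #542, not here):

```python
from vllm import LLM, SamplingParams

# Placeholder model; any served model with a finite vocab_size applies.
llm = LLM(model="facebook/opt-125m")

# One greedy request mixed with one sampling request whose top_k
# exceeds the model's vocab_size -- the two edge cases listed above.
greedy = SamplingParams(temperature=0.0)
oversized_top_k = SamplingParams(temperature=1.0, top_k=10**9)

outputs = llm.generate(
    ["Hello, my name is", "The capital of France is"],
    [greedy, oversized_top_k],
)
```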
The "fix" in this PR is to just copy the logic from the vLLM's GPU
`InputBatch` for setting the value of top_k: clamping the value to
`vocab_size` and setting the default top_k to `vocab_size` instead of 0
in a mixed batch.
REF:
https://github.com/vllm-project/vllm/blob/fc168c33f35e0610d41206e864b6bf90fe613f19/vllm/v1/worker/gpu_input_batch.py#L353-L357
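For illustration, a minimal standalone sketch of the clamping behaviour (the helper name `resolve_top_k` is hypothetical; the real change lives in the batch-building code referenced above):

```python
def resolve_top_k(requested_top_k: int, vocab_size: int) -> int:
    """Clamp a per-request top_k, following the vLLM GPU InputBatch approach.

    top_k <= 0 (e.g. a greedy request) and top_k > vocab_size both mean
    "consider the whole vocabulary", so they map to vocab_size instead of
    staying at 0. This keeps mixed greedy/sampling batches from producing
    out-of-range top_k values.
    """
    if 0 < requested_top_k < vocab_size:
        return requested_top_k
    return vocab_size


# Example with vocab_size = 32000:
assert resolve_top_k(50, 32000) == 50        # normal sampling request
assert resolve_top_k(0, 32000) == 32000      # greedy / unset top_k
assert resolve_top_k(10**6, 32000) == 32000  # top_k > vocab_size is clamped
```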
## Related Issues
Fixes #542
---------
Signed-off-by: Travis Johnson <[email protected]>