Support multi-node & autoscaling & routing together for models like Deepseek-R1 #758
Comments
Routing
Update: after running more tests, I noticed this is not true. Requests did reach other pods, but due to some issues they did not complete.
RayCluster Orchestration related
vLLM 0.7.3 problem
RDMA setup
From the NCCL logs, we can see that cross-node communication is happening over RDMA, while intra-node transfers fall back to IPC (NVLink in this case; see 'NCCL INFO NVLS multicast support is available').
RDMA (RoCE) logs
some warning messages
Thanks @Jeffwan. This is a great feature. One quick question: is the head pod concept referring to the Ray head node (the underlying implementation) or a broader context?
@xieus It's specific to the Ray head.
Autoscaling
Needs minor changes to filter out the worker nodes.
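One way to read the filtering step: in a multi-node deployment each model replica spans a head pod plus several worker pods, so counting every pod would overstate the number of replicas. The following is only a minimal sketch, assuming pods carry KubeRay's `ray.io/node-type` label and are represented as plain dicts shaped like Kubernetes pod objects. The helper names `select_head_pods` and `replica_count` are hypothetical and are not taken from this project.

```python
from typing import Any, Dict, List

# Assumption: pods are labeled the way KubeRay labels Ray pods,
# with "ray.io/node-type" set to "head" or "worker".
NODE_TYPE_LABEL = "ray.io/node-type"

Pod = Dict[str, Any]


def select_head_pods(pods: List[Pod]) -> List[Pod]:
    """Keep only pods labeled as Ray head nodes; worker pods are dropped."""
    heads = []
    for pod in pods:
        # Missing metadata or labels are treated as "not a head pod".
        labels = (pod.get("metadata") or {}).get("labels") or {}
        if labels.get(NODE_TYPE_LABEL) == "head":
            heads.append(pod)
    return heads


def replica_count(pods: List[Pod]) -> int:
    """Hypothetical replica count: one replica per head pod."""
    return len(select_head_pods(pods))


if __name__ == "__main__":
    pods = [
        {"metadata": {"name": "r1-head", "labels": {NODE_TYPE_LABEL: "head"}}},
        {"metadata": {"name": "r1-worker-0", "labels": {NODE_TYPE_LABEL: "worker"}}},
        {"metadata": {"name": "r1-worker-1", "labels": {NODE_TYPE_LABEL: "worker"}}},
    ]
    print(replica_count(pods))
```

With this approach the autoscaler would see one replica for the three pods above, which matches the idea that the head pod represents the whole multi-node replica.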
🚀 Feature Description and Motivation
Orchestration
Autoscaling
In such cases, traditional autoscaling may not work well.
Routing
Use Case
As a user, I want to host deepseek-r1 full weights version and autoscale the workloads based on the traffic
Proposed Solution
No response