Support multi-node & autoscaling & routing together for models like Deepseek-R1 #758

Jeffwan · 2025-02-27T05:59:42Z

🚀 Feature Description and Motivation

Orchestration

DeepSeek-R1 with full weights has to be deployed using multi-node orchestration. If we adopt cross-node TP (tensor parallelism), let's make sure RDMA communication is unblocked in that case.
Let's make sure the rolling upgrade experience works as expected.
We also need graceful shutdown so that in-flight requests are handled correctly.
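As a rough illustration of the graceful-shutdown requirement, here is a minimal sketch of draining in-flight requests before a process exits. The `InflightTracker` class and its method names are hypothetical and not part of AIBrix, vLLM, or Ray; it assumes the serving process runs an asyncio event loop and calls `drain` from its SIGTERM handling path.

```python
import asyncio


class InflightTracker:
    """Counts in-flight requests and lets shutdown wait until they finish."""

    def __init__(self) -> None:
        self._inflight = 0
        self._draining = False
        self._idle = asyncio.Event()
        self._idle.set()  # no requests yet, so we start idle

    def try_begin(self) -> bool:
        """Admit a request unless draining has already started."""
        if self._draining:
            return False
        self._inflight += 1
        self._idle.clear()
        return True

    def end(self) -> None:
        """Mark one admitted request as finished."""
        self._inflight -= 1
        if self._inflight == 0:
            self._idle.set()

    async def drain(self, timeout_s: float) -> bool:
        """Stop admitting requests; return True if all finished in time."""
        self._draining = True
        try:
            await asyncio.wait_for(self._idle.wait(), timeout=timeout_s)
            return True
        except asyncio.TimeoutError:
            return False
```

In a Kubernetes setting, the drain timeout would presumably need to stay below the pod's termination grace period, otherwise the container is killed before draining completes.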

Autoscaling

In such cases, traditional autoscaling may not work well.

For resource metrics like SM_ACTIVE, values are still aggregated at the pod level, so multi-node deployment makes no big difference.
For application metrics, only the head pod, which has the API server deployed, emits the metrics. The pod count used for these metrics has to be consistent with the number of units.
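To make the "number of units" point concrete, here is a small sketch that computes a desired replica count in units (one unit per multi-node group) from metrics emitted only by head pods. This is not AIBrix's KPA algorithm; the function name, the per-unit target, and the min/max bounds are assumptions for illustration.

```python
import math
from typing import Dict


def desired_units(
    head_metrics: Dict[str, float],
    target_per_unit: float,
    min_units: int = 1,
    max_units: int = 10,
) -> int:
    """Return the desired number of units from head-pod metric samples.

    head_metrics maps a head pod name to its current metric value
    (for example, running requests). Worker pods are expected to be
    absent because they do not expose application metrics.
    """
    if not head_metrics:
        raise ValueError("no data available")
    if target_per_unit <= 0:
        raise ValueError("target_per_unit must be positive")
    total = sum(head_metrics.values())
    desired = math.ceil(total / target_per_unit)
    # Clamp to the configured bounds.
    return max(min_units, min(max_units, desired))
```

The key design point is that both the metric sources and the resulting count refer to units, so a two-pod unit (head plus worker) is counted once rather than twice.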

Routing

The router should skip worker pods and only consider the head pod for request routing.
Make sure it removes a pod once the pod enters the Terminating stage.
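The two routing rules can be sketched as a filter over pod objects. This is an illustration only, not the router's actual code. It assumes pods are plain dictionaries shaped like Kubernetes Pod JSON, that every pod belongs to a RayCluster, and that head pods carry the KubeRay label `ray.io/node-type: head`. A pod with `metadata.deletionTimestamp` set is treated as terminating.

```python
from typing import Any, Dict, List

NODE_TYPE_LABEL = "ray.io/node-type"  # label KubeRay sets on cluster pods


def routable_pods(pods: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep only head pods that are not being terminated."""
    result = []
    for pod in pods:
        meta = pod.get("metadata", {})
        labels = meta.get("labels") or {}
        if labels.get(NODE_TYPE_LABEL) != "head":
            continue  # worker pods do not serve the API endpoint
        if meta.get("deletionTimestamp"):
            continue  # pod is in the Terminating stage
        result.append(pod)
    return result
```

A real router would also have to handle single-pod deployments that carry no Ray label at all, which this sketch would filter out.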

Use Case

As a user, I want to host the DeepSeek-R1 full-weights version and autoscale the workload based on traffic.

Proposed Solution

No response

Jeffwan · 2025-03-01T10:06:44Z

Routing

Requests always hit the head pod.

Update: after running more tests, I noticed this is not true. I did see requests going to other pods, but due to some issues, those requests didn't go through.

python3 benchmark_serving.py --backend vllm  --model deepseek-ai/deepseek-r1 --trust-remote-code --served-model-name deepseek-r1-671b --base-url http://localhost:8888 --endpoint /v1/completions --num-prompts 100 --request-rate 2 --metric_percentiles '50,90,95,99' --goodput ttft:1000 tpot:100 --max-concurrency 200 --random-input-len 2048 --random-output-len 200 --dataset-name random --ignore-eos

Jeffwan · 2025-03-02T17:53:11Z

RayCluster Orchestration related

ray.io/overwrite-container-cmd -> RayCluster level
Head and worker annotations have to be set separately; there's no propagation to different roles yet. RayClusterFleet spec.templates.metadata controls RayCluster metadata.
Probes can be overridden by users, or probe injection can be disabled.

Jeffwan · 2025-03-02T18:18:55Z

vLLM 0.7.3 problem

It hangs for a long time. I checked vllm-project/vllm#13136 and decided to rebuild the image.

FROM vllm/vllm-openai:v0.7.3
RUN pip3 install -U ray[default,adag]==2.40.0 --progress-bar off # important for future healthcheck
RUN pip3 install -U nvidia-nccl-cu12 --progress-bar off
ENTRYPOINT [""]

Note: in 0.7.3, ray[adag] replaced ray[default]. This brings issues to KubeRay-based deployments because our injected probe uses the agent to check health status. I considered using v0.7.2 but noticed that 0.7.3 brings FlashAttention 3 for MLA optimization, so I'm sticking with v0.7.3.

Jeffwan · 2025-03-02T18:33:24Z

RDMA setup

From the NCCL logs, we can see that cross-node communication happens over RDMA (NET/IB with GDRDMA), while intra-node transfers use P2P/IPC (NVLink in this case; see 'NCCL INFO NVLS multicast support is available').
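The transport observation can also be checked mechanically rather than by eye. The sketch below is a hypothetical helper, not part of NCCL or vLLM. It parses the point-to-point "Channel ... via ..." lines and counts which transport is used within a node and across nodes. It assumes 8 ranks per node, matching `localRanks 8` in these logs, and that ranks are assigned to nodes contiguously.

```python
import re
from collections import Counter
from typing import Iterable

# Matches point-to-point channel setup lines such as
#   "... NCCL INFO Channel 00/0 : 8[0] -> 0[0] [receive] via NET/IB/8/GDRDMA"
CHANNEL_RE = re.compile(
    r"NCCL INFO Channel \d+/\d+ : (\d+)\[\d+\] -> (\d+)\[\d+\]"
    r"(?: \[(?:send|receive)\])? via (\S+)"
)


def classify_channels(lines: Iterable[str], ranks_per_node: int = 8) -> Counter:
    """Count (scope, transport) pairs; scope is 'intra' or 'cross' node."""
    counts: Counter = Counter()
    for line in lines:
        m = CHANNEL_RE.search(line)
        if m is None:
            continue  # ring summaries and truncated lines are skipped
        src, dst, via = int(m.group(1)), int(m.group(2)), m.group(3)
        same_node = src // ranks_per_node == dst // ranks_per_node
        scope = "intra" if same_node else "cross"
        transport = via.split("/")[0]  # "P2P" or "NET"
        counts[(scope, transport)] += 1
    return counts
```

If the setup is as described, only the keys `("intra", "P2P")` and `("cross", "NET")` should appear in the result; any `("cross", ...)` entry with a non-NET transport, or a NET path without `GDRDMA` in the raw line, would be worth a closer look.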

RDMA (RoCE) logs

deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Bootstrap: Using eth0:192.168.0.90<0>
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO cudaDriverVersion 12020
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO NCCL version 2.25.1+cuda12.2
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO NET/Plugin: Could not find: libnccl-net.so. Using internal network plugin.
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO NCCL_IB_DISABLE set by environment to 0.
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO NCCL_IB_HCA set to mlx5_
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO NET/IB : Using [0]mlx5_1:1/RoCE [1]mlx5_2:1/RoCE [2]mlx5_3:1/RoCE [3]mlx5_4:1/RoCE [4]mlx5_5:1/RoCE [5]mlx5_6:1/RoCE [6]mlx5_7:1/RoCE [7]mlx5_8:1/RoCE [RO]; OOB eth0:192.168.0.90<0>
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO PROFILER/Plugin: Could not find: libnccl-profiler.so.
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Using network IB
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO ncclCommInitRank comm 0xc764960 rank 0 nranks 16 cudaDev 0 nvmlDev 0 busId e000 commId 0xd0f99dd1affac83 - Init START
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO RAS client listening socket at ::1<28028>
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Bootstrap timings total 0.090936 (create 0.000030, send 0.000074, recv 0.000036, ring 0.030250, delay 0.000001)
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO NCCL_CUMEM_ENABLE set by environment to 0.
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Setting affinity for GPU 0 to ffff,ffffffff,00000000,0000ffff,ffffffff
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO NVLS multicast support is available on dev 0
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO comm 0xc764960 rank 0 nRanks 16 nNodes 2 localRanks 8 localRank 0 MNNVL 0
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO NVLS Head  0:  0  8
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO NVLS Head  1:  1  9
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO NVLS Head  2:  2 10
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO NVLS Head  3:  3 11
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO NVLS Head  4:  4 12
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO NVLS Head  5:  5 13
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO NVLS Head  6:  6 14
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO NVLS Head  7:  7 15
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 00/16 :  0  7  6  5  4  3  2  1  9 10 11 12 13 14 15  8
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 01/16 :  0  8 15 14 13 12 11 10  9  1  2  3  4  5  6  7
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 02/16 :  0  7  6  5  4  3 11 12 13 14 15  8  9 10  2  1
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 03/16 :  0  1  2 10  9  8 15 14 13 12 11  3  4  5  6  7
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 04/16 :  0  7  6  5 13 14 15  8  9 10 11 12  4  3  2  1
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 05/16 :  0  1  2  3  4 12 11 10  9  8 15 14 13  5  6  7
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 06/16 :  0  7 15  8  9 10 11 12 13 14  6  5  4  3  2  1
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 07/16 :  0  1  2  3  4  5  6 14 13 12 11 10  9  8 15  7
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 08/16 :  0  7  6  5  4  3  2  1  9 10 11 12 13 14 15  8
dee�[36m(RayWorkerWrapper pid=996)�[0m INFO 03-02 10:21:47 utils.py:916] Found nccl from library libnccl.so.2
�[36m(RayWorkerWrapper pid=996)�[0m INFO 03-02 10:21:47 pynccl.py:69] vLLM is using nccl==2.25.1
�[36m(RayWorkerWrapper pid=342, ip=192.168.0.83)�[0m INFO 03-02 10:21:42 __init__.py:207] Automatically detected platform cuda.�[32m [repeated 7x across cluster]�[0m
�[36m(RayWorkerWrapper pid=336, ip=192.168.0.83)�[0m deepseek-r1-671b-88957849-q6slh-worker-group-worker-hhskt:336:336 [1] NCCL INFO cudaDriverVersion 12020
�[36m(RayWorkerWrapper pid=336, ip=192.168.0.83)�[0m deepseek-r1-671b-88957849-q6slh-worker-group-worker-hhskt:336:336 [1] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0
�[36m(RayWorkerWrapper pid=336, ip=192.168.0.83)�[0m deepseek-r1-671b-88957849-q6slh-worker-group-worker-hhskt:336:336 [1] NCCL INFO Bootstrap: Using eth0:192.168.0.83<0>
�[36m(RayWorkerWrapper pid=336, ip=192.168.0.83)�[0m deepseek-r1-671b-88957849-q6slh-worker-group-worker-hhskt:336:336 [1] NCCL INFO NCCL version 2.25.1+cuda12.2
�[36m(RayWorkerWrapper pid=336, ip=192.168.0.83)�[0m deepseek-r1-671b-88957849-q6slh-worker-group-worker-hhskt:336:336 [1] NCCL INFO NET/Plugin: Could not find: libnccl-net.so. Using internal network plugin.
�[36m(RayWorkerWrapper pid=336, ip=192.168.0.83)�[0m deepseek-r1-671b-88957849-q6slh-worker-group-worker-hhskt:336:336 [1] NCCL INFO NCCL_IB_DISABLE set by environment to 0.
�[36m(RayWorkerWrapper pid=336, ip=192.168.0.83)�[0m deepseek-r1-671b-88957849-q6slh-worker-group-worker-hhskt:336:336 [1] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0
�[36m(RayWorkerWrapper pid=336, ip=192.168.0.83)�[0m deepseek-r1-671b-88957849-q6slh-worker-group-worker-hhskt:336:336 [1] NCCL INFO NCCL_IB_HCA set to mlx5_1:1,mlx5_2:1,mlx5_3:1,mlx5_4:1,mlx5_5:1,mlx5_6:1,mlx5_7:1,mlx5_8:1
�[36m(RayWorkerWrapper pid=336, ip=192.168.0.83)�[0m deepseek-r1-671b-88957849-q6slh-worker-group-worker-hhskt:336:336 [1] NCCL INFO NET/IB : Using [0]mlx5_1:1/RoCE [1]mlx5_2:1/RoCE [2]mlx5_3:1/RoCE [3]mlx5_4:1/RoCE [4]mlx5_5:1/RoCE [5]mlx5_6:1/RoCE [6]mlx5_7:1/RoCE [7]mlx5_8:1/RoCE [RO]; OOB eth0:192.168.0.83<0>
�[36m(RayWorkerWrapper pid=336, ip=192.168.0.83)�[0m deepseek-r1-671b-88957849-q6slh-worker-group-worker-hhskt:336:336 [1] NCCL INFO PROFILER/Plugin: Could not find: libnccl-profiler.so.
�[36m(RayWorkerWrapper pid=336, ip=192.168.0.83)�[0m deepseek-r1-671b-88957849-q6slh-worker-group-worker-hhskt:336:336 [1] NCCL INFO Using network IB
�[36m(RayWorkerWrapper pid=336, ip=192.168.0.83)�[0m deepseek-r1-671b-88957849-q6slh-worker-group-worker-hhskt:336:336 [1] NCCL INFO ncclCommInitRank comm 0xde1dae0 rank 9 nranks 16 cudaDev 1 nvmlDev 1 busId 44000 commId 0xd0f99dd1affac83 - Init START
�[36m(RayWorkerWrapper pid=336, ip=192.168.0.83)�[0m deepseek-r1-671b-88957849-q6slh-worker-group-worker-hhskt:336:336 [1] NCCL INFO RAS client listening socket at ::1<28028>
�[36m(RayWorkerWrapper pid=336, ip=192.168.0.83)�[0m deepseek-r1-671b-88957849-q6slh-worker-group-worker-hhskt:336:336 [1] NCCL INFO Bootstrap timings total 0.006130 (create 0.000024, send 0.000165, recv 0.000208, ring 0.001345, delay 0.000000)
�[36m(RayWorkerWrapper pid=336, ip=192.168.0.83)�[0m deepseek-r1-671b-88957849-q6slh-worker-group-worker-hhskt:336:336 [1] NCCL INFO NCCL_CUMEM_ENABLE set by environment to 0.
�[36m(RayWorkerWrapper pid=336, ip=192.168.0.83)�[0m deepseek-r1-671b-88957849-q6slh-worker-group-worker-hhskt:336:336 [1] NCCL INFO Setting affinity for GPU 1 to ffff,ffffffff,00000000,0000ffff,ffffffff
�[36m(RayWorkerWrapper pid=336, ip=192.168.0.83)�[0m deepseek-r1-671b-88957849-q6slh-worker-group-worker-hhskt:336:336 [1] NCCL INFO NVLS multicast support is available on dev 1
�[36m(RayWorkerWrapper pid=336, ip=192.168.0.83)�[0m deepseek-r1-671b-88957849-q6slh-worker-group-worker-hhskt:336:336 [1] NCCL INFO comm 0xde1dae0 rank 9 nRanks 16 nNodes 2 localRanks 8 localRank 1 MNNVL 0
�[36m(RayWorkerWrapper pid=336, ip=192.168.0.83)�[0m deepseek-r1-671b-88957849-q6slh-worker-group-worker-hhskt:336:336 [1] NCCL INFO Trees [0] 10/-1/-1->9->8 [1] 10/-1/-1->9->1 [2] -1/-1/-1->9->8 [3] 10/-1/-1->9->8 [4] 10/-1/-1->9->8 [5] 10/-1/-1->9->8 [6] 10/-1/-1->9->8 [7] 10/-1/-1->9->8 [8] 10/-1/-1->9->8 [9] 10/1/-1->9->-1 [10] -1/-1/-1->9->8 [11] 10/-1/-1->9->8 [12] 10/-1/-1->9->8 [13] 10/-1/-1->9->8 [14] 10/-1/-1->9->8 [15] 10/-1/-1->9->8
�[36m(RayWorkerWrapper pid=336, ip=192.168.0.83)�[0m deepseek-r1-671b-88957849-q6slh-worker-group-worker-hhskt:336:336 [1] NCCL INFO P2P Chunksize set to 131072
�[36m(RayWorkerWrapper pid=336, ip=192.168.0.83)�[0m deepseek-r1-671b-88957849-q6slh-worker-group-worker-hhskt:336:1377 [1] NCCL INFO [Proxy Service] Device 1 CPU core 40
�[36m(RayWorkerWrapper pid=336, ip=192.168.0.83)�[0m deepseek-r1-671b-88957849-q6slh-worker-group-worker-hhskt:336:1381 [1] NCCL INFO [Proxy Service UDS] Device 1 CPU core 41
�[36m(RayWorkerWrapper pid=336, ip=192.168.0.83)�[0m deepseek-r1-671b-88957849-q6slh-worker-group-worker-hhskt:336:336 [1] NCCL INFO Channel 00/0 : 9[1] -> 10[2] via P2P/IPC
�[36m(RayWorkerWrapper pid=336, ip=192.168.0.83)�[0m deepseek-r1-671b-88957849-q6slh-worker-group-worker-hhskt:336:336 [1] NCCL INFO Channel 02/0 : 9[1] -> 10[2] via P2P/IPC
�[36m(RayWorkerWrapper pid=336, ip=192.168.0.83)�[0m deepseek-r1-671b-88957849-q6slh-worker-group-worker-hhskt:336:336 [1] NCCL INFO Channel 04/0 : 9[1] -> 10[2] via P2P/IPC
�[36m(RayWorkerWrapper pid=336, ip=192.168.0.83)�[0m deepseek-r1-671b-88957849-q6slh-worker-group-worker-hhskt:336:336 [1] NCCL INFO Channel 06/0 : 9[1] -> 10[2] via P2P/IPC
�[36m(RayWorkerWrapper pid=336, ip=192.168.0.83)�[0m deepseek-r1-671b-88957849-q6slh-worker-group-worker-hhskt:336:336 [1] NCCL INFO Channel 08/0 : 9[1] -> 10[2] via P2P/IPC
�[36m(RayWorkerWrapper pid=336, ip=192.168.0.83)�[0m deepseek-r1-671b-88957849-q6slh-worker-group-worker-hhskt:336:336 [1] NCCL INFO Channel 10/0 : 9[1] -> 10[2] via P2P/IPC
�[36m(RayWorkerWrapper pid=336, ip=192.168.0.83)�[0m deepseek-r1-671b-88957849-q6slh-worke
�[36m(RayWorkerWrapper pid=335, ip=192.168.0.83)�[0m deepseek-r1-671b-88957849-q6slh-worker-group-w
�[36m(RayWorkerWrapper pid=338, ip=192.168.0.83)�[0m d
�[36m(RayWorkerWrapper pid=341, ip=192.168.0.83)�[0m deepseek-r1-671b-88957849-q6slh-worker-group-worker-hhskt:341:341 [6] NCCL INFO Channel 12/0 : 14[6] -> 15
�[36m(RayWorkerWrapper pid=996)�[0m deepseek-r1-671b-88957849-q6slh-head-fwl2w:996:996 [3] NCCL INFO NVLS Head  0:  0  8
�[36m(RayWorkerWrapper pid=996)�[0m deepseek-r1-671b-88957849-q6slh-head-fwl2w:996:996 [3] NCCL INFO NVLS Head  1:  1  9
�[36m(RayWorkerWrapper pid=996)�[0m deepseek-r1-671b-88957849-q6slh-head-fwl2w:996:996 [3] NCCL INFO NVLS Head  2:  2 10
�[36m(RayWorkerWrapper pid=996)�[0m deepseek-r1-671b-88957849-q6slh-head-fwl2w:996:996 [3] NCCL INFO NVLS Head  3:  3 11
�[36m(RayWorkerWrapper pid=996)�[0m deepseek-r1-671b-88957849-q6slh-head-fwl2w:996:996 [3] NCCL INFO NVLS Head  4:  4 12
�[36m(RayWorkerWrapper pid=996)�[0m deepseek-r1-671b-88957849-q6slh-head-fwl2w:996:996 [3] NCCL INFO NVLS Head  5:  5 13
�[36m(RayWorkerWrapper pid=996)�[0m deepseek-r1-671b-88957849-q6slh-head-fwl2w:996:996 [3] NCCL INFO NVLS Head  6:  6 14
�[36m(RayWorkerWrapper pid=996)�[0m deepseek-r1-671b-88957849-q6slh-head-fwl2w:996:996 [3] NCCL INFO NVLS Head  7:  7 15
�[36m(RayWorkerWrapper pid=996)�[0m deep
�[36m(RayWorkerWrapper pid=1015)�[0m deepseek-r1-671b-88957849-q6slh-head-fwl2w:1015:21258 [7] NCCL INFO [Proxy Progress] Device 7 CPU core 93
�[36m(RayWorkerWrapper pid=1015)�[0m deepseek-r1-671b-88957849-q6slh-head-fwl2w:1015:1015 [7] NCCL INFO Channel 07/0 : 15[7] -> 7[7] [receive] via NET/IB/15/GDRDMA
�[36m(RayWorkerWrapper pid=1015)�[0m deepseek-r1-671b-88957849-q6slh-head-fwl2w:1015:1015 [7] NCCL INFO Channel 15/0 : 15[7] -> 7[7] [receive] via NET/IB/1
�[36m(RayWorkerWrapper pid=983)�[0m deeps
�[36m(RayWorkerWrapper pid=1005)�[0m deepseek-r1-671b-88957849-q6slh
�[36m(RayWorkerWrapper pid=987)�[0m deepseek-r1-671b-88957849-q6slh-head-fwl2w:987:987 [5] NCCL INFO Channel 07/0 : 5[5] -> 6[6] v
�[36m(RayWorkerWrapper pid=337, ip=192.168.0.83)�[0m deepseek-r1-67
�[36m(RayWorkerWrapper pid=340, ip=192.168.0.83)�[0m deepseek-r1-671b-88957849-q6slh-worker-group-worker-hhskt:340:340 [7] NCCL INFO Channel 07/0 : 15[7] -> 7[7] [send] via NET/IB/15/GDRDMApseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 09/16 :  0  8 15 14 13 12 11 10  9  1  2  3  4  5  6  7
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 10/16 :  0  7  6  5  4  3 11 12 13 14 15  8  9 10  2  1
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 11/16 :  0  1  2 10  9  8 15 14 13 12 11  3  4  5  6  7
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 12/16 :  0  7  6  5 13 14 15  8  9 10 11 12  4  3  2  1
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 13/16 :  0  1  2  3  4 12 11 10  9  8 15 14 13  5  6  7
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 14/16 :  0  7 15  8  9 10 11 12 13 14  6  5  4  3  2  1
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 15/16 :  0  1  2  3  4  5  6 14 13 12 11 10  9  8 15  7
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Trees [0] 1/8/-1->0->-1 [1] -1/-1/-1->0->7 [2] 1/-1/-1->0->7 [3] 1/-1/-1->0->7 [4] 1/-1/-1->0->7 [5] 1/-1/-1->0->7 [6] 1/-1/-1->0->7 [7] 1/-1/-1->0->7 [8] 1/-1/-1->0->8 [9] -1/-1/-1->0->7 [10] 1/-1/-1->0->7 [11] 1/-1/-1->0->7 [12] 1/-1/-1->0->7 [13] 1/-1/-1->0->7 [14] 1/-1/-1->0->7 [15] 1/-1/-1->0->7
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO P2P Chunksize set to 131072
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Check P2P Type intraNodeP2pSupport 1 directMode 0
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:21242 [0] NCCL INFO [Proxy Service] Device 0 CPU core 31
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:21249 [0] NCCL INFO [Proxy Service UDS] Device 0 CPU core 32
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 03/0 : 0[0] -> 1[1] via P2P/IPC
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 05/0 : 0[0] -> 1[1] via P2P/IPC
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 07/0 : 0[0] -> 1[1] via P2P/IPC
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 11/0 : 0[0] -> 1[1] via P2P/IPC
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 13/0 : 0[0] -> 1[1] via P2P/IPC
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 15/0 : 0[0] -> 1[1] via P2P/IPC
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 00/0 : 0[0] -> 7[7] via P2P/IPC
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 02/0 : 0[0] -> 7[7] via P2P/IPC
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 04/0 : 0[0] -> 7[7] via P2P/IPC
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 06/0 : 0[0] -> 7[7] via P2P/IPC
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 08/0 : 0[0] -> 7[7] via P2P/IPC
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 10/0 : 0[0] -> 7[7] via P2P/IPC
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 12/0 : 0[0] -> 7[7] via P2P/IPC
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 14/0 : 0[0] -> 7[7] via P2P/IPC
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:21256 [0] NCCL INFO [Proxy Progress] Device 0 CPU core 129
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 00/0 : 8[0] -> 0[0] [receive] via NET/IB/8/GDRDMA
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 08/0 : 8[0] -> 0[0] [receive] via NET/IB/8/GDRDMA
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 01/0 : 0[0] -> 8[0] [send] via NET/IB/8/GDRDMA
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 09/0 : 0[0] -> 8[0] [send] via NET/IB/8/GDRDMA
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Connected all rings, use ring PXN 0 GDR 1
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 00/0 : 0[0] -> 1[1] via P2P/IPC
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 02/0 : 0[0] -> 1[1] via P2P/IPC
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 04/0 : 0[0] -> 1[1] via P2P/IPC
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 06/0 : 0[0] -> 1[1] via P2P/IPC
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 08/0 : 0[0] -> 1[1] via P2P/IPC
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 10/0 : 0[0] -> 1[1] via P2P/IPC
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 12/0 : 0[0] -> 1[1] via P2P/IPC
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 14/0 : 0[0] -> 1[1] via P2P/IPC
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 01/0 : 0[0] -> 7[7] via P2P/IPC
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 03/0 : 0[0] -> 7[7] via P2P/IPC
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 05/0 : 0[0] -> 7[7] via P2P/IPC
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 07/0 : 0[0] -> 7[7] via P2P/IPC
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 09/0 : 0[0] -> 7[7] via P2P/IPC
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 11/0 : 0[0] -> 7[7] via P2P/IPC
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 13/0 : 0[0] -> 7[7] via P2P/IPC
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 15/0 : 0[0] -> 7[7] via P2P/IPC
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 00/0 : 0[0] -> 8[0] [send] via NET/IB/8/GDRDMA
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 08/0 : 0[0] -> 8[0] [send] via NET/IB/8/GDRDMA
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Connected all trees
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO NVLS comm 0xc764960 headRank 0 nHeads 8 buffSize 1048576 nvlsPerRankSize 33554432 nvlsTotalSize 268435456
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 01/0 : 8[0] -> 0[0] [receive] via NET/IB/8/GDRDMA
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 02/0 : 8[0] -> 0[0] [receive] via NET/IB/8/GDRDMA
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 03/0 : 8[0] -> 0[0] [receive] via NET/IB/8/GDRDMA
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 04/0 : 8[0] -> 0[0] [receive] via NET/IB/8/GDRDMA
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 05/0 : 8[0] -> 0[0] [receive] via NET/IB/8/GDRDMA
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 06/0 : 8[0] -> 0[0] [receive] via NET/IB/8/GDRDMA
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 07/0 : 8[0] -> 0[0] [receive] via NET/IB/8/GDRDMA
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 09/0 : 8[0] -> 0[0] [receive] via NET/IB/8/GDRDMA
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 10/0 : 8[0] -> 0[0] [receive] via NET/IB/8/GDRDMA
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 11/0 : 8[0] -> 0[0] [receive] via NET/IB/8/GDRDMA
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 12/0 : 8[0] -> 0[0] [receive] via NET/IB/8/GDRDMA
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 13/0 : 8[0] -> 0[0] [receive] via NET/IB/8/GDRDMA
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 14/0 : 8[0] -> 0[0] [receive] via NET/IB/8/GDRDMA
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 15/0 : 8[0] -> 0[0] [receive] via NET/IB/8/GDRDMA
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 02/0 : 0[0] -> 8[0] [send] via NET/IB/8/GDRDMA
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 03/0 : 0[0] -> 8[0] [send] via NET/IB/8/GDRDMA
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 04/0 : 0[0] -> 8[0] [send] via NET/IB/8/GDRDMA
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 05/0 : 0[0] -> 8[0] [send] via NET/IB/8/GDRDMA
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 06/0 : 0[0] -> 8[0] [send] via NET/IB/8/GDRDMA
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 07/0 : 0[0] -> 8[0] [send] via NET/IB/8/GDRDMA
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 10/0 : 0[0] -> 8[0] [send] via NET/IB/8/GDRDMA
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 11/0 : 0[0] -> 8[0] [send] via NET/IB/8/GDRDMA
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 12/0 : 0[0] -> 8[0] [send] via NET/IB/8/GDRDMA
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 13/0 : 0[0] -> 8[0] [send] via NET/IB/8/GDRDMA
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 14/0 : 0[0] -> 8[0] [send] via NET/IB/8/GDRDMA
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 15/0 : 0[0] -> 8[0] [send] via NET/IB/8/GDRDMA
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Connected NVLS tree
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO 16 coll channels, 16 collnet channels, 16 nvls channels, 16 p2p channels, 2 p2p channels per peer
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO CC Off, workFifoBytes 1048576
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO TUNER/Plugin: Could not find: libnccl-tuner.so libnccl-net.so. Using internal tuner plugin.
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO ncclCommInitRank comm 0xc764960 rank 0 nranks 16 cudaDev 0 nvmlDev 0 busId e000 commId 0xd0f99dd1affac83 - Init COMPLETE
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Init timings - ncclCommInitRank: rank 0 nranks 16 total 3.08 (kernels 0.36, alloc 0.89, bootstrap 0.09, allgathers 0.01, topo 0.53, graphs 0.01, connections 1.18, rest 0.00)

�[36m(RayWorkerWrapper pid=340, ip=192.168.0.83)�[0m deepseek-r1-671b-88957849-q6slh-worker-group-worker-hhskt:340:340 [7] NCCL INFO Channel 15/0 : 15[7] -> 7[7] [send] via NET/IB/15/GDRDMA
�[36m(RayWorkerWrapper pid=340, ip=192.168.0.83)�[0m deepseek-r1-671b-88957849-q6slh-worker-group-worker-hhsk
�[36m(RayWorkerWrapper pid=338, ip=192.168.0.83)�[0m deepseek-r1-671b-88957849-q6slh-worker-group-worker-hhskt:338:338 [3] NCCL INFO Connected all rings, use ring PXN 0 GDR 1
�[36m(RayWorkerWrapper pid=338, ip=192.168.0.83)�[0m deepseek-r1-671b-88957849-q6slh-worker-group-worker-hhskt:338:338 [3] NCCL INFO Connected all t
�[36m(RayWorkerWrapper pid=342, ip=192.168.0.83)�[0m 6] via P2P/IPC
�[36m(RayWorkerWrapper pid=342, ip=192.168.0.83)�[0m deepseek-r1-671b-88957849-q6slh-worker-group-worker-hhskt:342:342 [5] NCCL IN
�[36m(RayWorkerWrapper pid=336, ip=192.168.0.83)�[0m deepseek-r1-671b-88957849-q6slh-worker-group-worker-hhskt:336:336 [1] NCCL INFO Connected all trees
�[36m(RayWorkerWrapper pid=336, ip=192.168.0.83)�[0m deepseek-r1-671b-88957849-q6slh-worker-group-worker-hhskt:336:336 [1] NCCL INFO 
�[36m(RayWorkerWrapper pid=340, ip=192.168.0.83)�[0m deepse
�[36m(RayWorkerWrapper pid=996)�[0m deepseek-r1-671b-88957849-q6slh-head-fwl2w:996:996 [3] NCCL INFO NVLS comm 0xbb0e900 headRank 3 nHeads 8 buffSize 1048576 nvlsPerRankSize 33554432 nvlsTotalSize 268435456
�[36m(RayWorkerWrapper pid=981)�[0m deepseek-r1-671b-88957
�[36m(RayWorkerWrapper pid=993)�[0m deepseek-r1-671b-88957849-q6slh-hea
�[36m(RayWorkerWrapper pid=1015)�[0m 5/GDRDMA
�[36m(RayWorkerWrapper pid=1015)�[0m deepseek-r1-671b-88957849-q6slh-head-fwl2w:1015:1015 [7] NCCL INFO Channel 01/0 : 15[7] -> 7[7] [re
�[36m(RayWorkerWrapper pid=983)�[0m deepseek-r1-671b-
�[36m(RayWorkerWrapper pid=1005)�[0m deepseek-r1-6
�[36m(RayWorkerWrapper pid=987)�[0m ia P2P/IPC
�[36m(RayWorkerWrapper pid=987)�[0m deepseek-r1-671b-88957849-q6slh-head-fwl2w:987:987 [5] NCCL INFO Channel 02/0 : 13[5] -> 5[5] [receive] via N
�[36m(RayWorkerWrapper pid=335, ip=192.168.0.83)�[0m de
�[36m(RayWorkerWrapper pid=341, ip=192.168.0.83)�[0m deepseek-r1-671b-88957849-q6slh-wo
�[36m(RayWorkerWrapper pid=339, ip=192.168.0.83)�[0m deepseek-r1-671b-88957849-q6slh-wor
�[36m(RayWorkerWrapper pid=336, ip=192.168.0.83)�[0m NVLS comm 0xde1dae0 headRank 1 nHeads 8 buffSize 1048576 nvlsPerRankSize 33554432 nvlsTotalSize 268435456
�[36m(RayWorkerWrapper pid=336, ip=192.168.0.83)�[0m deepseek-r1-671b-88957849-q6slh-worker-group-worker-hhskt:336:336 [1] NCCL INFO Connected NVLS tree
�[36m(RayWorkerWrapper pid=336, ip=192.168.0.83)�[0m deepseek-r1-671b-88957849-q6slh-worker-group-worker-hhskt:336:336 [1] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512
�[36m(RayWorkerWrapper pid=336, ip=192.168.0.83)�[0m deepseek-r1-671b-88957849-q6slh-worker-group-worker-hhskt:336:336 [1] NCCL INFO 16 coll channels, 16 collnet channels, 16 nvls channels, 16 p2p channels, 2 p2p channels per peer
�[36m(RayWorkerWrapper pid=336, ip=192.168.0.83)�[0m deepseek-r1-671b-88957849-q6slh-worker-group-worker-hhskt:336:336 [1] NCCL INFO TUNER/Plugin: Could not find: libnccl-tuner.so libnccl-net.so. Using internal tuner plugin.
�[36m(RayWorkerWrapper pid=336, ip=192.168.0.83)�[0m deepseek-r1-671b-88957849-q6slh-worker-group-worker-hhskt:336:336 [1] NCCL INFO ncclCommInitRank comm 0xde1dae0 rank 9 nranks 16 cudaDev 1 nvmlDev 1 busId 44000 commId 0xd0f99dd1affac83 - Init COMPLETE
�[36m(RayWorkerWrapper pid=336, ip=192.168.0.83)�[0m deepseek-r1-671b-88957849-q6slh-worker-group-worker-hhskt:336:336 [1] NCCL INFO Init timings - ncclCommInitRank: rank 9 nranks 16 total 2.94 (kernels 0.29, alloc 1.03, bootstrap 0.01, allgathers 0.01, topo 0.54, graphs 0.01, connections 1.06, rest 0.00)
�[36m(RayWorkerWrapper pid=337, ip=192.168.0.83)�[0m Channel 00/0 : 2[2] -> 10[2] [receive] via NET/IB/10/GDRDMA
�[36m(RayWorkerWrapper pid=335, ip=192.168.0.83)�[0m deepseek-r1-671b-88957849-q6slh-worker-group-worker-hhskt:335:335 [0] NCCL INFO ncclCommInitRank comm 0xcf3b380 r
�[36m(RayWorkerWrapper pid=338, ip=192.168.0.83)�[0m rees
�[36m(RayWorkerWrapper pid=342, ip=192.168.0.83)�[0m FO Connected all trees
WARNING 03-02 10:21:50 custom_all_reduce.py:84] Custom allreduce is disabled because this process group spans across nodes.
INFO 03-02 10:21:50 shm_broadcast.py:258] vLLM message queue communication handle: Handle(connect_ip='192.168.0.90', local_reader_ranks=[1, 2, 3, 4, 5, 6, 7], buffer_handle=(7, 4194304, 6, 'psm_1ee0df8a'), local_subscribe_port=60107, remote_subscribe_port=49929)
�[36m(RayWorkerWrapper pid=996)�[0m WARNING 03-02 10:21:50 custom_all_reduce.py:84] Custom allreduce is disabled because this process group spans across nodes.
�[36m(RayWorkerWrapper pid=1015)�[0m ceive] via NET/IB/15/GDRDMA
�[36m(RayWorkerWrapper pid=987)�[0m ET/IB/13/GDRDMA
�[36m(RayWorkerWrapper pid=342, ip=192.168.0.83)�[0m INFO 03-02 10:21:44 cuda.py:160] Using Triton MLA backend.�[32m [repeated 14x across cluster]�[0m
�[36m(RayWorkerWrapper pid=335, ip=192.168.0.83)�[0m ank 8 nranks 16 cudaDev 0 nvmlDev 0 busId e000 commId 0xd0f99dd1affac83 - Init COMPLETE

Some warning messages:

deepseek-r1-671b-6dc6684dd9-6m8kj-head-vgzzr:734:734 [0] transport/nvls.cc:586 NCCL WARN Cuda failure 1 'invalid argument'

deepseek-r1-671b-6dc6684dd9-6m8kj-head-vgzzr:734:734 [0] transport/nvls.cc:709 NCCL WARN rank 0 failed to NVLS register sendbuff 0x7f252bc00000 sendbuffSize 2097152 recvbuff 0x7f282cc00000 recvbuffSize 2097152

deepseek-r1-671b-6dc6684dd9-6m8kj-head-vgzzr:734:734 [0] transport/nvls.cc:586 NCCL WARN Cuda failure 1 'invalid argument'

deepseek-r1-671b-6dc6684dd9-6m8kj-head-vgzzr:734:734 [0] transport/nvls.cc:709 NCCL WARN rank 0 failed to NVLS register sendbuff 0x7f282cc00000 sendbuffSize 2097152 recvbuff 0x7f282cc00000 recvbuffSize 2097152

deepseek-r1-671b-6dc6684dd9-6m8kj-head-vgzzr:734:734 [0] transport/nvls.cc:586 NCCL WARN Cuda failure 1 'invalid argument'

deepseek-r1-671b-6dc6684dd9-6m8kj-head-vgzzr:734:734 [0] transport/nvls.cc:709 NCCL WARN rank 0 failed to NVLS register sendbuff 0x7f282cc00000 sendbuffSize 2097152 recvbuff 0x7f252bc00000 recvbuffSize 2097152

deepseek-r1-671b-6dc6684dd9-6m8kj-head-vgzzr:734:734 [0] transport/nvls.cc:586 NCCL WARN Cuda failure 1 'invalid argument'

deepseek-r1-671b-6dc6684dd9-6m8kj-head-vgzzr:734:734 [0] transport/nvls.cc:709 NCCL WARN rank 0 failed to NVLS register sendbuff 0x7f282cc00000 sendbuffSize 2097152 recvbuff 0x7f24dbc00000 recvbuffSize 2097152

deepseek-r1-671b-6dc6684dd9-6m8kj-head-vgzzr:734:734 [0] transport/nvls.cc:586 NCCL WARN Cuda failure 1 'invalid argument'

deepseek-r1-671b-6dc6684dd9-6m8kj-head-vgzzr:734:734 [0] transport/nvls.cc:709 NCCL WARN rank 0 failed to NVLS register sendbuff 0x7f252bc00000 sendbuffSize 2097152 recvbuff 0x7f282cc00000 recvbuffSize 2097152

deepseek-r1-671b-6dc6684dd9-6m8kj-head-vgzzr:734:734 [0] transport/nvls.cc:586 NCCL WARN Cuda failure 1 'invalid argument'

deepseek-r1-671b-6dc6684dd9-6m8kj-head-vgzzr:734:734 [0] transport/nvls.cc:709 NCCL WARN rank 0 failed to NVLS register sendbuff 0x7f282cc00000 sendbuffSize 2097152 recvbuff 0x7f282cc00000 recvbuffSize 2097152
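
The twelve warnings above alternate between two call sites, `transport/nvls.cc:586` and `transport/nvls.cc:709`, all on rank 0 of the head pod. A minimal sketch for collapsing such output into per-site counts is below. The regular expression is an assumption derived only from the line shape shown here, and `summarize_nccl_warnings` is a hypothetical helper, not part of NCCL or vLLM.

```python
import re
from collections import Counter

# Assumed line shape, taken from the warnings above:
#   <host>:<pid>:<tid> [<device>] <file>:<line> NCCL WARN <message>
NCCL_WARN = re.compile(
    r"^(?P<host>\S+):(?P<pid>\d+):(?P<tid>\d+) "
    r"\[(?P<dev>\d+)\] (?P<site>\S+:\d+) NCCL WARN (?P<msg>.*)$"
)


def summarize_nccl_warnings(lines):
    """Count NCCL WARN lines per source location (file:line)."""
    counts = Counter()
    for line in lines:
        match = NCCL_WARN.match(line.strip())
        if match:  # lines that are not NCCL warnings are ignored
            counts[match.group("site")] += 1
    return counts


# Two of the lines quoted above, used as sample input.
sample = [
    "deepseek-r1-671b-6dc6684dd9-6m8kj-head-vgzzr:734:734 [0] "
    "transport/nvls.cc:586 NCCL WARN Cuda failure 1 'invalid argument'",
    "deepseek-r1-671b-6dc6684dd9-6m8kj-head-vgzzr:734:734 [0] "
    "transport/nvls.cc:709 NCCL WARN rank 0 failed to NVLS register "
    "sendbuff 0x7f252bc00000 sendbuffSize 2097152 "
    "recvbuff 0x7f282cc00000 recvbuffSize 2097152",
]

for site, count in summarize_nccl_warnings(sample).items():
    print(f"{site}: {count}")
```

Grouping by call site makes it easier to see whether new warning types appear after a configuration change, rather than scanning repeated buffer addresses.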

xieus · 2025-03-02T18:57:27Z

For application metrics, only the head pod, which has the API server deployed, emits the metrics. It has to be consistent with the number of units.

Thanks @Jeffwan. This is a great feature. One quick question: does the head pod concept refer to the Ray head node (the underlying implementation) or to a broader context?

Jeffwan · 2025-03-03T01:00:14Z

@xieus It's specific to the Ray head.

Jeffwan · 2025-03-03T01:01:12Z

Autoscaling

NAME                                                          READY   STATUS              RESTARTS   AGE     IP             NODE           NOMINATED NODE   READINESS GATES
deepseek-r1-671b-56f9654bbb-mgdwd-head-lf5xg                  1/1     Running             0          27m     192.168.0.74   192.168.0.51   <none>           <none>
deepseek-r1-671b-56f9654bbb-mgdwd-worker-group-worker-pb4hh   1/1     Running             0          27m     192.168.0.81   192.168.0.52   <none>           <none>

The autoscaler needs minor changes to filter out the worker pods:

E0303 01:10:33.242360       1 kpa.go:256] Failed to get stable and panic metrics for default/deepseek-r1-671b: no data available
E0303 01:10:33.249115       1 controller.go:329] "msg"="Reconciler error" "error"="failed to compute desired number of replicas based on listed metrics for RayClusterFleet/default/deepseek-r1-671b: can not calculate metrics for scale deepseek-r1-671b" "PodAutoscaler"={"name":"deepseek-r1-671b-autoscaling","namespace":"default"} "controller"="podautoscaler" "controllerGroup"="autoscaling.aibrix.ai" "controllerKind"="PodAutoscaler" "name"="deepseek-r1-671b-autoscaling" "namespace"="default" "reconcileID"="432ed9d8-f944-47f8-9975-047731c77ebf"
E0303 01:13:33.242425       1 controller.go:329] "msg"="Reconciler error" "error"="failed to update metrics for scale target reference: failed to fetch metrics from source http://192.168.0.84:8000/metrics: Get \"http://192.168.0.84:8000/metrics\": dial tcp 192.168.0.84:8000: connect: connection refused" "PodAutoscaler"={"name":"deepseek-r1-671b-autoscaling","namespace":"default"} "controller"="podautoscaler" "controllerGroup"="autoscaling.aibrix.ai" "controllerKind"="PodAutoscaler" "name"="deepseek-r1-671b-autoscaling" "namespace"="default" "reconcileID"="d308d1c6-432f-491b-b192-33619c952e3a"
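
A minimal sketch of the filtering idea, assuming the pods carry the `ray.io/node-type` label that KubeRay sets (`head` or `worker`). The pod dictionaries mirror the shape of Kubernetes Pod JSON, and the labels on the sample pods are assumed rather than taken from the cluster. The port `8000` and path `/metrics` come from the error log above. `select_head_pods` and `metrics_urls` are hypothetical helpers for illustration, not the actual PodAutoscaler code.

```python
RAY_NODE_TYPE_LABEL = "ray.io/node-type"  # label key set by KubeRay on RayCluster pods


def select_head_pods(pods):
    """Keep only pods labeled as Ray head nodes."""
    return [
        pod
        for pod in pods
        if pod.get("metadata", {}).get("labels", {}).get(RAY_NODE_TYPE_LABEL) == "head"
    ]


def metrics_urls(pods, port=8000, path="/metrics"):
    """Build metrics endpoints for head pods that already have an IP."""
    urls = []
    for pod in select_head_pods(pods):
        ip = pod.get("status", {}).get("podIP")
        if ip:  # skip pods that have not been assigned an IP yet
            urls.append(f"http://{ip}:{port}{path}")
    return urls


# The two pods from the listing above; labels are assumed.
pods = [
    {
        "metadata": {
            "name": "deepseek-r1-671b-56f9654bbb-mgdwd-head-lf5xg",
            "labels": {RAY_NODE_TYPE_LABEL: "head"},
        },
        "status": {"podIP": "192.168.0.74"},
    },
    {
        "metadata": {
            "name": "deepseek-r1-671b-56f9654bbb-mgdwd-worker-group-worker-pb4hh",
            "labels": {RAY_NODE_TYPE_LABEL: "worker"},
        },
        "status": {"podIP": "192.168.0.81"},
    },
]

print(metrics_urls(pods))
```

With this selection, each RayCluster contributes exactly one metrics source, so the number of scraped endpoints matches the number of scaling units.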

Jeffwan added area/autoscaling area/distributed labels Feb 27, 2025

Jeffwan changed the title ~~Support multi-node & autoscaling together for models like Deepseek-R1~~ Support multi-node & autoscaling & routing together for models like Deepseek-R1 Feb 27, 2025

Jeffwan self-assigned this Feb 27, 2025

Jeffwan added the priority/critical-urgent label Feb 27, 2025

Jeffwan assigned varungup90 Mar 3, 2025

This was referenced Mar 4, 2025:

Append ray head label selector in PodAutoscaler #789 (Merged)

Ignore worker pods for gateway routing #776 (Merged)
