CPU usage stays at 100% on the 0.7.4 dev build, even when no requests are being processed #14786
Closed
AndrewTsao
announced in
General
Replies: 1 comment
-
I see the same thing: CPU usage is at 100% even when nothing is calling the server. I'm not sure whether this is normal.
-
In V1 mode with FLASHMLA enabled, is it normal for 8 processes to each sit at 100% CPU?
Program version:
Launch command:
VLLM_ATTENTION_BACKEND=FLASHMLA VLLM_USE_V1=1 OMP_NUM_THREADS=12 /opt/vllm-0.7.4-dev/bin/vllm serve DeepSeek-R1 --max-model-len 131072 --max-num-batched-tokens 8192 --enable-reasoning --reasoning-parser deepseek_r1 --api_key ${VLLM_API_KEY} --tensor-parallel-size 8 --trust-remote-code --disable-log-requests --enable-prefix-caching --enable-chunked-prefill --gpu_memory_utilization=0.95 -O3
Hardware: 8 x H200
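To confirm which processes are actually busy while the server is idle, per-process CPU usage can be sampled from `/proc`. The following is only a minimal sketch, assuming Linux; `cpu_ticks`, `cpu_percent`, and `sample` are hypothetical helper names for illustration and are not part of vLLM.

```python
import os
import time


def cpu_ticks(stat_line: str) -> int:
    """Return utime + stime (in clock ticks) from one /proc/<pid>/stat line."""
    # Field 2 (the command name) is wrapped in parentheses and may contain
    # spaces, so split on the last ')' before splitting on whitespace.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state); utime and stime are fields 14 and 15.
    return int(rest[11]) + int(rest[12])


def cpu_percent(ticks_before: int, ticks_after: int,
                interval_s: float, ticks_per_s: int) -> float:
    """CPU usage over the interval, where 100.0 means one fully busy core."""
    return 100.0 * (ticks_after - ticks_before) / (ticks_per_s * interval_s)


def sample(pid: int, interval_s: float = 1.0) -> float:
    """Sample /proc/<pid>/stat twice and return the CPU usage in percent."""
    ticks_per_s = os.sysconf("SC_CLK_TCK")

    def read() -> int:
        with open(f"/proc/{pid}/stat") as f:
            return cpu_ticks(f.read())

    before = read()
    time.sleep(interval_s)
    after = read()
    return cpu_percent(before, after, interval_s, ticks_per_s)
```

Calling `sample(pid)` for each worker PID while no requests are in flight shows whether all 8 processes stay near 100% or only some of them do.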
strace output: