[Issue]: moe_ck2stages Mixtral TP8 fails #257
Comments
I'm sorry, I can't reproduce your problem. Your problem seems to be that the CK 2Stage MoE JIT build is failing. Maybe you can try running this test to check whether CK 2Stage MoE works. If it works, you can run your command to launch the vLLM benchmark again.
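As a quick supplementary check (a sketch only; the authoritative check is the test linked above), you can also look for the JIT artifact directly in the container, using the build path reported later in this issue:

ls /usr/local/lib/python3.12/dist-packages/aiter/jit/build/module_moe_ck2stages \
  && echo "module_moe_ck2stages already built" \
  || echo "module_moe_ck2stages not built yet"   # a fresh container has to JIT-build it on first use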
That test seems to pass, and running TP8 in the same container afterwards works fine. However, I get this error when I start a new container and run TP8 as the first command. I'm seeing this on multiple systems and with multiple different input/output sizes. What is the exit status of the command below for you?

docker run -it \
--ipc=host \
--network=host \
--privileged \
--cap-add=CAP_SYS_ADMIN \
--device=/dev/kfd \
--device=/dev/dri \
--device=/dev/mem \
--group-add render \
--cap-add=SYS_PTRACE \
--security-opt seccomp=unconfined \
-v /data:/data \
-e HF_HOME=/data/huggingface-cache \
-e HF_TOKEN=<TOKEN> \
-e VLLM_USE_TRITON_FLASH_ATTN=0 \
-e VLLM_USE_AITER=1 \
rocm/vllm-dev:nightly_aiter_integration_final_20250325 \
python /app/vllm/benchmarks/profiling/benchmark_latency.py \
--model amd/Mixtral-8x7B-Instruct-v0.1-FP8-KV \
--quantization fp8 \
--kv-cache-dtype fp8 \
--dtype auto \
--gpu-memory-utilization 0.92 \
--num-scheduler-steps 10 \
--max-model-len 8192 \
--distributed-executor-backend mp \
--tensor-parallel-size 8 \
--input-len 128 \
--output-len 128
echo $?
135

docker run -it \
--ipc=host \
--network=host \
--privileged \
--cap-add=CAP_SYS_ADMIN \
--device=/dev/kfd \
--device=/dev/dri \
--device=/dev/mem \
--group-add render \
--cap-add=SYS_PTRACE \
--security-opt seccomp=unconfined \
-v /data:/data \
-e HF_HOME=/data/huggingface-cache \
-e HF_TOKEN=<TOKEN> \
-e VLLM_USE_TRITON_FLASH_ATTN=0 \
-e VLLM_USE_AITER=0 \
rocm/vllm-dev:nightly_aiter_integration_final_20250325 \
python /app/vllm/benchmarks/profiling/benchmark_latency.py \
--model amd/Mixtral-8x7B-Instruct-v0.1-FP8-KV \
--quantization fp8 \
--kv-cache-dtype fp8 \
--dtype auto \
--gpu-memory-utilization 0.92 \
--num-scheduler-steps 10 \
--max-model-len 8192 \
--distributed-executor-backend mp \
--tensor-parallel-size 8 \
--input-len 128 \
--output-len 128
echo $?
0
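For what it's worth, shell exit codes above 128 conventionally mean the process was killed by a signal (code - 128), so 135 corresponds to signal 7:

kill -l 7   # prints BUS, i.e. SIGBUS

That points at a native-code fault during the run rather than an ordinary Python error exit.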
This problem can be caused by multiple processes triggering JIT compilation at the same time. To fix this, we created a new branch (https://github.com/ROCm/aiter/tree/jit_update). You can replace the AITER installation in the container with this branch and run your command again.
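A sketch of that replacement step (the clone and install flow here is an assumption, not the maintainers' exact instructions; follow the aiter README if it differs):

# Inside the container: fetch the jit_update branch and reinstall aiter over the
# packaged copy. --recursive is assumed, since aiter vendors native kernel sources.
git clone --recursive -b jit_update https://github.com/ROCm/aiter.git /tmp/aiter
cd /tmp/aiter
pip install . --no-build-isolation   # assumed install command; adjust as needed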
Problem Description
Running Mixtral 8x7B/8x22B with TP8 fails when AITER is enabled. Disabling 2Stage MoE works, as does running TP1 first and then TP8 (see the warm-up sketch below). The run fails at:
start build [module_moe_ck2stages] under /usr/local/lib/python3.12/dist-packages/aiter/jit/build/module_moe_ck2stages
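A minimal sketch of that TP1-then-TP8 warm-up workaround (flags trimmed to the ones used in this report; the assumption is that the single-process run compiles module_moe_ck2stages once, so the TP8 workers find it already built):

python /app/vllm/benchmarks/profiling/benchmark_latency.py \
--model amd/Mixtral-8x7B-Instruct-v0.1-FP8-KV \
--max-model-len 8192 \
--tensor-parallel-size 1 \
--input-len 128 \
--output-len 128
python /app/vllm/benchmarks/profiling/benchmark_latency.py \
--model amd/Mixtral-8x7B-Instruct-v0.1-FP8-KV \
--max-model-len 8192 \
--distributed-executor-backend mp \
--tensor-parallel-size 8 \
--input-len 128 \
--output-len 128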
Error:
Example Commands:
Operating System
Ubuntu "22.04.5 LTS (Jammy Jellyfish)"
CPU
AMD EPYC 9575F 64-Core Processor
GPU
MI300X
ROCm Version
ROCm 6.3.1
ROCm Component
No response
Steps to Reproduce
FAILS
docker run -it \
--ipc=host \
--network=host \
--privileged \
--cap-add=CAP_SYS_ADMIN \
--device=/dev/kfd \
--device=/dev/dri \
--device=/dev/mem \
--group-add render \
--cap-add=SYS_PTRACE \
--security-opt seccomp=unconfined \
-v /data:/data \
-e HF_HOME=/data/huggingface-cache \
-e VLLM_USE_TRITON_FLASH_ATTN=0 \
-e VLLM_USE_AITER=1 \
rocm/vllm-dev:nightly_aiter_integration_final_20250325 \
python /app/vllm/benchmarks/profiling/benchmark_latency.py \
--model amd/Mixtral-8x7B-Instruct-v0.1-FP8-KV \
--dtype auto \
--gpu-memory-utilization 0.92 \
--num-scheduler-steps 1 \
--max-model-len 8192 \
--distributed-executor-backend mp \
--tensor-parallel-size 8 \
--input-len 128 \
--output-len 128
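The pattern above (a cold container fails, a warmed-up one succeeds) matches the concurrent-JIT-compilation explanation in the comments. Purely as an illustration of that kind of fix, and not aiter's actual code, a file lock is the standard way to serialize a build step across worker processes:

# Illustration only: flock(1) lets one worker run the build while the others wait,
# so a shared build directory is never written concurrently.
fake_build() {
  flock /tmp/jit_demo.lock bash -c "echo worker $1 building; sleep 1; echo worker $1 done"
}
fake_build 1 & fake_build 2 & wait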
(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support
No response
Additional Information
No response