
[Issue]: moe_ck2stages Mixtral TP8 fails #257

Open
arakowsk-amd opened this issue Mar 31, 2025 · 3 comments
Assignees
Labels
bug Something isn't working

Comments


arakowsk-amd commented Mar 31, 2025

Problem Description

Running Mixtral 8x7B/8x22B with TP8 fails when using AITER. Disabling 2Stage MoE works, as does running TP1 first and then TP8. The failure occurs at start build [module_moe_ck2stages] under /usr/local/lib/python3.12/dist-packages/aiter/jit/build/module_moe_ck2stages

Error:

start build [module_moe_ck2stages] under /usr/local/lib/python3.12/dist-packages/aiter/jit/build/module_moe_ck2stages
(VllmWorkerProcess pid=290) failed build jit [module_moe_ck2stages]
(VllmWorkerProcess pid=290) -->[History]: Traceback (most recent call last):
(VllmWorkerProcess pid=290) -->  File "/usr/local/lib/python3.12/dist-packages/aiter/jit/core.py", line 322, in wrapper
(VllmWorkerProcess pid=290)     module = get_module(custom_build_args.get('md_name',
(VllmWorkerProcess pid=290)              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=290) -->  File "/usr/local/lib/python3.12/dist-packages/aiter/jit/core.py", line 130, in get_module
(VllmWorkerProcess pid=290)     return importlib.import_module(f'{__package__}.{md_name}')
(VllmWorkerProcess pid=290)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=290) -->  File "/usr/lib/python3.12/importlib/__init__.py", line 90, in import_module
(VllmWorkerProcess pid=290)     return _bootstrap._gcd_import(name[level:], package, level)
(VllmWorkerProcess pid=290)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=290) -->  File "<frozen importlib._bootstrap>", line 1387, in _gcd_import
(VllmWorkerProcess pid=290) -->  File "<frozen importlib._bootstrap>", line 1360, in _find_and_load
(VllmWorkerProcess pid=290) -->  File "<frozen importlib._bootstrap>", line 1324, in _find_and_load_unlocked
(VllmWorkerProcess pid=290) -->ModuleNotFoundError: No module named 'aiter.jit.module_moe_ck2stages'
(VllmWorkerProcess pid=290) -->
(VllmWorkerProcess pid=290) During handling of the above exception, another exception occurred:
(VllmWorkerProcess pid=290)
(VllmWorkerProcess pid=290) -->Traceback (most recent call last):
(VllmWorkerProcess pid=290) -->  File "/usr/local/lib/python3.12/dist-packages/aiter/jit/core.py", line 228, in build_module
(VllmWorkerProcess pid=290)     shutil.copy(f'{opbd_dir}/{md_name}.so', f'{this_dir}')
(VllmWorkerProcess pid=290) -->  File "/usr/lib/python3.12/shutil.py", line 436, in copy
(VllmWorkerProcess pid=290)     copymode(src, dst, follow_symlinks=follow_symlinks)
(VllmWorkerProcess pid=290) -->  File "/usr/lib/python3.12/shutil.py", line 317, in copymode
(VllmWorkerProcess pid=290)     chmod_func(dst, stat.S_IMODE(st.st_mode))
(VllmWorkerProcess pid=290) -->FileNotFoundError: [Errno 2] No such file or directory: '/usr/local/lib/python3.12/dist-packages/aiter/jit/module_moe_ck2stages.so'
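One plausible way this traceback arises (a minimal sketch with hypothetical paths, not a claim about what aiter's build actually does): `shutil.copy` is `copyfile()` followed by `copymode()`, so it is not atomic. If another worker process removes or replaces the destination `.so` between those two steps, the `chmod` inside `copymode()` raises `FileNotFoundError` even though the data was copied.

```python
# Sketch: shutil.copy is copyfile() + copymode(); if the destination
# disappears between the two steps (e.g. another process cleaning up),
# copymode() raises FileNotFoundError on the destination path.
import os
import shutil
import tempfile

tmp = tempfile.mkdtemp()
src = os.path.join(tmp, "module.so")
dst = os.path.join(tmp, "install", "module.so")
os.makedirs(os.path.dirname(dst))
with open(src, "wb") as f:
    f.write(b"\x7fELF")  # dummy shared-object contents

# Step 1 of shutil.copy: the file contents land at dst.
shutil.copyfile(src, dst)

# Simulate a second worker process removing the file here.
os.remove(dst)

# Step 2 of shutil.copy: copymode() now fails, mirroring the traceback.
try:
    shutil.copymode(src, dst)
    raced = False
except FileNotFoundError:
    raced = True
print(raced)  # True
```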

Example Commands:

# FAILS
docker run -it \
    --ipc=host \
    --network=host \
    --privileged \
    --cap-add=CAP_SYS_ADMIN \
    --device=/dev/kfd \
    --device=/dev/dri \
    --device=/dev/mem \
    --group-add render \
    --cap-add=SYS_PTRACE \
    --security-opt seccomp=unconfined \
    -v /data:/data \
    -e HF_HOME=/data/huggingface-cache \
    -e VLLM_USE_TRITON_FLASH_ATTN=0 \
    -e VLLM_USE_AITER=1 \
    rocm/vllm-dev:nightly_aiter_integration_final_20250325
 
python /app/vllm/benchmarks/profiling/benchmark_latency.py \
--model amd/Mixtral-8x7B-Instruct-v0.1-FP8-KV \
--dtype auto \
--gpu-memory-utilization 0.92 \
--num-scheduler-steps 1 \
--max-model-len 8192 \
--distributed-executor-backend mp \
--tensor-parallel-size 8 \
--input-len 128 \
--output-len 128


# Disabling 2Stage MoE works
docker run -it \
    --ipc=host \
    --network=host \
    --privileged \
    --cap-add=CAP_SYS_ADMIN \
    --device=/dev/kfd \
    --device=/dev/dri \
    --device=/dev/mem \
    --group-add render \
    --cap-add=SYS_PTRACE \
    --security-opt seccomp=unconfined \
    -v /data:/data \
    -e HF_HOME=/data/huggingface-cache \
    -e VLLM_USE_TRITON_FLASH_ATTN=0 \
    -e VLLM_USE_AITER=1 \
    rocm/vllm-dev:nightly_aiter_integration_final_20250325
 
VLLM_USE_AITER_2STAGE_MOE=0  python /app/vllm/benchmarks/profiling/benchmark_latency.py \
--model amd/Mixtral-8x7B-Instruct-v0.1-FP8-KV \
--dtype auto \
--gpu-memory-utilization 0.92 \
--num-scheduler-steps 1 \
--max-model-len 8192 \
--distributed-executor-backend mp \
--tensor-parallel-size 8 \
--input-len 128 \
--output-len 128



# Works when running TP1 first and then TP8 in the same container
docker run -it \
    --ipc=host \
    --network=host \
    --privileged \
    --cap-add=CAP_SYS_ADMIN \
    --device=/dev/kfd \
    --device=/dev/dri \
    --device=/dev/mem \
    --group-add render \
    --cap-add=SYS_PTRACE \
    --security-opt seccomp=unconfined \
    -v /data:/data \
    -e HF_HOME=/data/huggingface-cache \
    -e VLLM_USE_TRITON_FLASH_ATTN=0 \
    -e VLLM_USE_AITER=1 \
    rocm/vllm-dev:nightly_aiter_integration_final_20250325
 
python /app/vllm/benchmarks/profiling/benchmark_latency.py \
--model amd/Mixtral-8x7B-Instruct-v0.1-FP8-KV \
--dtype auto \
--gpu-memory-utilization 0.92 \
--num-scheduler-steps 1 \
--max-model-len 8192 \
--distributed-executor-backend mp \
--tensor-parallel-size 1 \
--input-len 128 \
--output-len 128
# start build [module_moe_ck2stages] under /usr/local/lib/python3.12/dist-packages/aiter/jit/build/module_moe_ck2stages
# Completes without error
 
# now TP8 Works
python /app/vllm/benchmarks/profiling/benchmark_latency.py \
--model amd/Mixtral-8x7B-Instruct-v0.1-FP8-KV \
--dtype auto \
--gpu-memory-utilization 0.92 \
--num-scheduler-steps 1 \
--max-model-len 8192 \
--distributed-executor-backend mp \
--tensor-parallel-size 8 \
--input-len 128 \
--output-len 128

Operating System

Ubuntu 22.04.5 LTS (Jammy Jellyfish)

CPU

AMD EPYC 9575F 64-Core Processor

GPU

MI300X

ROCm Version

ROCm 6.3.1

ROCm Component

No response

Steps to Reproduce

FAILS

docker run -it --ipc=host --network=host --privileged --cap-add=CAP_SYS_ADMIN --device=/dev/kfd --device=/dev/dri --device=/dev/mem --group-add render --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v /data:/data -e HF_HOME=/data/huggingface-cache -e VLLM_USE_TRITON_FLASH_ATTN=0 -e VLLM_USE_AITER=1 rocm/vllm-dev:nightly_aiter_integration_final_20250325

python /app/vllm/benchmarks/profiling/benchmark_latency.py \
    --model amd/Mixtral-8x7B-Instruct-v0.1-FP8-KV \
    --dtype auto \
    --gpu-memory-utilization 0.92 \
    --num-scheduler-steps 1 \
    --max-model-len 8192 \
    --distributed-executor-backend mp \
    --tensor-parallel-size 8 \
    --input-len 128 \
    --output-len 128

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

No response

Additional Information

No response

@arakowsk-amd arakowsk-amd added the bug Something isn't working label Mar 31, 2025
@junhaha666
Contributor

I'm sorry, I can't reproduce your problem. It looks like the CK 2Stage MoE JIT build failed. Maybe you can try running this test to check whether CK 2Stage MoE works. If it works, you can run your command to launch the vLLM benchmark again.
test: https://github.com/ROCm/aiter/blob/main/op_tests/test_moe_2stage.py

@arakowsk-amd
Author

That test seems to pass, and running TP8 afterwards in the same container works fine. However, I get this error when I start a new container and run TP8 as the first command. I'm seeing this on multiple systems and with multiple different input/output sizes. What is the exit status of the command below for you?

docker run -it \
    --ipc=host \
    --network=host \
    --privileged \
    --cap-add=CAP_SYS_ADMIN \
    --device=/dev/kfd \
    --device=/dev/dri \
    --device=/dev/mem \
    --group-add render \
    --cap-add=SYS_PTRACE \
    --security-opt seccomp=unconfined \
    -v /data:/data \
    -e HF_HOME=/data/huggingface-cache \
    -e HF_TOKEN=<TOKEN> \
    -e VLLM_USE_TRITON_FLASH_ATTN=0 \
    -e VLLM_USE_AITER=1 \
    rocm/vllm-dev:nightly_aiter_integration_final_20250325 \
    python /app/vllm/benchmarks/profiling/benchmark_latency.py \
    --model amd/Mixtral-8x7B-Instruct-v0.1-FP8-KV \
    --quantization fp8 \
    --kv-cache-dtype fp8 \
    --dtype auto \
    --gpu-memory-utilization 0.92 \
    --num-scheduler-steps 10 \
    --max-model-len 8192 \
    --distributed-executor-backend mp \
    --tensor-parallel-size 8 \
    --input-len 128 \
    --output-len 128
echo $?
135
docker run -it \
    --ipc=host \
    --network=host \
    --privileged \
    --cap-add=CAP_SYS_ADMIN \
    --device=/dev/kfd \
    --device=/dev/dri \
    --device=/dev/mem \
    --group-add render \
    --cap-add=SYS_PTRACE \
    --security-opt seccomp=unconfined \
    -v /data:/data \
    -e HF_HOME=/data/huggingface-cache \
    -e HF_TOKEN=<TOKEN> \
    -e VLLM_USE_TRITON_FLASH_ATTN=0 \
    -e VLLM_USE_AITER=0 \
    rocm/vllm-dev:nightly_aiter_integration_final_20250325 \
    python /app/vllm/benchmarks/profiling/benchmark_latency.py \
    --model amd/Mixtral-8x7B-Instruct-v0.1-FP8-KV \
    --quantization fp8 \
    --kv-cache-dtype fp8 \
    --dtype auto \
    --gpu-memory-utilization 0.92 \
    --num-scheduler-steps 10 \
    --max-model-len 8192 \
    --distributed-executor-backend mp \
    --tensor-parallel-size 8 \
    --input-len 128 \
    --output-len 128
echo $?
0

@junhaha666
Copy link
Contributor

This problem can be caused by multiple processes triggering the JIT compilation at the same time. To fix this, we created a new branch (https://github.com/ROCm/aiter/tree/jit_update). You can replace AITER in the container with this branch and run your command again.
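The usual remedy for this kind of race is to serialize the build behind an advisory file lock so only one worker compiles while the others wait and then reuse the artifact. Below is a hedged sketch of that idea using `fcntl.flock`; the names (`build_once`, `fake_build`) are hypothetical and the actual jit_update branch may implement it differently.

```python
# Sketch: serialize a per-module JIT build across worker processes with
# an advisory file lock, so only the first process actually builds.
import fcntl
import os
import tempfile

def build_once(build_dir: str, md_name: str, build_fn) -> str:
    """Run build_fn at most once per artifact, even with many processes."""
    os.makedirs(build_dir, exist_ok=True)
    artifact = os.path.join(build_dir, f"{md_name}.so")
    lock_path = os.path.join(build_dir, f"{md_name}.lock")
    with open(lock_path, "w") as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)  # blocks until the lock is free
        try:
            if not os.path.exists(artifact):
                build_fn(artifact)        # only one process builds
        finally:
            fcntl.flock(lock, fcntl.LOCK_UN)
    return artifact

# Usage: a stand-in "build" that records each invocation.
tmp = tempfile.mkdtemp()
calls = []

def fake_build(path):
    calls.append(path)
    with open(path, "wb") as f:
        f.write(b"\x7fELF")  # dummy shared-object contents

build_once(tmp, "module_moe_ck2stages", fake_build)
build_once(tmp, "module_moe_ck2stages", fake_build)
print(len(calls))  # 1: the second call sees the artifact and skips the build
```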
