Rocm vllm ci fix (new design) #475

Merged
merged 21 commits into from
Mar 13, 2025
Commits
ea787b0
Test build to check processing by different K8 queues.
Alexei-V-Ivanov-AMD Feb 4, 2025
01dfdda
Testing.
Alexei-V-Ivanov-AMD Feb 5, 2025
7f80bf8
Copying over the tests directory to enable CI testing.
Alexei-V-Ivanov-AMD Feb 5, 2025
14aaf35
Comparing with MI250 in the "mi250_8xGPU" queue.
Alexei-V-Ivanov-AMD Feb 5, 2025
a106489
Building with "test" as a --target
Alexei-V-Ivanov-AMD Feb 5, 2025
6acfc3a
Fixing working directory property.
Alexei-V-Ivanov-AMD Feb 5, 2025
172e0e8
Dummy alternation to confirm trouble with simultaneous test execution.
Alexei-V-Ivanov-AMD Feb 5, 2025
114e750
Dummy alternation to trigger a re-build and re-test.
Alexei-V-Ivanov-AMD Feb 6, 2025
0fc4050
Updating rocm dockerhub repo.
Alexei-V-Ivanov-AMD Feb 27, 2025
b2e3e12
Update run-amd-test.sh
Alexei-V-Ivanov-AMD Mar 3, 2025
cc41fa6
.
Alexei-V-Ivanov-AMD Mar 4, 2025
2c64618
Merge branch 'k8test' of github.com:ROCm/vllm into k8test
Alexei-V-Ivanov-AMD Mar 4, 2025
4022a8a
Importing Test improvements (Sage's PR #13970 to vllm-project).
Alexei-V-Ivanov-AMD Mar 4, 2025
84ea7b9
Restoring access to amd_gpu_1 queue
Alexei-V-Ivanov-AMD Mar 4, 2025
fbb39f3
Redirecting to the stable test-processing queues.
Alexei-V-Ivanov-AMD Mar 10, 2025
68c7701
Merge branch 'main' into rocm-vllm-ci-fix
Alexei-V-Ivanov-AMD Mar 10, 2025
e210fb7
Fix building architectures.
Alexei-V-Ivanov-AMD Mar 10, 2025
f591d18
Merge branch 'rocm-vllm-ci-fix' of github.com:ROCm/vllm into rocm-vll…
Alexei-V-Ivanov-AMD Mar 10, 2025
5e31d5c
Removing junk.
Alexei-V-Ivanov-AMD Mar 10, 2025
0a3e9e7
Merge branch 'main' into rocm-vllm-ci-fix-nd
Alexei-V-Ivanov-AMD Mar 12, 2025
48c916b
Merge branch 'main' into rocm-vllm-ci-fix-nd
Alexei-V-Ivanov-AMD Mar 13, 2025
2 changes: 1 addition & 1 deletion .buildkite/run-amd-test.sh
@@ -57,7 +57,7 @@ while true; do
done

echo "--- Pulling container"
-image_name="rocm/vllm-ci:${BUILDKITE_COMMIT}"
+image_name="rocm/vllm-ci-private:${BUILDKITE_COMMIT}"
container_name="rocm_${BUILDKITE_COMMIT}_$(tr -dc A-Za-z0-9 < /dev/urandom | head -c 10; echo)"
docker pull "${image_name}"
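
For readers following this hunk, the following is a minimal sketch of how the image tag and the randomized container name are composed. It assumes a placeholder commit hash when `BUILDKITE_COMMIT` is unset (in CI, Buildkite provides it), introduces a `suffix` variable that is not in the script, and deliberately runs no Docker command. `LC_ALL=C` is also an addition here, so that `tr` handles raw bytes consistently across platforms.

```shell
#!/usr/bin/env bash
# Placeholder commit hash (assumption); in CI, Buildkite sets BUILDKITE_COMMIT.
BUILDKITE_COMMIT="${BUILDKITE_COMMIT:-0123abc}"

# Image tag as it appears after this change.
image_name="rocm/vllm-ci-private:${BUILDKITE_COMMIT}"

# Ten random alphanumeric characters, so that containers started for the
# same commit do not share a name.
suffix="$(LC_ALL=C tr -dc A-Za-z0-9 < /dev/urandom | head -c 10; echo)"
container_name="rocm_${BUILDKITE_COMMIT}_${suffix}"

echo "image:     ${image_name}"
echo "container: ${container_name}"
```

The suffix differs on every run, which is what lets several test jobs for one commit coexist on the same host.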

12 changes: 10 additions & 2 deletions .buildkite/test-template.j2
@@ -1,13 +1,13 @@
{% set docker_image = "public.ecr.aws/q9t5s3a7/vllm-ci-test-repo:$BUILDKITE_COMMIT" %}
-{% set docker_image_amd = "rocm/vllm-ci:$BUILDKITE_COMMIT" %}
+{% set docker_image_amd = "rocm/vllm-ci-private:$BUILDKITE_COMMIT" %}
{% set default_working_dir = "vllm/tests" %}
{% set hf_home = "/root/.cache/huggingface" %}

steps:
- label: ":docker: build image"
depends_on: ~
commands:
-- "docker build --build-arg max_jobs=16 --tag {{ docker_image_amd }} -f Dockerfile.rocm --target test --progress plain ."
+- "docker build --build-arg max_jobs=16 --tag {{ docker_image_amd }} -f Dockerfile.rocm --build-arg ARG_PYTORCH_ROCM_ARCH='gfx90a;gfx942' --target test --progress plain ."
- "docker push {{ docker_image_amd }}"
key: "amd-build"
env:
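
A note on the new `--build-arg` in the build step: the architecture list is separated by semicolons, so it has to stay quoted, or the shell would treat `;` as the end of the command. The sketch below only illustrates that quoting and how the list splits into individual targets. The variable names (`arch_list`, `build_arg`, `arch_count`) are hypothetical and nothing is built.

```shell
#!/usr/bin/env bash
# Architecture list exactly as passed in the template (single-quoted there).
arch_list='gfx90a;gfx942'

# Keep the whole list inside one quoted word when forming the build argument.
build_arg="ARG_PYTORCH_ROCM_ARCH=${arch_list}"
echo "--build-arg '${build_arg}'"

# Split on ';' to enumerate the individual GPU targets.
old_ifs="${IFS}"
IFS=';'
set -- ${arch_list}
IFS="${old_ifs}"
arch_count=$#

for arch in "$@"; do
  echo "target arch: ${arch}"
done
```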
@@ -27,7 +27,15 @@ steps:
depends_on:
- "amd-build"
agents:
{% if step.amd_gpus and step.amd_gpus==8%}
queue: amd_gpu
{% elif step.amd_gpus and step.amd_gpus==4%}
queue: amd_gpu
{% elif step.amd_gpus and step.amd_gpus==2%}
queue: amd_gpu
{% else%}
queue: amd_gpu
{% endif%}
commands:
- bash .buildkite/run-amd-test.sh "cd {{ (step.working_dir or default_working_dir) | safe }} ; {{ step.command or (step.commands | join(" && ")) | safe }}"
env:
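
The queue selection added in this hunk can be read as a plain lookup on `step.amd_gpus`. The sketch below mirrors the four Jinja branches in a hypothetical shell function (`select_amd_queue` is not part of the PR). As the diff shows, every branch currently resolves to the same `amd_gpu` queue, so the structure mainly leaves room for per-GPU-count queues later.

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring the template's agents/queue branches.
# The argument stands in for step.amd_gpus; empty means "not set".
select_amd_queue() {
  amd_gpus="${1:-}"
  case "${amd_gpus}" in
    8) echo "amd_gpu" ;;  # {% if step.amd_gpus and step.amd_gpus==8 %}
    4) echo "amd_gpu" ;;  # {% elif ... ==4 %}
    2) echo "amd_gpu" ;;  # {% elif ... ==2 %}
    *) echo "amd_gpu" ;;  # {% else %}
  esac
}

for n in 8 4 2 1 ""; do
  echo "amd_gpus='${n}' -> $(select_amd_queue "${n}")"
done
```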