Add cuda 12.8 wheel nightly build #18726
Open
atalman wants to merge 8 commits into vllm-project:main from atalman:try_nightly_wheel_buil
8 commits
76d1425 Add cuda 12.8 wheel nightly build (atalman)
9c9e630 disable (atalman)
1249787 tests (atalman)
0c42de3 fix (atalman)
7fdb2bb test (atalman)
732dd00 fix (atalman)
3975fdc fixes (atalman)
8151c2e fix (atalman)
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,184 @@ | ||
# The vLLM Dockerfile is used to construct vLLM image that can be directly used | ||
# to run the OpenAI compatible server. | ||
|
||
# Please update any changes made here to | ||
# docs/contributing/dockerfile/dockerfile.md and | ||
# docs/assets/contributing/dockerfile-stages-dependency.png | ||
|
||
ARG CUDA_VERSION=12.8.1 | ||
#################### BASE BUILD IMAGE #################### | ||
# prepare basic build environment | ||
FROM nvidia/cuda:${CUDA_VERSION}-devel-ubuntu20.04 AS base | ||
ARG CUDA_VERSION=12.8.1 | ||
ARG PYTHON_VERSION=3.12 | ||
ARG TARGETPLATFORM | ||
ENV DEBIAN_FRONTEND=noninteractive | ||
|
||
# Install Python and other dependencies | ||
RUN echo 'tzdata tzdata/Areas select America' | debconf-set-selections \ | ||
&& echo 'tzdata tzdata/Zones/America select Los_Angeles' | debconf-set-selections \ | ||
&& apt-get update -y \ | ||
&& apt-get install -y ccache software-properties-common git curl sudo \ | ||
&& for i in 1 2 3; do \ | ||
add-apt-repository -y ppa:deadsnakes/ppa && break || \ | ||
{ echo "Attempt $i failed, retrying in 5s..."; sleep 5; }; \ | ||
done \ | ||
&& apt-get update -y \ | ||
&& apt-get install -y python${PYTHON_VERSION} python${PYTHON_VERSION}-dev python${PYTHON_VERSION}-venv \ | ||
&& update-alternatives --install /usr/bin/python3 python3 /usr/bin/python${PYTHON_VERSION} 1 \ | ||
&& update-alternatives --set python3 /usr/bin/python${PYTHON_VERSION} \ | ||
&& ln -sf /usr/bin/python${PYTHON_VERSION}-config /usr/bin/python3-config \ | ||
&& curl -sS https://bootstrap.pypa.io/get-pip.py | python${PYTHON_VERSION} \ | ||
&& python3 --version && python3 -m pip --version | ||
# Install uv for faster pip installs | ||
RUN --mount=type=cache,target=/root/.cache/uv \ | ||
python3 -m pip install uv | ||
|
||
# This timeout (in seconds) is necessary when installing some dependencies via uv since it's likely to time out | ||
# Reference: https://github.com/astral-sh/uv/pull/1694 | ||
ENV UV_HTTP_TIMEOUT=500 | ||
ENV UV_INDEX_STRATEGY="unsafe-best-match" | ||
|
||
# Upgrade to GCC 10 to avoid https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92519 | ||
# as it was causing spam when compiling the CUTLASS kernels | ||
RUN apt-get install -y gcc-10 g++-10 | ||
RUN update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-10 110 --slave /usr/bin/g++ g++ /usr/bin/g++-10 | ||
RUN <<EOF | ||
gcc --version | ||
EOF | ||
|
||
# Workaround for https://github.com/openai/triton/issues/2507 and | ||
# https://github.com/pytorch/pytorch/issues/107960 -- hopefully | ||
# this won't be needed for future versions of this docker image | ||
# or future versions of triton. | ||
RUN ldconfig /usr/local/cuda-$(echo $CUDA_VERSION | cut -d. -f1,2)/compat/ | ||
|
||
WORKDIR /workspace | ||
|
||
# install build and runtime dependencies | ||
|
||
# arm64 (GH200) build follows the practice of "use existing pytorch" build, | ||
# we need to install torch and torchvision from the nightly builds first, | ||
# pytorch will not appear as a vLLM dependency in all of the following steps | ||
# after this step | ||
RUN --mount=type=cache,target=/root/.cache/uv \ | ||
if [ "$TARGETPLATFORM" = "linux/arm64" ]; then \ | ||
uv pip install --system --index-url https://download.pytorch.org/whl/nightly/cu128 "torch==2.8.0.dev20250318+cu128" "torchvision==0.22.0.dev20250319"; \ | ||
uv pip install --system --index-url https://download.pytorch.org/whl/nightly/cu128 --pre pytorch_triton==3.3.0+gitab727c40; \ | ||
fi | ||
|
||
|
||
# Must be set before installing xformers, so that the correct version of xformers is installed. | |
ARG torch_cuda_arch_list='8.0;8.6;8.9;9.0' | ||
ENV TORCH_CUDA_ARCH_LIST=${torch_cuda_arch_list} | ||
|
||
# Build xformers with cuda and torch nightly | ||
# following official xformers guidance: https://github.com/facebookresearch/xformers#build | ||
ARG max_jobs=16 | ||
ENV MAX_JOBS=${max_jobs} | ||
ARG XFORMERS_COMMIT=da84ce3a8fc07e8b1fa0de5fce08c87bf7e713df | ||
|
||
COPY requirements/common.txt requirements/common.txt | ||
RUN --mount=type=cache,target=/root/.cache/uv \ | ||
uv pip install --system numba==0.61.2 | ||
|
||
RUN --mount=type=cache,target=/root/.cache/uv \ | ||
uv pip install --system -r requirements/common.txt | ||
|
||
ENV CCACHE_DIR=/root/.cache/ccache | ||
RUN --mount=type=cache,target=/root/.cache/ccache \ | ||
--mount=type=cache,target=/root/.cache/uv \ | ||
echo 'git clone xformers...' \ | ||
&& git clone https://github.com/facebookresearch/xformers.git --recursive \ | ||
&& cd xformers \ | ||
&& git checkout ${XFORMERS_COMMIT} \ | ||
&& git submodule update --init --recursive \ | ||
&& echo 'finish git clone xformers...' \ | ||
&& rm -rf build \ | ||
&& python3 setup.py bdist_wheel --dist-dir=../xformers-dist --verbose \ | ||
&& cd .. \ | ||
&& rm -rf xformers | ||
|
||
RUN --mount=type=cache,target=/root/.cache/uv \ | ||
uv pip install --system xformers-dist/*.whl --verbose | ||
|
||
|
||
COPY requirements/cuda-nightly.txt requirements/cuda-nightly.txt | ||
RUN --mount=type=cache,target=/root/.cache/uv \ | ||
uv pip install --system -r requirements/cuda-nightly.txt \ | ||
--extra-index-url https://download.pytorch.org/whl/nightly/cu$(echo $CUDA_VERSION | cut -d. -f1,2 | tr -d '.') | ||
|
||
# cuda arch list used by torch | ||
# can be useful for both `dev` and `test` | ||
# explicitly set the list to avoid issues with torch 2.2 | ||
# see https://github.com/pytorch/pytorch/pull/123243 | ||
ARG torch_cuda_arch_list='7.0 7.5 8.0 8.9 9.0 10.0+PTX' | ||
ENV TORCH_CUDA_ARCH_LIST=${torch_cuda_arch_list} | ||
# Override the arch list for flash-attn to reduce the binary size | ||
ARG vllm_fa_cmake_gpu_arches='80-real;90-real' | ||
ENV VLLM_FA_CMAKE_GPU_ARCHES=${vllm_fa_cmake_gpu_arches} | ||
#################### BASE BUILD IMAGE #################### | ||
|
||
#################### WHEEL BUILD IMAGE #################### | ||
FROM base AS build | ||
ARG TARGETPLATFORM | ||
|
||
# install build dependencies | ||
COPY requirements/build-nightly.txt requirements/build-nightly.txt | ||
|
||
# This timeout (in seconds) is necessary when installing some dependencies via uv since it's likely to time out | ||
# Reference: https://github.com/astral-sh/uv/pull/1694 | ||
ENV UV_HTTP_TIMEOUT=500 | ||
ENV UV_INDEX_STRATEGY="unsafe-best-match" | ||
|
||
RUN --mount=type=cache,target=/root/.cache/uv \ | ||
uv pip install --system -r requirements/build-nightly.txt \ | ||
--extra-index-url https://download.pytorch.org/whl/nightly/cu$(echo $CUDA_VERSION | cut -d. -f1,2 | tr -d '.') | ||
|
||
COPY . . | ||
ARG GIT_REPO_CHECK=0 | ||
RUN --mount=type=bind,source=.git,target=.git \ | ||
if [ "$GIT_REPO_CHECK" != "0" ]; then bash tools/check_repo.sh ; fi | ||
|
||
# max jobs used by Ninja to build extensions | ||
ARG max_jobs=2 | ||
ENV MAX_JOBS=${max_jobs} | ||
# number of threads used by nvcc | ||
ARG nvcc_threads=8 | ||
ENV NVCC_THREADS=$nvcc_threads | ||
|
||
ARG USE_SCCACHE | ||
ARG SCCACHE_BUCKET_NAME=vllm-build-sccache | ||
ARG SCCACHE_REGION_NAME=us-west-2 | ||
ARG SCCACHE_S3_NO_CREDENTIALS=0 | ||
# if USE_SCCACHE is set, use sccache to speed up compilation | ||
RUN --mount=type=cache,target=/root/.cache/uv \ | ||
--mount=type=bind,source=.git,target=.git \ | ||
if [ "$USE_SCCACHE" = "1" ]; then \ | ||
echo "Installing sccache..." \ | ||
&& curl -L -o sccache.tar.gz https://github.com/mozilla/sccache/releases/download/v0.8.1/sccache-v0.8.1-x86_64-unknown-linux-musl.tar.gz \ | ||
&& tar -xzf sccache.tar.gz \ | ||
&& sudo mv sccache-v0.8.1-x86_64-unknown-linux-musl/sccache /usr/bin/sccache \ | ||
&& rm -rf sccache.tar.gz sccache-v0.8.1-x86_64-unknown-linux-musl \ | ||
&& export SCCACHE_BUCKET=${SCCACHE_BUCKET_NAME} \ | ||
&& export SCCACHE_REGION=${SCCACHE_REGION_NAME} \ | ||
&& export SCCACHE_S3_NO_CREDENTIALS=${SCCACHE_S3_NO_CREDENTIALS} \ | ||
&& export SCCACHE_IDLE_TIMEOUT=0 \ | ||
&& export CMAKE_BUILD_TYPE=Release \ | ||
&& sccache --show-stats \ | ||
&& python3 setup.py bdist_wheel --dist-dir=dist --py-limited-api=cp38 \ | ||
&& sccache --show-stats; \ | ||
fi | ||
|
||
ENV CCACHE_DIR=/root/.cache/ccache | ||
RUN --mount=type=cache,target=/root/.cache/ccache \ | ||
--mount=type=cache,target=/root/.cache/uv \ | ||
--mount=type=bind,source=.git,target=.git \ | ||
if [ "$USE_SCCACHE" != "1" ]; then \ | ||
# Clean any existing CMake artifacts | ||
rm -rf .deps && \ | ||
mkdir -p .deps && \ | ||
python3 setup.py bdist_wheel --dist-dir=dist --py-limited-api=cp38; \ | ||
fi | ||
|
||
#################### EXTENSION Build IMAGE #################### |
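
The Dockerfile above derives two strings from `CUDA_VERSION` with `cut` and `tr`: the `major.minor` form used in the `ldconfig` compat path, and the `cuNNN` suffix used in the PyTorch nightly index URL. The sketch below reproduces both pipelines outside Docker so the results can be checked for a given version. The function names `cuda_major_minor` and `cuda_index_suffix` are hypothetical helpers introduced for illustration only; they are not part of the PR.

```shell
#!/bin/sh
# Hypothetical helpers (not in the PR) that mirror the pipelines in the Dockerfile.

# major.minor form, e.g. 12.8.1 -> 12.8
# (the Dockerfile uses this in /usr/local/cuda-<major.minor>/compat/)
cuda_major_minor() {
    echo "$1" | cut -d. -f1,2
}

# index suffix, e.g. 12.8.1 -> cu128
# (the Dockerfile appends this to https://download.pytorch.org/whl/nightly/)
cuda_index_suffix() {
    echo "cu$(cuda_major_minor "$1" | tr -d '.')"
}

# Default mirrors the ARG default in the Dockerfile.
CUDA_VERSION="${CUDA_VERSION:-12.8.1}"
echo "compat dir: /usr/local/cuda-$(cuda_major_minor "$CUDA_VERSION")/compat/"
echo "index url:  https://download.pytorch.org/whl/nightly/$(cuda_index_suffix "$CUDA_VERSION")"
```

Running this with a different `CUDA_VERSION` in the environment shows which index a `--build-arg CUDA_VERSION=...` override would select.
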
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
# Should be mirrored in pyproject.toml | ||
cmake>=3.26 | ||
ninja | ||
packaging>=24.2 | ||
setuptools>=77.0.3,<80.0.0 | ||
setuptools-scm>=8 | ||
torch==2.8.0.dev20250530+cu128 | ||
wheel | ||
jinja2>=3.1.6 | ||
regex |
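
The `torch` pin above carries a local version tag (`+cu128`), while the Dockerfile selects the wheel index from `CUDA_VERSION`. If one is changed without the other, the install may resolve against the wrong index. As a rough sketch, the check below extracts the tag from a requirements file and compares it with the suffix derived from a CUDA version. `torch_pin_tag` and `pin_matches_cuda` are hypothetical names, and the default path `requirements/build-nightly.txt` is taken from the `COPY` line in the Dockerfile.

```shell
#!/bin/sh
# Hypothetical helpers (not in the PR).

# Print the local version tag (the part after '+') of the first "torch==" pin.
torch_pin_tag() {
    # $1: path to a requirements file
    sed -n 's/^torch==[^+]*+\([A-Za-z0-9]*\).*$/\1/p' "$1" | head -n 1
}

# Succeed only when the pin's tag equals the suffix derived from the CUDA version.
pin_matches_cuda() {
    # $1: requirements file, $2: CUDA version such as 12.8.1
    expected="cu$(echo "$2" | cut -d. -f1,2 | tr -d '.')"
    [ "$(torch_pin_tag "$1")" = "$expected" ]
}

# Demo: only runs when the requirements file is present.
req="${1:-requirements/build-nightly.txt}"
if [ -f "$req" ]; then
    if pin_matches_cuda "$req" "${CUDA_VERSION:-12.8.1}"; then
        echo "torch pin matches CUDA version"
    else
        echo "torch pin does not match CUDA version"
    fi
fi
```
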
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,12 @@ | ||
# Common dependencies | ||
-r common.txt | ||
|
||
numba == 0.60.0; python_version == '3.9' # v0.61 doesn't support Python 3.9. Required for N-gram speculative decoding | ||
numba == 0.61.2; python_version > '3.9' | ||
|
||
# Dependencies for NVIDIA GPUs | ||
ray[cgraph]>=2.43.0, !=2.44.* # Ray Compiled Graph, required for pipeline parallelism in V1. | ||
torch==2.8.0.dev20250530+cu128 | ||
torchaudio==2.6.0.dev20250530+cu128 | ||
# These must be updated alongside torch | ||
torchvision==0.22.0.dev20250530+cu128 |
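
The comment "These must be updated alongside torch" implies that the nightly date stamps on `torch`, `torchaudio`, and `torchvision` should move together. One way to guard that, sketched here under the assumption that every pin uses the `.devYYYYMMDD` form shown above, is to collect the distinct stamps and require exactly one. `nightly_stamps` and `stamps_consistent` are hypothetical helper names, not part of the PR.

```shell
#!/bin/sh
# Hypothetical helpers (not in the PR).

# Print the distinct devYYYYMMDD stamps on torch, torchaudio and torchvision pins.
nightly_stamps() {
    # $1: path to a requirements file
    grep -E '^(torch|torchaudio|torchvision)==' "$1" \
        | sed -n 's/.*\.dev\([0-9]\{8\}\).*/\1/p' \
        | sort -u
}

# Succeed only when exactly one distinct stamp is present.
stamps_consistent() {
    [ "$(nightly_stamps "$1" | wc -l | tr -d ' ')" = "1" ]
}
```

A check like this would also flag pins that differ by a day, such as the `dev20250318` and `dev20250319` pair in the arm64 branch of the Dockerfile, although that difference may be intentional.
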
shall we upload-wheels.sh?
Hi @houseroad, not yet. Let's first validate that the build works; once we are happy with it, we should find a good place to upload it.