This repository contains the infrastructure and bootstrap code for the vLLM continuous integration pipeline using Buildkite.
Current CI Infrastructure Setup:
- AWS Buildkite Elastic CI Stack: Infrastructure code in
terraform/aws
- TPU v5/v6e Nodes on GCP: Infrastructure code in
terraform/gcp_old
- GKE Cluster on GCP (currently not in use): Infrastructure code in
terraform/gcp
Buildkite bootstrap scripts & pipeline template files are located in the buildkite/
directory.
vLLM leverages Buildkite for CI workflow. Whenever a commit is pushed to the vLLM GitHub repository, a Buildkite webhook triggers an event that initiates a new build in the Buildkite pipeline with relevant details like Github branch and commit.
Build Process Overview:
-
Bootstrap Step:
- Executed via
buildkite/bootstrap.sh
. - Utilizes a CI Jinja2 template (
buildkite/test-template-ci.j2
) along with the list of tests from vLLM to render a Buildkite YAML configuration that defines all build/test steps and their configurations. - Uploads the rendered YAML to Buildkite to initiate the build.
- Note: We are transitioning to a custom Buildkite pipeline generator to replace the Jinja2 template rendering soon.
- Executed via
-
Job Queueing and Execution:
- Each Buildkite step is associated with an agent queue.
- After uploaded, steps are pushed into the queue, waiting to be picked up by a Buildkite agent.
We use the Buildkite Elastic CI Stack to set up our autoscaling Buildkite agent cluster on AWS.
Components of the stack for each Agent Queue:
-
AWS CloudFormation Stack:
- Contains an EC2 Auto Scaling Group and an AWS Lambda function.
-
EC2 Auto Scaling Group:
- Automatically scales number of EC2 instances based on the workload from the Buildkite queue.
- Each EC2 instance comes with a Buildkite agent that executes jobs.
-
AWS Lambda Function:
- Periodically polls Buildkite to assess capacity needs for the queue and adjusts the size of the Auto Scaling Group accordingly.
- Create a feature branch on this repo, say named
my-feature-branch
. If you can't create a feature branch, ping @khluu to add you into the repo. - Once the branch is created, you can start making changes and commit to the branch.
- After the changes are pushed to the branch, wait a few minutes, then create a new build on Buildkite with this environment variable
VLLM_CI_BRANCH=my-feature-branch
to test your changes against vLLM codebase.
Please note that when creating a new build on Buildkite:
- Please do it on your own feature branch/fork branch on vLLM, preferrably a branch that is up to date with
main
. - If it's a fork branch,
HEAD
cannot be used as commit when creating a build. You have to put in the hash of the latest commit on your branch.
The machines communicate with Buildkite server via an agent being installed on the machine. There are multiple ways agents can be installed, depending on how the machines are set up:
- Buildkite Elastic CI stack if you want the compute to be autoscaling EC2 instances.
- Buildkite K8s agent stack if you want machines to be managed/orchestrated in a Kubernetes cluster.
- Buildkite agent if you already have existing standalone machines.
For all of these approaches, you would need the following info to set up (please contact @khluu on #sig-ci channel - vllm-dev.slack.com to get them):
- Buildkite agent token
- Buildkite queue name
- (optional) Buildkite cluster UUID
If you go with option 1 or 2, these info would need to be provided when you setup the stack. For option 3, it doesn't require it when installing agent. After installation, you would need to manually:
- Add these info in the agent config (usually located in
/etc/buildkite-agent/buildkite-agent.cfg
) - Restart the agent (usually
systemctl stop buildkite-agent
thensystemctl start buildkite-agent
works)