Adding EC2 tests on vLLM DLC #4986

Draft
wants to merge 122 commits into
base: master
Commits
44d464f
testing vllm
Jyothirmaikottu Jul 1, 2025
cb8fdb8
add ec2
Jyothirmaikottu Jul 7, 2025
3d8a430
Merge remote-tracking branch 'upstream/master' into vllm-ec2
Jyothirmaikottu Jul 7, 2025
1d09e4c
testing vllm route
Jyothirmaikottu Jul 9, 2025
2675fc8
fixed error in trigger_test
Jyothirmaikottu Jul 9, 2025
df0751e
added new dir for vllm test and infra
Jyothirmaikottu Jul 9, 2025
3eb563c
commented out test runner
Jyothirmaikottu Jul 9, 2025
1fb0aa1
trigger ec2
Jyothirmaikottu Jul 9, 2025
61d9100
create ec2
Jyothirmaikottu Jul 9, 2025
564b299
change region
Jyothirmaikottu Jul 10, 2025
745dadc
adding fsx
Jyothirmaikottu Jul 10, 2025
43e8b1c
adding fsx
Jyothirmaikottu Jul 10, 2025
28b6175
create func for subnet id
Jyothirmaikottu Jul 11, 2025
7e2b382
print statements
Jyothirmaikottu Jul 11, 2025
5540b04
print statements
Jyothirmaikottu Jul 13, 2025
eccd3c2
add more delete functionalities
Jyothirmaikottu Jul 13, 2025
b1a460d
fix ingress rules
Jyothirmaikottu Jul 13, 2025
49ec039
make dg is list
Jyothirmaikottu Jul 13, 2025
164e128
modify egress and ingress
Jyothirmaikottu Jul 13, 2025
4e82437
add ingress and egress rules
Jyothirmaikottu Jul 13, 2025
e43313b
add setup_script
Jyothirmaikottu Jul 13, 2025
b52607c
add setup_script
Jyothirmaikottu Jul 13, 2025
45cbdc5
fixed path
Jyothirmaikottu Jul 13, 2025
26a8611
fix sg re ordering
Jyothirmaikottu Jul 13, 2025
d4c2c9b
fix sg and fsx
Jyothirmaikottu Jul 14, 2025
f0adbc0
fix sg and fsx
Jyothirmaikottu Jul 14, 2025
0708a4d
fix sg and fsx
Jyothirmaikottu Jul 14, 2025
c581420
fixed hf token error
Jyothirmaikottu Jul 14, 2025
1ac8f01
fix error with sg and fsx mount
Jyothirmaikottu Jul 14, 2025
b2ff468
commented out cleanup code
Jyothirmaikottu Jul 14, 2025
7895633
refactor setup() and fsx_utils sg
Jyothirmaikottu Jul 14, 2025
34de29e
fix sg
Jyothirmaikottu Jul 14, 2025
75bb994
modify sg creation
Jyothirmaikottu Jul 14, 2025
4fb6a69
add self
Jyothirmaikottu Jul 14, 2025
8fe55ce
setup instances failure
Jyothirmaikottu Jul 14, 2025
d83e5a8
fix deletion of sg
Jyothirmaikottu Jul 14, 2025
bc0a3dd
remove version
Jyothirmaikottu Jul 14, 2025
2c9ac89
remove command
Jyothirmaikottu Jul 14, 2025
02c776c
adding test-runner path and actual single node test
Jyothirmaikottu Jul 14, 2025
0c0347b
added secret key for hf
Jyothirmaikottu Jul 14, 2025
b95b637
fixed import
Jyothirmaikottu Jul 14, 2025
d66c172
rename fn'
Jyothirmaikottu Jul 14, 2025
b52810d
testspec use trigger_test:
Jyothirmaikottu Jul 14, 2025
81117a1
fix import
Jyothirmaikottu Jul 15, 2025
edf60a3
fix errors
Jyothirmaikottu Jul 15, 2025
d5f1a2e
add cleanup logic
Jyothirmaikottu Jul 15, 2025
04256b6
increase time out
Jyothirmaikottu Jul 15, 2025
293ed99
change region
Jyothirmaikottu Jul 15, 2025
89cf458
changed it back to us-west-2
Jyothirmaikottu Jul 15, 2025
61f5250
modified test for single node
Jyothirmaikottu Jul 15, 2025
cb0af6d
modified test for single node
Jyothirmaikottu Jul 15, 2025
ac3e801
changes to test to use script
Jyothirmaikottu Jul 16, 2025
05eac52
Merge remote-tracking branch 'upstream/master' into vllm-ec2
Jyothirmaikottu Jul 16, 2025
11f3145
retrigger tst
Jyothirmaikottu Jul 16, 2025
7eb1204
remove nvjpeg
Jyothirmaikottu Jul 16, 2025
de2b4ad
remove nvjpeg
Jyothirmaikottu Jul 16, 2025
cdab16d
add logs
Jyothirmaikottu Jul 16, 2025
2b6f79b
fix script to pass arguments
Jyothirmaikottu Jul 17, 2025
68a1691
fix script to pass arguments
Jyothirmaikottu Jul 17, 2025
c41ed8a
fix string
Jyothirmaikottu Jul 17, 2025
e6a1e63
fix string
Jyothirmaikottu Jul 17, 2025
aef0687
test ec2
Jyothirmaikottu Jul 17, 2025
f742524
remove unused code
Jyothirmaikottu Jul 21, 2025
13c28dd
add multinode
Jyothirmaikottu Jul 21, 2025
57ec879
fixed connection
Jyothirmaikottu Jul 21, 2025
8427db0
fix connection
Jyothirmaikottu Jul 21, 2025
cc01855
fix path
Jyothirmaikottu Jul 21, 2025
438b725
fix comand
Jyothirmaikottu Jul 21, 2025
1a38f4d
retrigger ec2
Jyothirmaikottu Jul 22, 2025
a8c2937
Merge branch 'master' into vllm-ec2
Jyothirmaikottu Jul 22, 2025
58ef98f
Merge branch 'master' into vllm-ec2
Jyothirmaikottu Jul 23, 2025
3d775b8
increase wait time and add fsx version
Jyothirmaikottu Jul 23, 2025
ed94ecc
fix fsx command
Jyothirmaikottu Jul 23, 2025
8c2ca97
fix names
Jyothirmaikottu Jul 23, 2025
471b9ff
fix dir
Jyothirmaikottu Jul 23, 2025
17b1ca5
fix vllm dir and add log
Jyothirmaikottu Jul 23, 2025
bda6b37
fix git clone
Jyothirmaikottu Jul 23, 2025
f1170f9
fix git url
Jyothirmaikottu Jul 23, 2025
a4eef72
fix path
Jyothirmaikottu Jul 23, 2025
97077fa
increase max attempts
Jyothirmaikottu Jul 24, 2025
06cf30b
fixed paths
Jyothirmaikottu Jul 24, 2025
d6e33a5
added more fixes
Jyothirmaikottu Jul 24, 2025
f06c4fe
sleep
Jyothirmaikottu Jul 24, 2025
1fc7fdb
setup instance one at a time
Jyothirmaikottu Jul 24, 2025
d06cd9a
create diff fsx and sg for another instance
Jyothirmaikottu Jul 24, 2025
48abe34
add conda installer
Jyothirmaikottu Jul 24, 2025
c08c822
create conda env
Jyothirmaikottu Jul 25, 2025
636cb42
conda accept tps
Jyothirmaikottu Jul 25, 2025
92a8076
fix sg and multinode
Jyothirmaikottu Jul 25, 2025
ee9f82c
add packages
Jyothirmaikottu Jul 27, 2025
3ebed39
add venv vllm_env
Jyothirmaikottu Jul 27, 2025
0452b3c
add venv vllm_env
Jyothirmaikottu Jul 27, 2025
1c08d41
fixed vllm venv
Jyothirmaikottu Jul 27, 2025
fd8e127
fixed transformrs isntallation
Jyothirmaikottu Jul 28, 2025
dd125b1
fix cleanup logic
Jyothirmaikottu Jul 28, 2025
b160c6c
add timer
Jyothirmaikottu Jul 28, 2025
d125d23
increase timer
Jyothirmaikottu Jul 28, 2025
db862de
run single node
Jyothirmaikottu Jul 28, 2025
2cd2b0a
multinode test
Jyothirmaikottu Jul 28, 2025
5aafae8
add packages
Jyothirmaikottu Jul 29, 2025
a0f0ac4
test ec2
Jyothirmaikottu Jul 29, 2025
1e34f95
activate venv
Jyothirmaikottu Jul 29, 2025
c6fba55
add vllm serve
Jyothirmaikottu Jul 29, 2025
7b6afc7
increase cleanup timer
Jyothirmaikottu Jul 29, 2025
f358ff2
test multinode
Jyothirmaikottu Jul 30, 2025
e750c2c
test mutlinode
Jyothirmaikottu Jul 30, 2025
1f6ed9e
test multinode
Jyothirmaikottu Jul 30, 2025
cdd5e26
test multinode
Jyothirmaikottu Jul 30, 2025
a7ff3f0
test multinode
Jyothirmaikottu Jul 31, 2025
0c1a8e6
retest
Jyothirmaikottu Jul 31, 2025
90c1589
retest
Jyothirmaikottu Jul 31, 2025
66eed3f
retest
Jyothirmaikottu Jul 31, 2025
3e26dc6
retest
Jyothirmaikottu Jul 31, 2025
8c8ee6f
retest
Jyothirmaikottu Jul 31, 2025
87156bf
retest single node
Jyothirmaikottu Jul 31, 2025
0941b76
test
Jyothirmaikottu Jul 31, 2025
cf69a6f
test single node
Jyothirmaikottu Jul 31, 2025
f114b92
test efa and nccl
Jyothirmaikottu Jul 31, 2025
db4fb9c
test efa and nccl multinode
Jyothirmaikottu Jul 31, 2025
f1e0c80
test efa and nccl
Jyothirmaikottu Aug 1, 2025
2df0158
add sleep timer
Jyothirmaikottu Aug 1, 2025
ff15bab
test efa
Jyothirmaikottu Aug 1, 2025
8 changes: 4 additions & 4 deletions dlc_developer_config.toml
@@ -37,16 +37,16 @@ deep_canary_mode = false
 [build]
 # Add in frameworks you would like to build. By default, builds are disabled unless you specify building an image.
 # available frameworks - ["base", "vllm", "autogluon", "huggingface_tensorflow", "huggingface_pytorch", "huggingface_tensorflow_trcomp", "huggingface_pytorch_trcomp", "pytorch_trcomp", "tensorflow", "pytorch", "stabilityai_pytorch"]
-build_frameworks = []
+build_frameworks = ["vllm"]
 
 
 # By default we build both training and inference containers. Set true/false values to determine which to build.
-build_training = true
-build_inference = true
+build_training = false
+build_inference = false
 
 # Set do_build to "false" to skip builds and test the latest image built by this PR
 # Note: at least one build is required to set do_build to "false"
-do_build = true
+do_build = false
 
 [notify]
 ### Notify on test failures
21 changes: 21 additions & 0 deletions test/test_utils/ec2.py
@@ -1817,6 +1817,27 @@ def get_default_subnet_for_az(ec2_client, availability_zone):
     return az_subnet_id
 
 
+def get_subnet_id_by_vpc(ec2_client, vpc_id):
+
+    response = ec2_client.describe_subnets(
+        Filters=[
+            {
+                "Name": "vpc-id",
+                "Values": [
+                    vpc_id,
+                ],
+            },
+        ],
+    )
+
+    subnet_ids = []
+    for subnet in response["Subnets"]:
+        if subnet["SubnetId"] is not None:
+            subnet_ids.append(subnet["SubnetId"])
+
+    return subnet_ids
+
+
 def get_vpc_id_by_name(ec2_client, vpc_name):
     """
     Get VPC ID by VPC name tag
21 changes: 12 additions & 9 deletions test/testrunner.py
@@ -32,7 +32,7 @@
 )
 from test_utils import KEYS_TO_DESTROY_FILE
 from test_utils.pytest_cache import PytestCache
-
+from vllm.trigger_test import test
 from src.codebuild_environment import get_codebuild_project_name
 
 LOGGER = logging.getLogger(__name__)
@@ -308,15 +308,18 @@ def main():
     build_context = get_build_context()
 
     # Skip non-sanity/security test suites for base or vllm images in MAINLINE context
-    if (
-        build_context == "MAINLINE"
-        and all("base" in image_uri or "vllm" in image_uri for image_uri in all_image_list)
-        and test_type not in {"functionality_sanity", "security_sanity"}
+    if build_context == "MAINLINE" and all(
+        "base" in image_uri or "vllm" in image_uri for image_uri in all_image_list
     ):
-        LOGGER.info(
-            f"NOTE: {specific_test_type} tests not supported on base or vllm images. Skipping..."
-        )
-        return
+        if test_type not in {"functionality_sanity", "security_sanity"}:
+            if test_type in {"ec2", "eks"}:
+                LOGGER.info("Running VLLM EC2 EKS tests...")
+                test()
+            else:
+                LOGGER.info(
+                    f"NOTE: {specific_test_type} tests not supported on base or vllm images. Skipping..."
+                )
+            return
 
     # quick_checks tests don't have images in it. Using a placeholder here for jobs like that
     try:
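The routing that this hunk adds to `main()` can be sketched as a small pure function, which makes the three outcomes easier to see and to test. This is an illustration only. The name `choose_action` and the returned labels are hypothetical, and the sketch assumes the `return` in the new code applies to both the vLLM branch and the skip branch, which the flattened diff does not make certain.

```python
SANITY_TEST_TYPES = {"functionality_sanity", "security_sanity"}
VLLM_PLATFORM_TEST_TYPES = {"ec2", "eks"}


def choose_action(build_context, image_uris, test_type):
    """Decide what the test runner should do for this job.

    Returns one of three hypothetical labels:
      "run_standard": fall through to the regular pytest flow
      "run_vllm":     call the vLLM trigger and stop
      "skip":         log that the suite is unsupported and stop
    """
    # Same predicate as the diff: every image is a base or vllm image.
    only_base_or_vllm = all("base" in uri or "vllm" in uri for uri in image_uris)

    if build_context != "MAINLINE" or not only_base_or_vllm:
        return "run_standard"
    if test_type in SANITY_TEST_TYPES:
        return "run_standard"
    if test_type in VLLM_PLATFORM_TEST_TYPES:
        return "run_vllm"
    return "skip"


# Example inputs; the image URIs are made up for illustration.
ec2_action = choose_action("MAINLINE", ["example/vllm:latest"], "ec2")
sanity_action = choose_action("MAINLINE", ["example/vllm:latest"], "security_sanity")
other_action = choose_action("MAINLINE", ["example/base:latest"], "sagemaker")
```

Keeping the decision separate from the side effects (logging, calling `test()`) is one possible design choice. It lets the branch table be checked without provisioning any EC2 resources.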
3 changes: 2 additions & 1 deletion testspec.yml
@@ -23,7 +23,8 @@ phases:
       - pip install scheduler/.
       - echo Running pytest $TEST_TYPE tests on $DLC_IMAGES...
       - export PYTHONPATH=$PYTHONPATH:$(pwd)/src
-      - python test/testrunner.py
+      # - python test/testrunner.py
+      - python vllm/trigger_test.py
   post_build:
     commands:
       - python src/send_status.py --status $CODEBUILD_BUILD_SUCCEEDING
1 change: 1 addition & 0 deletions vllm/buildspec.yml
@@ -49,3 +49,4 @@ images:
     test_platforms:
       - sanity
       - security
+      - ec2