12 changes: 12 additions & 0 deletions examples/backends/vllm/launch/agg_multimodal_epd.sh
@@ -19,6 +19,7 @@ trap 'echo Cleaning up...; kill 0' EXIT
MODEL_NAME="llava-hf/llava-1.5-7b-hf"
PROMPT_TEMPLATE="USER: <image>\n<prompt> ASSISTANT:"
PROVIDED_PROMPT_TEMPLATE=""
REQUEST_PLANE="nats"

# Parse command line arguments
while [[ $# -gt 0 ]]; do
@@ -31,11 +32,16 @@ while [[ $# -gt 0 ]]; do
PROVIDED_PROMPT_TEMPLATE=$2
shift 2
;;
--request-plane)
REQUEST_PLANE=$2
shift 2
;;
-h|--help)
echo "Usage: $0 [OPTIONS]"
echo "Options:"
echo " --model <model_name> Specify the model to use (default: $MODEL_NAME)"
echo " --prompt-template <template> Specify the multi-modal prompt template to use. LLaVA 1.5 7B, Qwen2.5-VL, and Phi3V models have predefined templates."
echo " --request-plane <plane> Specify the request plane to use. (default: nats, options: nats, http, tcp)"
echo " -h, --help Show this help message"
exit 0
;;
@@ -63,6 +69,12 @@ else
exit 1
fi


if [[ -n "$REQUEST_PLANE" ]]; then
export DYN_REQUEST_PLANE="$REQUEST_PLANE"
echo "Using request plane: $REQUEST_PLANE"
fi

# Start frontend (HTTP endpoint)
python -m dynamo.frontend --http-port=8000 &

30 changes: 30 additions & 0 deletions tests/serve/test_vllm.py
@@ -177,6 +177,36 @@ class VLLMConfig(EngineConfig):
)
],
),
"multimodal_agg_qwen_epd_30mb_img": VLLMConfig(
name="multimodal_agg_qwen_epd_30mb_img",
directory=vllm_dir,
script_name="agg_multimodal_epd.sh",
marks=[pytest.mark.gpu_2],
model="Qwen/Qwen2.5-VL-7B-Instruct",
delayed_start=0,
script_args=[
"--model",
"Qwen/Qwen2.5-VL-7B-Instruct",
"--request-plane",
"tcp",
],
timeout=360,
request_payloads=[
chat_payload(
[
{"type": "text", "text": "What is in this image?"},
{
"type": "image_url",
"image_url": {
"url": "https://floatingworld.com/wp-content/uploads/2023/02/Sample-jpg-image-30mb-16.jpg"
Review comment (Contributor):

@krishung5 - want to double check - will this be decoded in the frontend and sent over nixl or over the request plane?

Reply (Contributor):

In EPD, the image will be loaded in the encode worker: https://github.com/ai-dynamo/dynamo/blob/main/components/src/dynamo/vllm/multimodal_handlers/encode_worker_handler.py#L50
It's not decoded in the frontend. I think this PR is going to add that: #3988

},
},
],
repeat_count=1,
expected_response=["bus"],
)
],
),
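The test payload above follows the OpenAI-style multimodal chat schema, with a text part and an `image_url` part in a single user message. A sketch of the request body it presumably expands to (the exact output of the test harness's `chat_payload` helper is an assumption):

```python
# Sketch of the OpenAI-style multimodal chat request the payload above maps to.
# The real chat_payload helper lives in the test harness; this mirrors the
# standard schema rather than its actual implementation.
def build_chat_request(content_parts, model="Qwen/Qwen2.5-VL-7B-Instruct"):
    """Wrap a list of multimodal content parts in a single-turn chat request."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": content_parts}],
    }

request = build_chat_request(
    [
        {"type": "text", "text": "What is in this image?"},
        {
            "type": "image_url",
            "image_url": {
                "url": "https://floatingworld.com/wp-content/uploads/2023/02/Sample-jpg-image-30mb-16.jpg"
            },
        },
    ]
)
```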
Review comment on lines +180 to +209 (Contributor):
⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

Verify the expected response matches the actual image content.

The test expects a response containing "bus", but the 30MB image URL (https://floatingworld.com/wp-content/uploads/2023/02/Sample-jpg-image-30mb-16.jpg) appears to be a generic sample image from floatingworld.com, not the COCO dataset bus image used in the reference test configuration multimodal_agg_qwen_epd (line 171). This mismatch could cause the test to fail.

Please verify the image content and update the expected response accordingly. You can check the image content with:

Additionally, please search the web to confirm whether the floatingworld.com sample image contains a bus:


🏁 Script executed:

#!/bin/bash
# Description: Verify the image URL is accessible and examine its metadata
# Expected: HTTP 200 response and image metadata

echo "Checking image URL accessibility..."
curl -I "https://floatingworld.com/wp-content/uploads/2023/02/Sample-jpg-image-30mb-16.jpg" 2>&1 | head -n 10

echo -e "\n\nChecking reference COCO image URL..."
curl -I "http://images.cocodataset.org/test2017/000000155781.jpg" 2>&1 | head -n 10

Length of output: 1218


🌐 Web query:

What is shown in the image at https://floatingworld.com/wp-content/uploads/2023/02/Sample-jpg-image-30mb-16.jpg?

💡 Result:

The image is a colorful nebula/space-style test picture (a bright, cloud‑like “nebula” texture used as a large sample JPG). [1][2]

Sources:
[1] Floating World Gallery — Sample-jpg-image-30mb 16.
[2] Example collection of large sample/nebula test images (CSDN).


Update the expected response to match the actual nebula image content or replace the image with one containing a bus.

The verification confirms the major issue: the image at the provided URL is a colorful nebula/space-style test texture, not a bus image. The test expects ["bus"] as the response, but this will fail when the model analyzes the actual nebula image. Either update expected_response to reflect what's in the nebula image, or replace the image URL with one that actually contains a bus (such as the COCO reference image used in the multimodal_agg_qwen_epd configuration).

🤖 Prompt for AI Agents
In tests/serve/test_vllm.py around lines 180 to 209, the test uses a
nebula/space texture image but asserts expected_response=["bus"]; update the
test so the expected output matches the actual nebula image content OR replace
the image URL with a valid bus image (for example reuse the COCO bus reference
image from multimodal_agg_qwen_epd config), and ensure
repeat_count/expected_response pair is consistent with the chosen image so the
model's response assertion will pass.

"multimodal_agg_qwen": VLLMConfig(
name="multimodal_agg_qwen",
directory=vllm_dir,