Qwen2.5-VL-3B pytorch backend with cuda graph error result

I tested the Qwen2.5-VL-3B model with TensorRT-LLM (TRTLLM) 0.19.0, following the PyTorch workflow example [quickstart_multimodal.py](https://github.com/NVIDIA/TensorRT-LLM/blob/release/0.19/examples/pytorch/quickstart_multimodal.py). With the `--use_cuda_graph` flag, generation stops after only a few tokens. Without the flag, and with every other configuration parameter unchanged, the output is normal and exceeds 100 tokens. Repeating the same test on TRTLLM 0.20.0 gives the same results.

```shell
python3 quickstart_multimodal.py \
    --model_dir /qwen/tmp/hf_models/Qwen2.5-VL-3B-Instruct \
    --modality image \
    --max_batch_size 1 \
    --max_num_tokens 4096 \
    --attention_backend TRTLLM \
    --prompt "Please describe this image in at least 100 characters, covering main elements and their relationships." \
    --media "/qwen/pics/demo.jpeg" \
    --max_tokens 128 \
    --use_cuda_graph \
    2>&1 | tee run_qwen2.5_vl_3B_cuda_graph.log
```
 

> [0] Prompt: '<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\n<|vision_start|><|image_pad|><|vision_end|>Please describe this image in at least 100 characters, covering main elements and their relationships.<|im_end|>\n<|im_start|>assistant\n', Generated text: 'A'


```shell
python3 quickstart_multimodal.py \
    --model_dir /qwen/tmp/hf_models/Qwen2.5-VL-3B-Instruct \
    --modality image \
    --max_batch_size 1 \
    --max_num_tokens 4096 \
    --attention_backend TRTLLM \
    --prompt "Please describe this image in at least 100 characters, covering main elements and their relationships." \
    --media "/qwen/pics/demo.jpeg" \
    --max_tokens 128 \
    2>&1 | tee run_qwen2.5_vl_3B_without_cuda_graph.log
```

> [0] Prompt: '<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\n<|vision_start|><|image_pad|><|vision_end|>Please describe this image in at least 100 characters, covering main elements and their relationships.<|im_end|>\n<|im_start|>assistant\n', Generated text: 'A woman and her golden retriever dog are sitting on a sandy beach, with the ocean in the background. The woman is wearing a plaid shirt and black pants, while the dog is wearing a harness. They are both smiling and interacting with each other.'
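To make the difference between the two runs easier to check across versions, the two `tee` logs can be compared programmatically. The following is a minimal sketch, not part of TensorRT-LLM: the helper names `extract_generations` and `compare_logs` are hypothetical, and the regex assumes result lines look exactly like the `[0] Prompt: '...', Generated text: '...'` lines quoted above. It counts characters as a rough proxy for output length, since the logs do not contain token counts.

```python
import re

# Assumed format of a result line, taken from the output quoted in this issue:
#   [0] Prompt: '...', Generated text: '...'
RESULT_LINE = re.compile(r"\[(\d+)\] Prompt: '.*', Generated text: '(.*)'\s*$")


def extract_generations(log_text):
    """Return {request_index: generated_text} for every result line in a log."""
    generations = {}
    for line in log_text.splitlines():
        match = RESULT_LINE.search(line)
        if match:
            generations[int(match.group(1))] = match.group(2)
    return generations


def compare_logs(with_graph_log, without_graph_log):
    """Pair up generations from both runs by request index.

    Returns a list of tuples:
    (index, chars_with_graph, chars_without_graph, texts_are_identical)
    """
    with_graph = extract_generations(with_graph_log)
    without_graph = extract_generations(without_graph_log)
    report = []
    for index in sorted(with_graph.keys() & without_graph.keys()):
        a, b = with_graph[index], without_graph[index]
        report.append((index, len(a), len(b), a == b))
    return report
```

To use it, pass the contents of `run_qwen2.5_vl_3B_cuda_graph.log` and `run_qwen2.5_vl_3B_without_cuda_graph.log` (read with `open(path).read()`) to `compare_logs`. For the two outputs shown above, the first length would be 1 and the texts would not be identical.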


Qwen2.5-VL-3B pytorch backend with cuda graph error result #5500
