
LLaVA batch inference: only the result corresponding to the longest prompt is correct, while other results are incorrect #1881

Open
@lss15151161

Description


Version: TensorRT-LLM 0.10.0
The official script (TensorRT-LLM/examples/multimodal/run.py) repeats the same prompt to form a batch. However, if I build a batch from different prompts, the results are incorrect. How can I solve this?
Since only the result corresponding to the longest prompt is correct, I suspect the cause is padding.
[Screenshot: outputs for a batch of different prompts; only the longest prompt's result is correct]

If I use the same prompt for every batch entry, the results are correct.
[Screenshot: outputs for a batch of identical prompts; all results are correct]
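
To illustrate the padding behavior I would expect: decoder-only models normally need left padding plus an attention mask when prompts of different lengths are batched, so the last prompt token of every sequence sits right before the first generated token. A minimal sketch with plain Hugging Face tokenization (not the TensorRT-LLM API; the checkpoint name is only an assumed example):

```python
# Minimal sketch of the suspected padding issue (plain Hugging Face code,
# not the TensorRT-LLM API; the checkpoint name is only an example).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("llava-hf/llava-1.5-7b-hf")
tokenizer.padding_side = "left"  # right padding corrupts batched generation
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

prompts = [
    "USER: <image>\nDescribe this picture in detail. ASSISTANT:",
    "USER: <image>\nWhat is shown? ASSISTANT:",  # shorter prompt
]
batch = tokenizer(prompts, padding=True, return_tensors="pt")
# batch["input_ids"] pads the shorter prompt on the left, and
# batch["attention_mask"] marks the pad positions so they are ignored.
```

If run.py pads mixed-length prompts on the right without masking the pad tokens, that would match the symptom that only the longest (unpadded) prompt decodes correctly.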
