
LLaVA batch inference: only the result corresponding to the longest prompt is correct, while other results are incorrect #1881

Open
@lss15151161

Description


Version: TensorRT-LLM 0.10.0
The official script (TensorRT-LLM/examples/multimodal/run.py) repeats the same prompt to form a batch. However, if I build a batch from different prompts, the results are incorrect. How can I solve this?
Since only the result corresponding to the longest prompt is correct, I suspect the cause is padding.
[Screenshot: outputs for a batch of different prompts; only the longest prompt's result is correct]

If I use the same prompt for every batch entry, the results are correct.
[Screenshot: outputs for a batch of identical prompts; all results are correct]
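
To illustrate the padding behavior I would expect: decoder-only models normally need left padding plus an attention mask when prompts of different lengths are batched, so the last prompt token of every sequence sits right before the first generated token. A minimal sketch with plain Hugging Face tokenization (not the TensorRT-LLM API; the checkpoint name is only an assumed example):

```python
# Minimal sketch of the suspected padding issue (plain Hugging Face code,
# not the TensorRT-LLM API; the checkpoint name is only an example).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("llava-hf/llava-1.5-7b-hf")
tokenizer.padding_side = "left"  # right padding corrupts batched generation
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

prompts = [
    "USER: <image>\nDescribe this picture in detail. ASSISTANT:",
    "USER: <image>\nWhat is shown? ASSISTANT:",  # shorter prompt
]
batch = tokenizer(prompts, padding=True, return_tensors="pt")
# batch["input_ids"] pads the shorter prompt on the left, and
# batch["attention_mask"] marks the pad positions so they are ignored.
```

If run.py pads mixed-length prompts on the right without masking the pad tokens, that would match the symptom that only the longest (unpadded) prompt decodes correctly.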
