
Are multimodal models supported by trtllm-serve? #2714


Description

@xiaoyuzju

I successfully ran inference with the Qwen2-VL-7B model following the guidelines in examples/multimodal/README.md. However, when I tried to deploy it with trtllm-serve and test inference, the server threw an error.

[TensorRT-LLM][ERROR] Encountered an error in forwardAsync function: Input tensor 'mrope_rotary_sin_cos' not found; expected shape: (-1, 4194304) (/home/jenkins/agent/workspace/LLM/release-0.16/L0_Test-x86_64/tensorrt_llm/cpp/tensorrt_llm/runtime/tllmRuntime.cpp:484)
1       0x7f00e5616e9a tensorrt_llm::runtime::TllmRuntime::setInputTensorsImpl(int, std::unordered_map<std::string, std::shared_ptr<tensorrt_llm::runtime::ITensor>, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::shared_ptr<tensorrt_llm::runtime::ITensor> > > > const&, bool) + 1370
2       0x7f00e56178d5 tensorrt_llm::runtime::TllmRuntime::setInputTensors(int, std::unordered_map<std::string, std::shared_ptr<tensorrt_llm::runtime::ITensor>, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::shared_ptr<tensorrt_llm::runtime::ITensor> > > > const&) + 53
3       0x7f00e59b2dea tensorrt_llm::batch_manager::TrtGptModelInflightBatching::prepareBuffers(std::vector<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest>, std::allocator<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest> > > const&, std::vector<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest>, std::allocator<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest> > > const&, int) + 218
4       0x7f00e59b333f tensorrt_llm::batch_manager::TrtGptModelInflightBatching::executeStep(std::vector<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest>, std::allocator<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest> > > const&, std::vector<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest>, std::allocator<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest> > > const&, int) + 1055
5       0x7f00e59b3b7e tensorrt_llm::batch_manager::TrtGptModelInflightBatching::executeBatch(tensorrt_llm::batch_manager::ScheduledRequests const&) + 222
6       0x7f00e59b4268 tensorrt_llm::batch_manager::TrtGptModelInflightBatching::forwardAsync(std::list<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest>, std::allocator<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest> > > const&) + 1672
7       0x7f00e5a49746 tensorrt_llm::executor::Executor::Impl::forwardAsync(std::list<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest>, std::allocator<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest> > >&) + 486
8       0x7f00e5a50331 tensorrt_llm::executor::Executor::Impl::executionLoop() + 1281
9       0x7f0204a635c0 /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch.so(+0x145c0) [0x7f0204a635c0]
10      0x7f0242dffa94 /usr/lib/x86_64-linux-gnu/libc.so.6(+0x9ca94) [0x7f0242dffa94]
11      0x7f0242e8cc3c /usr/lib/x86_64-linux-gnu/libc.so.6(+0x129c3c) [0x7f0242e8cc3c]
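For reference, the request that triggered the error was roughly the following OpenAI-style multimodal chat completion; the host/port, model name, and image URL below are placeholders rather than my exact values:

```python
# Rough sketch of the multimodal request sent to the trtllm-serve
# OpenAI-compatible endpoint. The host/port, model name, and image URL
# are placeholders, not the exact values from my setup.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="dummy")

response = client.chat.completions.create(
    model="Qwen2-VL-7B-Instruct",  # placeholder: name of the served model
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/sample.png"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```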

I also tried deploying Llama-3-8B and Qwen2-7B with trtllm-serve, and both worked fine for inference. After a brief review of the trtllm-serve code, I could not find any support for multimodal models. Could you please confirm whether trtllm-serve currently supports deploying multimodal models?

Thank you very much for your assistance.

Metadata

Labels

OpenAI API: trtllm-serve's OpenAI-compatible API (endpoint behavior, req/resp formats, feature parity)
triaged: Issue has been triaged by maintainers
