
Are multimodal models supported by trtllm-serve? #2714


Description

@xiaoyuzju

I successfully ran inference with the Qwen2-VL-7B model following the guidelines in examples/multimodal/README.md. However, when I tried to deploy it with trtllm-serve and test inference, the server threw an error.

[TensorRT-LLM][ERROR] Encountered an error in forwardAsync function: Input tensor 'mrope_rotary_sin_cos' not found; expected shape: (-1, 4194304) (/home/jenkins/agent/workspace/LLM/release-0.16/L0_Test-x86_64/tensorrt_llm/cpp/tensorrt_llm/runtime/tllmRuntime.cpp:484)
1       0x7f00e5616e9a tensorrt_llm::runtime::TllmRuntime::setInputTensorsImpl(int, std::unordered_map<std::string, std::shared_ptr<tensorrt_llm::runtime::ITensor>, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::shared_ptr<tensorrt_llm::runtime::ITensor> > > > const&, bool) + 1370
2       0x7f00e56178d5 tensorrt_llm::runtime::TllmRuntime::setInputTensors(int, std::unordered_map<std::string, std::shared_ptr<tensorrt_llm::runtime::ITensor>, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::shared_ptr<tensorrt_llm::runtime::ITensor> > > > const&) + 53
3       0x7f00e59b2dea tensorrt_llm::batch_manager::TrtGptModelInflightBatching::prepareBuffers(std::vector<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest>, std::allocator<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest> > > const&, std::vector<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest>, std::allocator<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest> > > const&, int) + 218
4       0x7f00e59b333f tensorrt_llm::batch_manager::TrtGptModelInflightBatching::executeStep(std::vector<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest>, std::allocator<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest> > > const&, std::vector<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest>, std::allocator<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest> > > const&, int) + 1055
5       0x7f00e59b3b7e tensorrt_llm::batch_manager::TrtGptModelInflightBatching::executeBatch(tensorrt_llm::batch_manager::ScheduledRequests const&) + 222
6       0x7f00e59b4268 tensorrt_llm::batch_manager::TrtGptModelInflightBatching::forwardAsync(std::list<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest>, std::allocator<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest> > > const&) + 1672
7       0x7f00e5a49746 tensorrt_llm::executor::Executor::Impl::forwardAsync(std::list<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest>, std::allocator<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest> > >&) + 486
8       0x7f00e5a50331 tensorrt_llm::executor::Executor::Impl::executionLoop() + 1281
9       0x7f0204a635c0 /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch.so(+0x145c0) [0x7f0204a635c0]
10      0x7f0242dffa94 /usr/lib/x86_64-linux-gnu/libc.so.6(+0x9ca94) [0x7f0242dffa94]
11      0x7f0242e8cc3c /usr/lib/x86_64-linux-gnu/libc.so.6(+0x129c3c) [0x7f0242e8cc3c]
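For reference, the request that triggered the error was roughly the following OpenAI-style multimodal chat completion; the host/port, model name, and image URL below are placeholders rather than my exact values:

```python
# Rough sketch of the multimodal request sent to the trtllm-serve
# OpenAI-compatible endpoint. The host/port, model name, and image URL
# are placeholders, not the exact values from my setup.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="dummy")

response = client.chat.completions.create(
    model="Qwen2-VL-7B-Instruct",  # placeholder: name of the served model
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/sample.png"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```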

I also tried deploying Llama-3-8B and Qwen2-7B with trtllm-serve, and both worked fine for inference. After a brief review of the trtllm-serve code, I could not find any support for multimodal models. Could you please confirm whether trtllm-serve currently supports deploying multimodal models?

Thank you very much for your assistance.

Metadata

Labels

OpenAI API: trtllm-serve's OpenAI-compatible API (endpoint behavior, req/resp formats, feature parity)
triaged: Issue has been triaged by maintainers
