How to run Qwen3 using triton-server + trtllm #5310

Open
@ezioliao

Description

TensorRT-LLM currently supports Qwen3 only through its PyTorch backend. So if I want to serve it with Triton Server and TensorRT-LLM, how do I actually get Qwen3 running? Is there a step-by-step guide or documentation for this?

I've only worked with the combination of Triton Server and the TensorRT-LLM backend before.
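For reference, the only workaround I can picture right now is wrapping TensorRT-LLM's PyTorch-backend LLM API inside a Triton Python backend model. The sketch below is just my rough, unverified idea of what that might look like; the model id, tensor names, and sampling values are placeholders, and I don't know whether this is the recommended path:

```python
# model.py -- rough, unverified sketch of a Triton *Python backend* model that
# wraps TensorRT-LLM's PyTorch-backend LLM API to serve Qwen3.
# The model id ("Qwen/Qwen3-8B") and tensor names ("text_input", "text_output")
# are placeholders, not a confirmed configuration.
import numpy as np
import triton_python_backend_utils as pb_utils

from tensorrt_llm import LLM, SamplingParams


class TritonPythonModel:
    def initialize(self, args):
        # Load Qwen3 through the high-level LLM API (PyTorch workflow, no engine build step).
        self.llm = LLM(model="Qwen/Qwen3-8B")  # placeholder HF id or local checkpoint path
        self.sampling = SamplingParams(max_tokens=256, temperature=0.7)

    def execute(self, requests):
        responses = []
        for request in requests:
            # Expect a single BYTES input tensor named "text_input" holding the prompts.
            in_tensor = pb_utils.get_input_tensor_by_name(request, "text_input")
            prompts = [p.decode("utf-8") for p in in_tensor.as_numpy().reshape(-1)]

            # Run generation with the PyTorch backend and return the decoded text.
            outputs = self.llm.generate(prompts, self.sampling)
            texts = np.array(
                [o.outputs[0].text.encode("utf-8") for o in outputs], dtype=object
            )

            out_tensor = pb_utils.Tensor("text_output", texts)
            responses.append(pb_utils.InferenceResponse(output_tensors=[out_tensor]))
        return responses

    def finalize(self):
        # Release the model when Triton unloads it.
        del self.llm
```

If Triton Server turns out not to be a hard requirement, I'm also aware that TensorRT-LLM ships a trtllm-serve command that is supposed to expose an OpenAI-compatible endpoint on top of the PyTorch backend, but I would still prefer a documented Triton-based workflow.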

Labels

Investigating, Triton Backend (Related to NVIDIA Triton Inference Server backend), question (Further information is requested), triaged (Issue has been triaged by maintainers)
