How is the performance of the model with PyTorch as the backend? #4745

Open
@oppolll

Description

Which one has better performance: using PyTorch as the backend, or TensorRT-LLM as the backend? In actual tests with Qwen3, I found that performance with the PyTorch backend was not good, and the performance on a single GPU and on multiple GPUs was the same. Is this normal? Did I miss any details of the inference configuration?
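
For reference, a minimal sketch of the kind of inference configuration I mean, using the TensorRT-LLM Python `LLM` API with the PyTorch backend (the model path and parallel size here are placeholders, not my exact settings). If `tensor_parallel_size` were left at its default of 1, extra GPUs would sit idle, which would match the symptom above:

```python
# Sketch only: TensorRT-LLM's high-level LLM API (PyTorch backend).
# "Qwen/Qwen3-8B" and the sizes below are placeholder assumptions.
from tensorrt_llm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen3-8B",   # placeholder Qwen3 checkpoint
    tensor_parallel_size=2,  # shard the model across 2 GPUs; default is 1
)

prompts = ["Explain tensor parallelism in one sentence."]
params = SamplingParams(max_tokens=64, temperature=0.8)

# generate() returns one output object per prompt
for output in llm.generate(prompts, params):
    print(output.outputs[0].text)
```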

Labels

Investigating; Performance (TRT-LLM model inference speed, throughput, efficiency: latency, benchmarks, regressions, opts); triaged (issue has been triaged by maintainers)
