How is the performance of the model with PyTorch as the backend? #4745

Open
@oppolll

Description

Which one has better performance: using PyTorch as the backend, or TensorRT-LLM as the backend? In actual tests with Qwen3, I found that performance with the PyTorch backend was not good, and the performance on a single GPU and on multiple GPUs was the same. Is this normal? Did I miss any details of the inference configuration?
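
For reference, a minimal sketch of the kind of inference configuration I mean, using the TensorRT-LLM Python `LLM` API with the PyTorch backend (the model path and parallel size here are placeholders, not my exact settings). If `tensor_parallel_size` were left at its default of 1, extra GPUs would sit idle, which would match the symptom above:

```python
# Sketch only: TensorRT-LLM's high-level LLM API (PyTorch backend).
# "Qwen/Qwen3-8B" and the sizes below are placeholder assumptions.
from tensorrt_llm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen3-8B",   # placeholder Qwen3 checkpoint
    tensor_parallel_size=2,  # shard the model across 2 GPUs; default is 1
)

prompts = ["Explain tensor parallelism in one sentence."]
params = SamplingParams(max_tokens=64, temperature=0.8)

# generate() returns one output object per prompt
for output in llm.generate(prompts, params):
    print(output.outputs[0].text)
```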

Labels

Investigating; Performance (TRT-LLM model inference speed, throughput, efficiency: latency, benchmarks, regressions, opts); triaged (issue has been triaged by maintainers)
