Replies: 4 comments 4 replies
-
| With the beam size set to 5, the decoder predicts different output tokens for each beam. In the decode phase, with batch_size=1, the effective batch size therefore grows from 1 to 5, so latency increases. | 
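To make the arithmetic concrete, here is a minimal sketch of how beam search multiplies the number of sequences the decoder handles per step. The function name `effective_decoder_batch` is made up for illustration and is not part of TensorRT-LLM or faster-whisper.

```python
def effective_decoder_batch(batch_size: int, beam_size: int) -> int:
    """Sequences the decoder processes per step: one per beam, per input."""
    if batch_size < 1 or beam_size < 1:
        raise ValueError("batch_size and beam_size must be positive")
    return batch_size * beam_size


if __name__ == "__main__":
    # Greedy decoding (beam_size=1) versus beam search (beam_size=5), one input.
    for beam in (1, 5):
        print(f"beam_size={beam}: decoder batch = {effective_decoder_batch(1, beam)}")
```

With one input, going from beam size 1 to 5 means each decode step works on five sequences instead of one, which is the extra work described above.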
-
| So I need to set the batch size to 5 to reduce the increased latency? | 
-
| TRT-LLM is 3 times faster than CT2 for Whisper in almost all cases; you just need to give it enough inputs to see the difference. If you are using it for personal use rather than production, stick with faster-whisper. | 
-
| I've noticed that the decoder in TensorRT-LLM doesn't do early stopping: even when it encounters an EOT (End of Text) token, it doesn't stop and keeps outputting EOT tokens. For example, the output tokens from my model look like this: 50257 is an EOT token. I think that's why, with a beam size of 5, the speed in my tests is not much different from faster-whisper. Any ideas? | 
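A rough way to check how much decoding is spent after the first EOT is to count the tokens that follow it. The sketch below is plain Python on a list of token IDs and does not call any TensorRT-LLM API; the helper names and the sample token list are made up for illustration, and only the EOT ID 50257 comes from the comment above.

```python
EOT_ID = 50257  # EOT token ID quoted in the comment above


def trim_at_eot(tokens, eot_id=EOT_ID):
    """Return the tokens before the first EOT (all tokens if no EOT is present)."""
    for i, tok in enumerate(tokens):
        if tok == eot_id:
            return list(tokens[:i])
    return list(tokens)


def wasted_steps(tokens, eot_id=EOT_ID):
    """Count decode steps emitted after the first EOT token."""
    kept = trim_at_eot(tokens, eot_id)
    # Subtract 1 for the first EOT itself, which is a legitimate step.
    return max(len(tokens) - len(kept) - 1, 0)


if __name__ == "__main__":
    # Made-up token IDs, not real model output.
    sample = [11, 22, 33, EOT_ID, EOT_ID, EOT_ID]
    print(trim_at_eot(sample))
    print(wasted_steps(sample))
```

If the wasted-step count is large relative to the useful tokens, the lack of early stopping is a plausible contributor to the latency, though this alone does not prove it.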
-
When I set the beam size to 1, TensorRT-LLM is about 50% faster than Faster Whisper. However, when I set the beam size to 5, the speeds are roughly the same. TensorRT-LLM's latency increase is significantly greater than Faster Whisper's. Any thoughts?