I was trying to modify the TensorRT-LLM code to implement something specific for my model that is not supported yet, and I got quite confused by the number of buffers that exist. Can you please explain how they are structured?
Hey Alex, the buffer management should be clearer in recent versions. However, I would recommend looking at the PyTorch backend, which is the default since v1.0.
Correct.
Correct. These have been refactored into DecoderState. There is only one instance of it.
The idea here is that inputs and outputs are only valid for a specific iteration. There can be multiple batches, each with their own inputs and outputs.
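
Roughly, the split looks like this. A minimal C++ sketch of the ownership model described above; all types here are illustrative placeholders, not the actual TensorRT-LLM classes:

```cpp
#include <vector>

// Placeholder for a device tensor; TensorRT-LLM uses its own buffer classes.
struct Tensor { std::vector<float> data; };

// Persistent across the whole run: a single instance holds per-slot decoder
// state (e.g. sequence lengths, finished flags) for every active request.
struct DecoderState {
    std::vector<Tensor> sequenceLengths; // indexed by batch slot
    std::vector<Tensor> finishedFlags;   // indexed by batch slot
};

// Valid for a single iteration only: set up fresh (or reused from a pool)
// each time a batch of requests is scheduled.
struct IterationBuffers {
    Tensor inputIds;
    Tensor logits;
};

int main() {
    DecoderState decoderState;        // one instance for the whole engine
    for (int step = 0; step < 3; ++step) {
        IterationBuffers buffers;     // inputs/outputs live for one iteration
        // ... fill buffers.inputIds, run the engine, read buffers.logits ...
        (void)buffers;
    }
    (void)decoderState;
}
```

The point of the split is that the per-iteration buffers can be rebuilt or swapped for every scheduled batch, while the single DecoderState survives across iterations.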
Not sure. I think they are needed to store additional information that is not present in decoder-only models.
The req slot / batch slot is an identifier that maps a request to a specific resource slot. The slot is persistent for the whole execution of the request.
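
For illustration, here is a minimal sketch of how such a request-to-slot mapping could work. SlotManager and all names below are hypothetical, not the actual TensorRT-LLM API:

```cpp
#include <cstdint>
#include <optional>
#include <unordered_map>
#include <vector>

using RequestId = std::uint64_t;
using SlotId = std::int32_t;

class SlotManager {
public:
    explicit SlotManager(SlotId maxSlots) {
        for (SlotId s = maxSlots - 1; s >= 0; --s) freeSlots_.push_back(s);
    }

    // Assign a free slot when the request is first scheduled.
    std::optional<SlotId> acquire(RequestId id) {
        if (freeSlots_.empty()) return std::nullopt; // no capacity left
        SlotId slot = freeSlots_.back();
        freeSlots_.pop_back();
        slotOf_[id] = slot;
        return slot;
    }

    // The same slot is returned for every iteration of the request's lifetime.
    SlotId lookup(RequestId id) const { return slotOf_.at(id); }

    // The slot is released only when the request completes.
    void release(RequestId id) {
        auto it = slotOf_.find(id);
        freeSlots_.push_back(it->second);
        slotOf_.erase(it);
    }

private:
    std::vector<SlotId> freeSlots_;
    std::unordered_map<RequestId, SlotId> slotOf_;
};

int main() {
    SlotManager manager(2);
    auto slot = manager.acquire(42); // assigned once, at scheduling time
    // ... every iteration uses manager.lookup(42) to index per-slot buffers ...
    (void)slot;
    manager.release(42);             // freed only when the request finishes
}
```

Because lookup returns the same slot until release is called, per-slot buffers (such as those held in the decoder state) can be indexed consistently for the whole lifetime of the request.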
Multiple batches (each with their own buffers) are used in …
@Funatiq Hi, just noticed your reply. What do you mean by the PyTorch backend? I remember there were two options, the Python runtime and the C++ runtime, and the reason I was looking into the C++ runtime was continuous batching support and the ability to use the C++ runtime via Triton Server.