Hey Alex, the buffer management should be clearer in recent versions. However, I would recommend looking at the PyTorch backend, which has been the default since v1.0.

Runtime buffers are used for storing values for TRT engine inputs and outputs

Correct.
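As a rough illustration of that pattern (a purely hypothetical sketch, not TensorRT-LLM's actual classes or API): runtime buffers are allocated once, sized for the engine's input and output tensors, and then reused in place across engine invocations instead of being reallocated every step.

```python
class RuntimeBuffers:
    """Hypothetical sketch: preallocate one buffer per engine I/O tensor
    and reuse it across invocations instead of reallocating each step."""

    def __init__(self, io_shapes):
        # io_shapes: {tensor_name: number_of_elements}
        self.buffers = {name: [0.0] * size for name, size in io_shapes.items()}

    def write_input(self, name, values):
        buf = self.buffers[name]
        buf[:len(values)] = values  # fill in place, no reallocation

    def read_output(self, name, count):
        return self.buffers[name][:count]


bufs = RuntimeBuffers({"input_ids": 8, "logits": 8})
bufs.write_input("input_ids", [1, 2, 3])
assert bufs.read_output("input_ids", 3) == [1, 2, 3]
```

The class and method names here are invented for illustration; in the real runtime the buffers would be device memory bound to the engine's I/O tensors, but the lifetime pattern is the same.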

Decoder buffers: if I understand correctly, "decoder" here does not mean the actual decoder, but rather the sampler that samples discrete tokens from the model's logits and performs all the beam search logic that is needed

Correct. These have been refactored into DecoderState. There is only one instance of it.
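To make the "single DecoderState" idea concrete, here is a hedged, purely illustrative sketch (the class name echoes the discussion, but the fields and methods are assumptions, not the real TensorRT-LLM type): one long-lived state object owns the sampler's mutable data, such as the token history and running sequence score, and the decoding loop mutates it in place every step.

```python
import math


class DecoderState:
    """Hypothetical sketch: one state object that the sampling loop
    mutates in place each step (token history, running log-prob)."""

    def __init__(self):
        self.output_ids = []       # generated tokens so far
        self.cum_log_prob = 0.0    # running sequence score

    def advance(self, logits):
        # Greedy selection from raw logits: a stand-in for the real
        # sampler plus the beam-search logic mentioned above.
        denom = sum(math.exp(l) for l in logits)
        token = max(range(len(logits)), key=lambda i: logits[i])
        self.output_ids.append(token)
        self.cum_log_prob += math.log(math.exp(logits[token]) / denom)


state = DecoderState()
state.advance([0.1, 2.0, -1.0])
assert state.output_ids == [1]
```

With beam search the state would hold one history and score per beam, but the single-instance, mutate-in-place lifetime is the point being illustrated.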

Decoder inputs and outputs: I still don't understand why we need them if we already have decoder buffers

The idea here is that inputs and outputs are only valid for a spec…
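Assuming the truncated sentence is describing per-step validity (an assumption on my part), the distinction might look like this purely hypothetical sketch: the decoder state persists across the whole generation, while the input and output objects are rebuilt for each step and should not be cached beyond it.

```python
class DecoderInput:
    """Hypothetical per-step input: only meaningful for one step."""
    def __init__(self, logits):
        self.logits = logits


class DecoderOutput:
    """Hypothetical per-step output: consumed right after the step."""
    def __init__(self, new_token):
        self.new_token = new_token


def decode_step(state_tokens, inp):
    # state_tokens persists across steps; inp/out live for one step only.
    token = max(range(len(inp.logits)), key=lambda i: inp.logits[i])
    state_tokens.append(token)
    return DecoderOutput(token)


history = []                                  # long-lived state
out = decode_step(history, DecoderInput([0.2, 0.9]))
assert out.new_token == 1 and history == [1]
```

All names here are invented for illustration; the point is the lifetime split between persistent state and transient per-step inputs/outputs.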

Answer selected by alxmamaev
Funatiq Oct 27, 2025
Collaborator
