Fixed Whitespace Error in Streaming mode #423

Open · wants to merge 1 commit into main
Conversation

enochlev

When TensorRT-LLM is deployed in streaming mode, tokens have no whitespace between streaming chunks.

Link to Issue

This happens because tokenizer.decode strips whitespace from its output, and this behavior cannot be disabled, so the missing spaces must be detected and re-inserted manually.
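
The effect can be reproduced without a real tokenizer. The snippet below is a simplified illustration with a toy vocabulary (the names are hypothetical, not TensorRT-LLM code): each token's text carries a leading space, but a decode that strips whitespace loses it between chunks.

```python
# Toy stand-in for a tokenizer vocabulary; each piece starts with the
# space that separates it from the previous word.
TOY_VOCAB = {1: " Hello", 2: " big", 3: " world", 4: "!"}

def decode(token_ids):
    """Mimics tokenizer.decode: join the pieces, then strip whitespace."""
    return "".join(TOY_VOCAB[t] for t in token_ids).strip()

# Decoding the whole sequence at once keeps the interior spaces:
print(decode([1, 2, 3, 4]))   # Hello big world!

# Decoding chunk by chunk, as a streaming server does, drops them:
print("".join(decode([t]) for t in [1, 2, 3, 4]))   # Hellobigworld!
```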

The standard way of handling this is to hold tokens in a cache until a space is detected in the decoded text, at which point everything after the space is put back into the cache for the next chunk.
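
A minimal sketch of that caching scheme, assuming the same kind of whitespace-stripping decode (the class name and toy vocabulary are hypothetical, not the PR's actual code). It decodes the full sequence seen so far and holds back everything after the last space until a later chunk completes the word:

```python
TOY_VOCAB = {1: " Hello", 2: " big", 3: " world", 4: "!"}

def decode(token_ids):
    """Mimics tokenizer.decode: join the pieces, then strip whitespace."""
    return "".join(TOY_VOCAB[t] for t in token_ids).strip()

class SpaceCacheStreamer:
    """Emit streamed text only up to the last detected space."""

    def __init__(self):
        self.ids = []    # every token id received so far
        self.sent = 0    # number of decoded characters already emitted

    def push(self, token_id):
        """Add one streamed token; return the text safe to emit (may be '')."""
        self.ids.append(token_id)
        text = decode(self.ids)          # decode the growing sequence
        pending = text[self.sent:]
        cut = pending.rfind(" ")
        if cut == -1:                    # no space yet: keep caching
            return ""
        out = pending[:cut + 1]          # emit through the last space
        self.sent += len(out)            # the partial word stays cached
        return out

    def flush(self):
        """Emit whatever is still cached once the stream ends."""
        text = decode(self.ids)
        out = text[self.sent:]
        self.sent = len(text)
        return out

s = SpaceCacheStreamer()
chunks = [s.push(t) for t in [1, 2, 3, 4]] + [s.flush()]
print("".join(chunks))   # Hello big world!
```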

The other suggested method decodes the token IDs to their raw token strings instead of the final text and looks for the SentencePiece word-boundary marker "▁" (which looks like an underscore).
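
A sketch of that token-level alternative, again with a toy vocabulary standing in for tokenizer.convert_ids_to_tokens (hypothetical names, not the PR's code). SentencePiece-style tokenizers prefix word-initial pieces with "▁" (U+2581), so a stripped leading space can be restored whenever that marker appears:

```python
# Toy mapping from token id to raw token string; "▁" marks a word start.
TOY_TOKENS = {1: "▁Hello", 2: "▁world", 3: "!"}

def detokenize_piece(token_id, is_first):
    """Turn one streamed token id into text, restoring the leading space."""
    piece = TOY_TOKENS[token_id]
    if piece.startswith("▁"):
        text = piece.replace("▁", " ")
        # The very first word of the stream should not start with a space.
        return text.lstrip() if is_first else text
    return piece

stream = [1, 2, 3]
out = "".join(detokenize_piece(t, i == 0) for i, t in enumerate(stream))
print(out)   # Hello world!
```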

This PR should fix these issues.

@dbuades

dbuades commented Sep 7, 2024

I confirm that this change fixes the issue. Thank you!
