
Detokenize option in /v1/completions request #5382


Open
Wokzy wants to merge 3 commits into main

Conversation

@Wokzy commented Jun 20, 2025

Detokenize option in sampling params for trtllm-serve

Description

Now, when making a request, users can specify the boolean option "detokenize", for instance:
curl http://localhost:54321/v1/completions -H "Content-Type: application/json" -d '{"model":"DeepSeek-V3-0324", "prompt":"hello", "max_tokens":64, "detokenize":false}'

and the output will look something like:

{"id":"cmpl-2027c15c7d544095908a27be101c61a2","object":"text_completion","created":1750405539,"model":"DeepSeek-V3-0324","choices":[{"index":0,"text":"[14, 342, 4571, 5958, 304, 5085, 270, 2502, 78157, 295, 65980, 14, 342, 1153, 396, 65980, 769, 270, 5304, 304, 5085, 295, 223, 20, 70, 14, 342, 2090, 1664, 1153, 1192, 304, 2859, 436, 377, 14, 305, 342, 4571, 3638, 260, 3375, 1014, 4985, 436, 16, 588, 5524, 1694, 678, 14, 8033, 1240, 23166, 2893, 43, 611, 260, 3295, 418, 270, 18395]","logprobs":null,"context_logits":null,"finish_reason":"length","stop_reason":null,"disaggregated_params":null}],"usage":{"prompt_tokens":2,"total_tokens":64,"completion_tokens":62},"prompt_token_ids":[[0,33310]]}

This option also works correctly with "stream": true.
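
For reference, a minimal client-side sketch of using the new option from Python (purely illustrative, assuming the same server and payload as the curl example above, and using the requests library; parsing the token IDs with ast.literal_eval relies on the "text" field being the stringified list shown in the sample response):

import ast
import requests

# Hypothetical client for the new "detokenize" option; endpoint and payload
# mirror the curl example above.
resp = requests.post(
    "http://localhost:54321/v1/completions",
    json={
        "model": "DeepSeek-V3-0324",
        "prompt": "hello",
        "max_tokens": 64,
        "detokenize": False,
    },
)
resp.raise_for_status()
choice = resp.json()["choices"][0]

# With detokenize=False the "text" field carries the stringified token IDs,
# so parse it back into a Python list for further processing.
token_ids = ast.literal_eval(choice["text"])
print(token_ids[:10])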

@juney-nvidia added the "Community want to contribute" (PRs initiated from Community) and "Community Engagement" (help/insights needed from community) labels on Jun 20, 2025
@LinPoly (Collaborator) commented Jun 26, 2025

Thanks for contributing! At the OAI API level, I think the change is okay, given that we already have many sampling args in the request protocol. For the change in the executor, I think @Superjomn and @syuoni have better knowledge than I do. I will also check whether we need to change anything at the executor level; enhancing trtllm-serve may be enough.

Comment on lines +395 to +399
if self._streaming:
    beam_output._last_text_len = 0
    beam_output.text = str(beam_output.token_ids_diff)
else:
    beam_output.text = str(beam_output.token_ids)

A bit confused: why do we need to fill the text when detokenize is False or the tokenizer is unavailable?
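
(A rough, standalone sketch of the behaviour this question seems to hint at; purely illustrative and not part of the PR, using a simplified stand-in for the real beam output object, with made-up names throughout:)

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class FakeBeamOutput:
    # Simplified stand-in for the real beam output object in the PR.
    token_ids: List[int] = field(default_factory=list)
    text: str = ""

def fill_text(beam: FakeBeamOutput, detokenize: bool, tokenizer: Optional[object]) -> None:
    # Alternative to the diff above: only fill `text` when we can actually
    # detokenize; otherwise leave it empty and let clients read token_ids.
    if not detokenize or tokenizer is None:
        beam.text = ""
        return
    beam.text = tokenizer.decode(beam.token_ids)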
