
Detokenize option in /v1/completions request #5382


Open
Wokzy wants to merge 3 commits into main

Conversation

@Wokzy commented Jun 20, 2025

Detokenize option in sampling params for trtllm-serve

Description

Now, when making a request, users can specify the boolean option "detokenize", for instance:
curl http://localhost:54321/v1/completions -H "Content-Type: application/json" -d '{"model":"DeepSeek-V3-0324", "prompt":"hello", "max_tokens":64, "detokenize":false}'

and the output will look something like:

{"id":"cmpl-2027c15c7d544095908a27be101c61a2","object":"text_completion","created":1750405539,"model":"DeepSeek-V3-0324","choices":[{"index":0,"text":"[14, 342, 4571, 5958, 304, 5085, 270, 2502, 78157, 295, 65980, 14, 342, 1153, 396, 65980, 769, 270, 5304, 304, 5085, 295, 223, 20, 70, 14, 342, 2090, 1664, 1153, 1192, 304, 2859, 436, 377, 14, 305, 342, 4571, 3638, 260, 3375, 1014, 4985, 436, 16, 588, 5524, 1694, 678, 14, 8033, 1240, 23166, 2893, 43, 611, 260, 3295, 418, 270, 18395]","logprobs":null,"context_logits":null,"finish_reason":"length","stop_reason":null,"disaggregated_params":null}],"usage":{"prompt_tokens":2,"total_tokens":64,"completion_tokens":62},"prompt_token_ids":[[0,33310]]}

This option also works correctly with "stream": true.
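
For reference, a minimal client-side sketch of using the new option from Python (purely illustrative, assuming the same server and payload as the curl example above, and using the requests library; parsing the token IDs with ast.literal_eval relies on the "text" field being the stringified list shown in the sample response):

import ast
import requests

# Hypothetical client for the new "detokenize" option; endpoint and payload
# mirror the curl example above.
resp = requests.post(
    "http://localhost:54321/v1/completions",
    json={
        "model": "DeepSeek-V3-0324",
        "prompt": "hello",
        "max_tokens": 64,
        "detokenize": False,
    },
)
resp.raise_for_status()
choice = resp.json()["choices"][0]

# With detokenize=False the "text" field carries the stringified token IDs,
# so parse it back into a Python list for further processing.
token_ids = ast.literal_eval(choice["text"])
print(token_ids[:10])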

@juney-nvidia added the "Community want to contribute" (PRs initiated from Community) and "Community Engagement" (help/insights needed from community) labels on Jun 20, 2025
@LinPoly (Collaborator) commented Jun 26, 2025

Thanks for contributing! At the OAI API level, I think the change is okay, given that we already have many sampling args in the request protocol. For the change in the executor, I think @Superjomn and @syuoni have better knowledge than I do. I will also check whether we need to change anything at the executor level; enhancing trtllm-serve may be enough.

Comment on lines +395 to +399
if self._streaming:
    beam_output._last_text_len = 0
    beam_output.text = str(beam_output.token_ids_diff)
else:
    beam_output.text = str(beam_output.token_ids)

A bit confused: why do we need to fill the text when detokenize is False or the tokenizer is unavailable?
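
(A rough, standalone sketch of the behaviour this question seems to hint at; purely illustrative and not part of the PR, using a simplified stand-in for the real beam output object, with made-up names throughout:)

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class FakeBeamOutput:
    # Simplified stand-in for the real beam output object in the PR.
    token_ids: List[int] = field(default_factory=list)
    text: str = ""

def fill_text(beam: FakeBeamOutput, detokenize: bool, tokenizer: Optional[object]) -> None:
    # Alternative to the diff above: only fill `text` when we can actually
    # detokenize; otherwise leave it empty and let clients read token_ids.
    if not detokenize or tokenizer is None:
        beam.text = ""
        return
    beam.text = tokenizer.decode(beam.token_ids)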
