[Feature] Add docs for pass in token ids directly #2661

Open
2 tasks done
zhaochenyang20 opened this issue Dec 30, 2024 · 10 comments
Assignees
Labels
documentation Improvements or additions to documentation good first issue Good for newcomers RLHF Using SGLang for post training

Comments

@zhaochenyang20
Collaborator

Checklist

Motivation

In most RLHF frameworks, prompts are pre-tokenized during data processing, so users can pass token IDs directly to the SGLang engine rather than text prompts. We should add docs on how to do this and how to get tokens back directly.

Related resources

No such.

@zhaochenyang20 zhaochenyang20 added documentation Improvements or additions to documentation good first issue Good for newcomers labels Dec 30, 2024
@shuaills
Contributor

The --skip-tokenizer-init flag allows direct input of token IDs instead of text prompts.

    if self.server_args.skip_tokenizer_init:
        json_data = {
            "input_ids": prompt,
            "sampling_params": sampling_params,
            "stream": True,
        }

    parser.add_argument(
        "--skip-tokenizer-init",
        action="store_true",
        help="If set, skip init tokenizer and pass input_ids in generate request",
    )
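As a rough illustration of the request shape quoted above, the sketch below builds a request body that carries token IDs instead of text. The `input_ids`, `sampling_params`, and `stream` keys come from the snippet above; the helper name `build_generate_payload`, the example IDs, and the sampling values are assumptions made for illustration, not part of SGLang.

```python
import json
from typing import Any, Dict, List


def build_generate_payload(
    token_ids: List[int],
    sampling_params: Dict[str, Any],
    stream: bool = False,
) -> Dict[str, Any]:
    """Build a request body that carries token IDs instead of a text prompt."""
    if not token_ids:
        raise ValueError("token_ids must not be empty")
    if not all(isinstance(t, int) and not isinstance(t, bool) for t in token_ids):
        raise TypeError("token_ids must be a list of ints")
    return {
        "input_ids": list(token_ids),
        "sampling_params": dict(sampling_params),
        "stream": stream,
    }


# Hypothetical pre-tokenized prompt; real IDs would come from the model's tokenizer.
payload = build_generate_payload(
    [1, 15043, 3186], {"temperature": 0, "max_new_tokens": 8}
)
body = json.dumps(payload)  # JSON string ready to send to the server
```

Validating the IDs on the client side keeps a malformed batch from reaching a server that has no tokenizer to fall back on.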

@shuaills
Contributor

Also, we should clarify the naming when skip_tokenizer_init is on: prompt -> prompt tokens.

@zhaochenyang20 zhaochenyang20 self-assigned this Dec 30, 2024
@zhaochenyang20 zhaochenyang20 added the RLHF Using SGLang for post training label Dec 30, 2024
@zhaochenyang20
Collaborator Author

@shuaills Also, check whether these two parameters conflict.

Add an example on:

  1. launch a server with clear arguments.
  2. pass in token ids instead of prompts.
  3. get output tokens and input tokens from model outputs.
  4. assert that the input tokens returned by the engine are the same as the ones passed in.
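Step 1 might look roughly like the sketch below, which only assembles the launch command. The `--skip-tokenizer-init` flag is the one quoted earlier in this thread; the `sglang.launch_server` module name, the `--model-path`, `--host`, and `--port` flags, and the default values are assumptions to check against the installed version. The helper name `build_launch_command` and the model path are placeholders.

```python
import shlex
import sys
from typing import List


def build_launch_command(
    model_path: str, port: int = 30000, host: str = "127.0.0.1"
) -> List[str]:
    """Return the argv for starting a server that expects token IDs as input."""
    return [
        sys.executable, "-m", "sglang.launch_server",  # assumed entry point
        "--model-path", model_path,
        "--host", host,
        "--port", str(port),
        "--skip-tokenizer-init",  # flag quoted in this thread
    ]


cmd = build_launch_command("path/to/model")  # placeholder model path
print(shlex.join(cmd[1:]))  # show the arguments without the interpreter path
```

The resulting list can be handed to `subprocess.Popen(cmd)` in a test fixture, which avoids shell quoting issues.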

I think we should change this:

https://github.com/sgl-project/sglang/blob/main/test/srt/test_engine_token_ids.py

And add examples in:

https://github.com/sgl-project/sglang/tree/main/examples/runtime

And change docs accordingly.

@zhaochenyang20
Collaborator Author

Also, please pay special attention to special tokens and the chat template. I assume that:

Given a string prompt = "xxxxxxxxxxxxxxx"

  1. Launch an engine and pass in prompt; get the input tokens A1 and output tokens B1.
  2. Launch an engine that accepts passed-in token ids. First tokenize prompt with a Hugging Face tokenizer to get input tokens A2. Pass A2 to the engine and get the input tokens A3 and output tokens B2 from the engine.

We should have:

A1 == A2 == A3

Also, the sampling parameters may introduce some randomness, so B1 == B2 may not hold exactly, as described here:

https://sgl-project.github.io/references/faq.html#the-results-are-not-deterministic-even-with-a-temperature-of-0

Maybe you can link to this in the unit tests.

Also, do not remove any of the current test cases in [test_engine_token_ids.py](https://github.com/sgl-project/sglang/blob/main/test/srt/test_engine_token_ids.py). Only add new tests to it.
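The A1 == A2 == A3 check could be sketched as below. Reporting the first differing index makes special-token problems (for example a missing or duplicated BOS token) easier to spot than a bare equality failure. The helper names `first_mismatch` and `assert_same_tokens` and the token IDs are hypothetical; in a real test A1 and A3 would come from the engine and A2 from the Hugging Face tokenizer.

```python
from typing import Optional, Sequence


def first_mismatch(a: Sequence[int], b: Sequence[int]) -> Optional[int]:
    """Return the first index where a and b differ, or None if they are equal."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    if len(a) != len(b):
        # One sequence is a strict prefix of the other.
        return min(len(a), len(b))
    return None


def assert_same_tokens(**named: Sequence[int]) -> None:
    """Assert that all named token sequences match the first one given."""
    names = list(named)
    for name in names[1:]:
        idx = first_mismatch(named[names[0]], named[name])
        assert idx is None, f"{names[0]} and {name} differ at index {idx}"


# Hypothetical IDs: A2 lacks the leading BOS token (id 1 here) that A1 has.
a1 = [1, 15043, 3186]
a2 = [15043, 3186]
print(first_mismatch(a1, a2))  # 0
```

In a unit test this would be called as `assert_same_tokens(A1=a1, A2=a2, A3=a3)`.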

@shuaills
Contributor

Thanks for the clarification. Sounds good.

@zhaochenyang20
Collaborator Author

  1. When skip is turned on, only tokens are involved.
  • If return is turned on, both input tokens and output tokens are returned. => This is used in unit tests to ensure that the passed-in tokens match the returned input tokens.
  • If return is turned off, only output tokens are returned. => We should assert that there are no input tokens in the return value. Returning them would not be a problem, but it may introduce a small overhead.
  2. When skip is turned off, only strings are allowed.
  • If return is turned on, both input tokens and output tokens are returned. => It is necessary to ensure that tokenizer.tokenize(prompt) == output["input_token_ids"] and that the output tokens are in the return value.
  • If return is turned off, only strings are input and output. => We should ensure that there are no tokens in the return value but that there are token counts.
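The four cases above can be written down as a small table that a unit test checks each response against. This is only a sketch: the key names `text`, `input_ids`, and `output_ids` are placeholders rather than confirmed SGLang output fields, `check_output` is a hypothetical helper, and the token counts mentioned in the last case are not modeled.

```python
from typing import Any, Dict, Set, Tuple

# (skip on, return on) -> (keys that must be present, keys that must be absent).
# Key names are placeholders, not confirmed SGLang output fields.
EXPECTED: Dict[Tuple[bool, bool], Tuple[Set[str], Set[str]]] = {
    (True, True): ({"input_ids", "output_ids"}, {"text"}),
    (True, False): ({"output_ids"}, {"input_ids", "text"}),
    (False, True): ({"text", "input_ids", "output_ids"}, set()),
    (False, False): ({"text"}, {"input_ids", "output_ids"}),
}


def check_output(output: Dict[str, Any], skip: bool, return_ids: bool) -> None:
    """Assert that a response has exactly the fields expected for this mode."""
    required, forbidden = EXPECTED[(skip, return_ids)]
    present = set(output)
    missing = required - present
    unexpected = forbidden & present
    assert not missing, f"missing keys: {sorted(missing)}"
    assert not unexpected, f"unexpected keys: {sorted(unexpected)}"


# Skip on, return off: only output tokens should come back.
check_output({"output_ids": [5, 6]}, skip=True, return_ids=False)
```

Keeping the expectations in one table makes it easy to add a case per mode without duplicating assertions across tests.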

@zhaochenyang20
Collaborator Author

Be careful about the chat template and special tokens.

@zhaochenyang20
Collaborator Author

zhaochenyang20 commented Dec 30, 2024

What is the usage of special tokens and chat templates?

@zhaochenyang20
Collaborator Author

@shuaills And take care of multi-modal models.

@zhaochenyang20
Collaborator Author

zhaochenyang20 commented Jan 8, 2025

@shuaills Also, one more thing. In vLLM, there is a strange parameter called include_stop_str_in_output, which is used to include the EOS token in the output, and it's pretty important in RLHF. We should also check whether SGLang's output has this EOS token, though we may not need an equivalent parameter. We should also be careful about whether SGLang's output tokens include special tokens, and if so, which ones.


I am using this workaround right now, which is too tedious:

    input_token_id_list = [list(output["input_ids"]) for output in outputs]
    # Append the EOS token when the output does not already end with it
    # (an empty output also gets the EOS token instead of raising IndexError).
    output_token_id_list = [
        ids if ids and ids[-1] == eos_token_id else ids + [eos_token_id]
        for ids in (list(output["output_ids"]) for output in outputs)
    ]
