[Feature] Provide batched fill next token mask and batched accept string/token methods. #445
Conversation
Signed-off-by: Yuchuan <[email protected]>
[Figure: latency (y) per grammar (x), before (sequential) vs. after (batch), using one example from JSB]
LGTM. Is the y axis the average time or the total time? And why is batch=1 so slow? Did you warm it up? Can you also show the speedup when the batch size is large, like 300?
I did warm it up. I'll test it more.
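For reference, a warm-up-then-average timing harness like the sketch below (plain Python with toy workloads; not the PR's actual benchmark code) keeps one-time initialization out of the measurement and reports average per-iteration latency rather than total time:

```python
import time
from typing import Callable


def bench(fn: Callable, *args, warmup: int = 5, iters: int = 50) -> float:
    """Return the average per-iteration latency of fn(*args) in seconds."""
    # Warm-up runs keep one-time costs (lazy init, caches) out of the measurement.
    for _ in range(warmup):
        fn(*args)
    start = time.perf_counter()
    for _ in range(iters):
        fn(*args)
    return (time.perf_counter() - start) / iters


# Toy stand-ins for the sequential vs. batched mask-filling paths.
def sequential(items):
    return [x * x for x in items]             # one "call" per element


def batched(items):
    return list(map(lambda x: x * x, items))  # one call for the whole batch


avg_seq = bench(sequential, list(range(300)))
avg_batch = bench(batched, list(range(300)))
```

Comparing `avg_seq` and `avg_batch` at several batch sizes (including a large one such as 300) would answer the question about where the batched path starts to pay off.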
The testing code is here:
Force-pushed from 2a63ee5 to 6374909. Commit messages: refactor nanobind; refactor python files; finish; fix batch filling.
This PR provides a series of methods to batch-fill next token masks and batch-accept strings/tokens for `GrammarMatcher`. In general, there are three new methods:

- `batch_fill_next_token_bitmask(matchers: List["GrammarMatcher"], bitmask: ArrayLike, index: int = 0, max_threads: int = 16, debug_print: bool = False)`
- `batch_accept_string(matchers: List["GrammarMatcher"], strings: List[str], debug_print: bool = False)`
- `batch_accept_token(matchers: List["GrammarMatcher"], tokens: List[int], debug_print: bool = False)`

These methods allow users to fill multiple `GrammarMatcher`s' token masks in a batch, reducing the overhead of crossing the C++/Python boundary once per matcher.
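To illustrate the calling pattern, here is a minimal pure-Python sketch. `FakeMatcher` is a hypothetical stand-in for the real `GrammarMatcher` (which lives in the C++ extension), and the two batch helpers mimic the shape of the new API: one Python-level call processes the whole batch, which is what lets the real implementation move the loop into C++.

```python
from typing import List


class FakeMatcher:
    """Toy stand-in for GrammarMatcher; accepts a fixed set of token ids."""

    def __init__(self, allowed: List[int]):
        self.allowed = set(allowed)

    def accept_token(self, tok: int) -> bool:
        return tok in self.allowed

    def fill_next_token_bitmask(self, bitmask: List[List[int]], index: int) -> None:
        # Mark the allowed tokens in this matcher's row of the (batch, vocab) mask.
        for tok in self.allowed:
            bitmask[index][tok] = 1


def batch_accept_token(matchers: List[FakeMatcher], tokens: List[int]) -> List[bool]:
    # One call for the whole batch instead of len(matchers) separate calls;
    # the real implementation runs this loop in C++ to cut FFI overhead.
    return [m.accept_token(t) for m, t in zip(matchers, tokens)]


def batch_fill_next_token_bitmask(matchers: List[FakeMatcher],
                                  bitmask: List[List[int]]) -> None:
    for i, m in enumerate(matchers):
        m.fill_next_token_bitmask(bitmask, i)


vocab_size = 8
matchers = [FakeMatcher([1, 2]), FakeMatcher([2, 3])]
bitmask = [[0] * vocab_size for _ in matchers]
batch_fill_next_token_bitmask(matchers, bitmask)
print(batch_accept_token(matchers, [1, 3]))  # [True, True]
print(bitmask[0])                            # [0, 1, 1, 0, 0, 0, 0, 0]
```

In the actual API the bitmask is a preallocated array (`ArrayLike`) shared across the batch, and `max_threads` controls how many worker threads fill rows in parallel.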