@Seven-Streams commented on Sep 29, 2025

This PR adds batched methods for filling the next-token bitmask and for accepting strings/tokens with GrammarMatcher. In total, there are three new methods:

  • batch_fill_next_token_bitmask(matchers: List["GrammarMatcher"], bitmask: ArrayLike, index: int = 0, max_threads: int = 16, debug_print: bool = False)
  • batch_accept_string(matchers: List["GrammarMatcher"], strings: List[str], debug_print: bool = False)
  • batch_accept_token(matchers: List["GrammarMatcher"], tokens: List[int], debug_print: bool = False)

These methods let users fill the token masks of multiple GrammarMatchers in a single batched call, reducing the overhead of crossing between C++ and Python.
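
A minimal usage sketch (not part of the PR itself): the setup mirrors the benchmark script later in this thread, and the example strings are hypothetical inputs that must be valid under the grammar.

import xgrammar as xgr
from transformers import AutoTokenizer

# Same setup as the benchmark script below.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf", use_fast=True)
tokenizer_info = xgr.TokenizerInfo.from_huggingface(tokenizer)
compiler = xgr.GrammarCompiler(tokenizer_info)
compiled_grammar = compiler.compile_regex("(\\s\\S){24}")

# One matcher per request in the batch.
matchers = [xgr.GrammarMatcher(compiled_grammar) for _ in range(4)]

# Allocate one bitmask row per matcher and fill all rows in a single batched call,
# instead of looping over matcher.fill_next_token_bitmask in Python.
bitmask = xgr.allocate_token_bitmask(len(matchers), tokenizer.vocab_size)
xgr.GrammarMatcher.batch_fill_next_token_bitmask(matchers, bitmask)

# Batched acceptance: the i-th string is applied to the i-th matcher.
# (Strings are hypothetical; each must be valid under the grammar.)
xgr.GrammarMatcher.batch_accept_string(matchers, [" a", " b", " c", " d"])

# batch_accept_token works the same way, with one token id per matcher.

The optional index, max_threads, and debug_print arguments are left at their defaults here.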


@Ubospica left a comment

x grammar, latency

Before (sequential); After (batch)

Use one example in JSB

@Seven-Streams marked this pull request as draft on September 30, 2025 03:08
@Seven-Streams

(image: comparison_bar_chart)

@Seven-Streams marked this pull request as ready for review on September 30, 2025 04:12

@Ubospica left a comment

LGTM. Is the y axis the average time or the total time? And why is batch=1 so slow? Didn't you warm it up? Can you also show the speedup when the batch is large, e.g. 300?

@Seven-Streams

LGTM. Is the y axis the average time or the total time? And why is batch=1 so slow? Didn't you warm it up? Can you also show the speedup when the batch is large, e.g. 300?

I did warm it up. I'll test it more.

@Seven-Streams

@Ubospica (image attached)

The testing code is here:

import xgrammar as xgr
from transformers import AutoTokenizer
import time
import matplotlib.pyplot as plt
import numpy as np

tokenizer_path = "meta-llama/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(tokenizer_path, use_fast=True, trust_remote_code=True)
tokenizer_info = xgr.TokenizerInfo.from_huggingface(tokenizer)
compiler = xgr.GrammarCompiler(tokenizer_info)
regex = "(\\s\\S){24}"
compiled_grammar = compiler.compile_regex(regex)

batch_time_list = dict()
naive_time_list = dict()

batch_sizes = [1, 2, 4, 8, 16, 32, 64, 128, 256]

# Warm up
for batch_size in batch_sizes:
    matchers = [
        xgr.GrammarMatcher(compiled_grammar)
        for _ in range(batch_size)
    ]
    dummy_next_token_bitmask = xgr.allocate_token_bitmask(1, tokenizer.vocab_size)
    for matcher in matchers:
        matcher.fill_next_token_bitmask(dummy_next_token_bitmask)
    
    matchers = [
        xgr.GrammarMatcher(compiled_grammar)
        for _ in range(batch_size)
    ]
    dummy_next_batch_token_bitmask = xgr.allocate_token_bitmask(batch_size, tokenizer.vocab_size)
    xgr.GrammarMatcher.batch_fill_next_token_bitmask(matchers, dummy_next_batch_token_bitmask)

# Actual benchmarking
for _ in range(10):
    for batch_size in batch_sizes:
        matchers = [
            xgr.GrammarMatcher(compiled_grammar)
            for _ in range(batch_size)
        ]
        
        next_token_bitmask = xgr.allocate_token_bitmask(batch_size, tokenizer.vocab_size)
        
        # warm up
        for i, matcher in enumerate(matchers):
            matcher.fill_next_token_bitmask(next_token_bitmask, i)

        start_time = time.time_ns()
        for i, matcher in enumerate(matchers):
            matcher.fill_next_token_bitmask(next_token_bitmask, i)
        end_time = time.time_ns()
        elapsed_time = end_time - start_time
        if batch_size not in naive_time_list:
            naive_time_list[batch_size] = []
        naive_time_list[batch_size].append(elapsed_time)
        
        matchers = [
            xgr.GrammarMatcher(compiled_grammar)
            for _ in range(batch_size)
        ]

        next_batch_token_bitmask = xgr.allocate_token_bitmask(batch_size, tokenizer.vocab_size)
        
        # warm up
        xgr.GrammarMatcher.batch_fill_next_token_bitmask(matchers, next_batch_token_bitmask)
        
        start_time = time.time_ns()
        xgr.GrammarMatcher.batch_fill_next_token_bitmask(matchers, next_batch_token_bitmask)
        end_time = time.time_ns()
        elapsed_time = end_time - start_time
        if batch_size not in batch_time_list:
            batch_time_list[batch_size] = []
        batch_time_list[batch_size].append(elapsed_time)
        
# Draw results
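# NOTE: the extra leading "avg" bar is the mean per-matcher fill time
# (each run's total time divided by its batch size), pooled over all batch sizes.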
naive_avg_time = [np.mean(naive_time_list[bs]) for bs in batch_sizes]
single_times = []
for bs in batch_sizes:
    single_times.extend([t / bs for t in naive_time_list[bs]])
naive_avg_time.insert(0, np.mean(single_times))
batch_avg_time = [np.mean(batch_time_list[bs]) for bs in batch_sizes]
single_times = []
for bs in batch_sizes:
    single_times.extend([t / bs for t in batch_time_list[bs]])
batch_avg_time.insert(0, np.mean(single_times))
    
x = np.arange(len(batch_sizes) + 1)
width = 0.2 
fig, ax = plt.subplots()
rects1 = ax.bar(x - width/2, naive_avg_time, width, label='Naive')
rects2 = ax.bar(x + width/2, batch_avg_time, width, label='Batch')
ax.set_ylabel('Time (ns)')
ax.set_xlabel('Batch Size')
ax.set_title('Time by Batch Size and Method')
ax.set_xticks(x)

ax.set_xticklabels(["avg"] + batch_sizes)
ax.legend()
ax.bar_label(rects1, padding=3)
ax.bar_label(rects2, padding=-3)
ax.set_yscale('log')
plt.savefig("batch_vs_naive.png")
plt.show()

Commits pushed (each signed off by Yuchuan <[email protected]>):

  • refactor nanobind.
  • refactor python files.
  • finish.
  • fix batch filling.
@Seven-Streams force-pushed the main-dev/2025-09-28/batch branch from 2a63ee5 to 6374909 on October 3, 2025 02:02
@Seven-Streams merged commit 01678b1 into mlc-ai:main on Oct 8, 2025
38 checks passed