Proposal to improve performance
No response
Report of performance regression
No response
Misc discussion on performance
See this report at xgrammar issue.
- env:
  - vllm: 0.9.1
  - xgrammar: 0.1.19
- Structured JSON schema:
from pydantic import BaseModel, conlist

class Table(BaseModel):
    content: conlist(conlist(str, min_length=5, max_length=10), min_length=5, max_length=10)

json_schema = Table.model_json_schema()
This schema leads to very low throughput (< 3 tokens/s).
Is there any way to improve performance?
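
One possible workaround, offered as an untested sketch rather than a confirmed fix: if the slowdown comes from the nested `minItems`/`maxItems` bounds (an assumption, not verified in this report), the bounds could be dropped from the schema used for guided decoding and enforced on the parsed output instead. `TABLE_SCHEMA` below is a hand-written approximation of what `Table.model_json_schema()` produces, and `relax_array_bounds` and `bounds_ok` are hypothetical helper names, not part of vLLM or xgrammar.

```python
# Hand-written approximation of Table.model_json_schema() (assumption).
TABLE_SCHEMA = {
    "type": "object",
    "title": "Table",
    "properties": {
        "content": {
            "type": "array",
            "minItems": 5,
            "maxItems": 10,
            "items": {
                "type": "array",
                "minItems": 5,
                "maxItems": 10,
                "items": {"type": "string"},
            },
        }
    },
    "required": ["content"],
}

BOUND_KEYS = ("minItems", "maxItems")


def relax_array_bounds(node):
    """Return a copy of a schema with every minItems/maxItems key removed.

    Sketch only: a property that is itself named "minItems" or "maxItems"
    would also be dropped, which is fine for the Table schema above.
    """
    if isinstance(node, dict):
        return {k: relax_array_bounds(v) for k, v in node.items() if k not in BOUND_KEYS}
    if isinstance(node, list):
        return [relax_array_bounds(v) for v in node]
    return node


def bounds_ok(table, lo=5, hi=10):
    """Check the original size constraints on an already-parsed output dict."""
    rows = table.get("content")
    if not isinstance(rows, list) or not lo <= len(rows) <= hi:
        return False
    return all(
        isinstance(row, list)
        and lo <= len(row) <= hi
        and all(isinstance(cell, str) for cell in row)
        for row in rows
    )


relaxed_schema = relax_array_bounds(TABLE_SCHEMA)
```

Under this approach, `relaxed_schema` would be passed to structured output in place of `json_schema`, and any generation for which `bounds_ok` returns `False` would be retried or rejected. Whether this actually restores throughput depends on the real cause of the slowdown.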
Your current environment (if you think it is necessary)
The output of `python collect_env.py`