fix: graceful fallback when model has no chat_template (MedGemma) by jackneil · Pull Request #271 · waybarrios/vllm-mlx

jackneil · 2026-04-09T18:32:14Z

Summary

Models like MedGemma inherit apply_chat_template() but have no chat_template configured
Previously, requests crashed with ValueError: Cannot use apply_chat_template when no chat_template is set
The server returned HTTP 200, then dropped the connection mid-stream
Now catches ValueError and falls back to a plain-text prompt format in both BatchedEngine and SimpleEngine

Test plan

4 unit tests verifying fallback behavior
No regressions in existing tests (112 passing)
End-to-end with MedGemma model

🤖 Generated with Claude Code

Models like MedGemma have apply_chat_template() as an inherited method but no chat_template configured, causing ValueError on every request. Now catches ValueError and falls back to plain-text prompt format in both BatchedEngine and SimpleEngine paths.

Fixes: MedGemma crashes with "Cannot use apply_chat_template"

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Thump604 · 2026-04-11T16:17:36Z

One correctness concern before merge: the new fallback paths catch broad TypeError / ValueError from apply_chat_template() and silently drop to plain-text prompts.

That is fine for the specific MedGemma case (no chat_template configured), but it also masks unrelated bugs like template argument mismatches or other real template failures. In those cases we would silently change behavior instead of surfacing the regression.

I think the fallback should be narrowed to the known missing-template case only — e.g. catch ValueError with the expected "no chat_template" / "Cannot use apply_chat_template when no chat_template is set" class of message — and let other exceptions surface normally.
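A minimal sketch of the narrowed catch described above, assuming a Hugging Face-style tokenizer whose apply_chat_template() accepts tokenize= and add_generation_prompt=. The helper names and the plain-text fallback format are hypothetical, not the PR's code. The message patterns are also assumptions: the wording quoted in this PR may differ from what a given transformers or mlx version raises, so the pattern list has to be checked against the library actually in use.

```python
import re
from typing import Any, Dict, List

# Assumed wordings of the "no chat_template" error (hypothetical list);
# verify against the tokenizer/processor library version in use.
_MISSING_TEMPLATE = re.compile(
    r"no chat_template is set|chat_template is not set|no chat_template configured"
)


def is_missing_chat_template_error(exc: BaseException) -> bool:
    """True only for the known missing-template ValueError, nothing else."""
    return isinstance(exc, ValueError) and _MISSING_TEMPLATE.search(str(exc)) is not None


def plain_text_prompt(messages: List[Dict[str, str]]) -> str:
    # Hypothetical fallback format: one "role: content" line per message.
    return "\n".join(f"{m['role']}: {m['content']}" for m in messages) + "\nassistant:"


def build_prompt(tokenizer: Any, messages: List[Dict[str, str]]) -> str:
    # TypeError (e.g. a template argument mismatch) is deliberately not caught.
    try:
        return tokenizer.apply_chat_template(
            messages, tokenize=False, add_generation_prompt=True
        )
    except ValueError as exc:
        if not is_missing_chat_template_error(exc):
            raise  # unrelated template bug: surface it instead of masking it
        return plain_text_prompt(messages)
```

Because only one ValueError message class triggers the fallback, a broken template or an argument mismatch still fails loudly instead of silently changing the prompt format.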

Thump604

I rechecked this against current main, and I do not think it is merge-ready in its current shape.

Two blockers:

1. The fallback is still too broad. Catching generic TypeError / ValueError from apply_chat_template() will silently mask unrelated template regressions. This should be narrowed to the known missing-template case only (the "no chat_template configured" / "Cannot use apply_chat_template when no chat_template is set" class of failure).
2. More importantly, this branch does not cover the actual SimpleEngine.chat() crash path on current main. The non-streaming SimpleEngine path still delegates straight to self._model.chat(...) before any of the new fallback logic runs, so a model or processor that has apply_chat_template() but no configured chat_template can still fail before reaching the fallback code. The current diff mainly helps the local prompt-building and token-accounting paths, not the full execution path.

Since the branch also has merge conflicts, I would rather see a small replacement PR against current main that fixes the real execution path and adds a regression test for that path specifically.
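One way to cover the execution path described above is to check for a configured template before delegating, rather than catching the error afterwards. The sketch below is hypothetical: simple_chat stands in for the non-streaming SimpleEngine.chat() path, delegate_chat stands in for self._model.chat(...), and generate_from_prompt is an assumed plain-prompt generation callable. None of these names come from vllm-mlx.

```python
from typing import Any, Callable, Dict, List

Messages = List[Dict[str, str]]


def has_configured_chat_template(template_owner: Any) -> bool:
    # apply_chat_template() can be inherited while chat_template is None,
    # which is the MedGemma situation described in this PR.
    return (
        callable(getattr(template_owner, "apply_chat_template", None))
        and getattr(template_owner, "chat_template", None) is not None
    )


def simple_chat(
    delegate_chat: Callable[[Messages], str],
    generate_from_prompt: Callable[[str], str],
    template_owner: Any,
    messages: Messages,
) -> str:
    """Hypothetical stand-in for the non-streaming simple-engine chat path.

    The check runs before delegating, so a template-less model never
    reaches the call that would raise.
    """
    if has_configured_chat_template(template_owner):
        return delegate_chat(messages)
    # Hypothetical plain-text fallback format.
    prompt = "\n".join(f"{m['role']}: {m['content']}" for m in messages) + "\nassistant:"
    return generate_from_prompt(prompt)
```

A regression test for this shape passes a delegate that raises the missing-template ValueError and asserts the fallback result comes back, which fails if the guard ever moves behind the delegated call again.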

janhilgard

Agreeing with Thump604's review -- the broad TypeError/ValueError catch is the main concern.

A couple of additional notes:

simple.py token count estimation is lossy. When the fallback triggers, prompt_token_count is estimated as len(content) // 4. This is a rough heuristic that can be significantly off for non-English text or short prompts. The estimate propagates into the usage field of the API response, which callers may rely on for billing or context window management. Consider at least logging a warning when this fallback path is taken, so users know the token count is approximate.
Inconsistent fallback between engines. In batched.py, the fallback catches ValueError from the outer try (the hasattr(template_applicator, 'apply_chat_template') path) and also adds a new inner except (TypeError, ValueError): pass for the tools-stripped retry. But in simple.py, only apply_chat_template is wrapped. If MedGemma is used with the batched engine, it hits a different code path than with the simple engine. It would be good to unify the fallback logic into a shared helper rather than duplicating it.
Tests use object.__new__(BatchedEngine) to create a stub. This is brittle -- if BatchedEngine.__init__ ever adds required state (e.g., a logger, a lock), the tests will break in confusing ways. A lightweight mock or extracting _apply_chat_template into a standalone function would be more robust.
Branch has merge conflicts with current main. Needs a rebase.
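The first three notes above could be addressed together with one module-level helper that both engines call. The sketch below is an assumption-laden illustration, not the PR's code: build_prompt_with_fallback, PromptResult and the fallback format are hypothetical. It narrows the fallback by checking that the tokenizer's chat_template really is None, prefers an exact count via tokenizer.encode() when one exists, and logs a warning when it has to fall back to the len(...) // 4 estimate.

```python
import logging
from dataclasses import dataclass
from typing import Any, Dict, List

logger = logging.getLogger(__name__)


@dataclass
class PromptResult:
    prompt: str
    prompt_tokens: int
    tokens_are_estimated: bool  # hypothetical flag a caller could surface in usage


def _plain_text_prompt(messages: List[Dict[str, str]]) -> str:
    # Hypothetical fallback format; the PR's actual format is not shown here.
    return "\n".join(f"{m['role']}: {m['content']}" for m in messages) + "\nassistant:"


def build_prompt_with_fallback(tokenizer: Any, messages: List[Dict[str, str]]) -> PromptResult:
    """Shared helper both engines could call instead of duplicating try/except blocks."""
    try:
        prompt = tokenizer.apply_chat_template(
            messages, tokenize=False, add_generation_prompt=True
        )
    except ValueError:
        # Only fall back when no template is configured; otherwise re-raise.
        if getattr(tokenizer, "chat_template", None) is not None:
            raise
        logger.warning("Model has no chat_template; using plain-text prompt fallback")
        prompt = _plain_text_prompt(messages)

    encode = getattr(tokenizer, "encode", None)
    if callable(encode):
        return PromptResult(prompt, len(encode(prompt)), False)

    logger.warning("No tokenizer.encode(); prompt token count is approximate (len // 4)")
    return PromptResult(prompt, len(prompt) // 4, True)
```

Because the helper is a plain function, it can be tested with small stub classes instead of object.__new__(BatchedEngine), so later changes to the engine constructors do not affect these tests.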

The fix itself is needed (MedGemma and similar template-less models should not crash the server), but I agree with Thump604 that narrowing the catch to the specific "no chat_template configured" error message is important to avoid masking real bugs.

Thump604 · 2026-04-29T22:32:18Z

Hi @jackneil -- this PR has review feedback from April and merge conflicts with current main. Are you still working on it? Happy to review a rebased version. If the work is stalled, we can close it and you're welcome to refile when ready. Will check back in two weeks.

Thump604 requested changes Apr 11, 2026

janhilgard reviewed Apr 15, 2026

jackneil wants to merge 1 commit into waybarrios:main from jackneil:pr/medgemma-chat-template

3 participants

Conversation

jackneil commented Apr 9, 2026

Summary

Test plan

Uh oh!

Thump604 commented Apr 11, 2026

Uh oh!

Thump604 left a comment

Choose a reason for hiding this comment

Uh oh!

janhilgard left a comment

Choose a reason for hiding this comment

Uh oh!

Thump604 commented Apr 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants