Track admission-control invariant for serialized TextModel-direct routes#514
Track admission-control invariant for serialized TextModel-direct routes#514waybarrios wants to merge 1 commit intomainfrom
Conversation
Issue #495 documents a downstream P0 where text-only MLLM requests bypassed the MLLM scheduler and entered a serialized TextModel-direct generation path with a 120s wait bound, causing request pileup. Current main does not have the bug. This pins the invariant so any future change reintroducing the failure mode breaks the suite. - Document SimpleEngine._generation_lock as Metal-serialization only, with a #495 reference, so future readers see the admission-control contract before repurposing the lock. - Add tests/test_textmodel_direct_admission_invariant.py with five static tripwires: no TextModelDirect identifiers, lock comment mentions Metal + #495, no asyncio.wait_for(..., timeout>=5s) in simple.py, no admission-queue attribute on SimpleEngine, and a future-proof shape check that fires the moment a text_generation_busy error is added. All four active tests pass on current main; the fifth skips by design until the corresponding error class lands. The tripwires were demoed against an inline-injected reproduction of the original bug; all four fail with actionable messages, then pass again once the regression is removed.
|
Hey @Thump604, quick context on what landed here in case it helps when you take a look. The PR is intentionally scoped as test plus comment only, no runtime logic was touched. On the source side, the only edit is the comment block above The new file is The reason for going test-only is that Happy to grow the scope if you'd rather see the functional half of #495 land in this PR too, meaning a real |
Closes #495.
Summary
Pins the admission-control invariant for serialized TextModel-direct
generation paths so any future change reintroducing the failure mode in
issue #495 breaks the test suite. Documents
SimpleEngine._generation_lockas Metal-serialization-only with a
#495reference, and adds a smallregression test module that enforces the invariant statically.
Background
Issue #495 documents a P0 hit by a downstream operator: text-only MLLM
requests bypassed the MLLM scheduler and entered a serialized
TextModel-direct generation path guarded by a single
asyncio.Lockwith a 120-second wait bound. Concurrent agent traffic piled up behind
that lock instead of receiving a fast retryable admission result.
Current
maindoes not have the bug. Text-only MLLM requests routethrough
MLLMScheduler, andSimpleEngine._generation_lockexistsonly to serialize Metal command-buffer access. The risk the issue
flags is regression: a future change repurposing the lock as a
wait-mode admission gate, or adding a TextModel-direct route without
fail-fast admission.
The acceptance criteria from #495:
route must fail fast rather than waiting for minutes.
text_generation_busy,with HTTP 503.
batching or queue absorption.
Changes
vllm_mlx/engine/simple.py— extends the existing comment on_generation_lockto spell out the invariant and reference #495 sothat future readers see the admission-control contract before
repurposing the lock:
tests/test_textmodel_direct_admission_invariant.py— new file with5 cases:
test_no_textmodel_direct_class_exists— scansvllm_mlx/for anyTextModelDirect-style identifier.test_generation_lock_is_documented_as_metal_only— reads thecomment block above
_generation_lockand asserts it mentionsMetal/command-buffer and references
#495.test_no_long_wait_for_in_simple_engine_text_paths— AST-walkssimple.pyand flags anyasyncio.wait_for(..., timeout=T)withT >= 5seconds. The original P0 usedtimeout=120.test_simple_engine_does_not_expose_admission_queue_attribute—inspects
SimpleEnginesource for forbidden attribute names like_text_admission_queue,_text_direct_lock, etc.test_text_generation_busy_error_if_present_is_machine_readable—walks the package for any
text_generation_busy-named symbol. Ifnone exists (today's state), the test skips; the moment someone
adds one, the test asserts HTTP 503 status and a machine-readable
code, locking in the contract from the issue.
Verification
The invariants the tests check, exercised against the live code:
Test run (clean state):
Demo: tests catch the exact regression from #495
To prove the tests aren't vacuous, I temporarily injected the precise
shape of the bug from the issue into
simple.py:All four tripwires fired at once with actionable messages:
After restoring the original file all four tests go back to PASSED.
Note on scope
This PR is test- and comment-only. No production logic changes. The
invariant is enforced by (a) the comment on
_generation_lockthatreferences #495, and (b) the static checks above. If a future PR
needs to reintroduce a serialized TextModel-direct route, the right
place to do so is alongside an update to these tests that reflects
the new fail-fast contract.