Expose MLLM MTP draft counters #504

Open
Thump604 wants to merge 1 commit into waybarrios:main from Thump604:codex/mllm-mtp-metadata-accounting

Thump604 (Collaborator) commented May 6, 2026

## Summary
- Track whether MLLM MTP actually attempted a draft on each primary response.
- Carry the attempted and accepted draft counters through `MLLMScheduler` outputs.
- Expose those counters on `GenerationOutput` in `BatchedEngine` for MLLM routes.

## Why
Previously, the MLLM MTP wrapper made it hard to tell a step where the MTP path was merely installed apart from a step that actually drafted. As a result, skipped speculative steps could look like MTP attempts with zero acceptance. This PR records an attempt only after the draft path actually runs, and counts accepted draft responses separately.

## Tests
- `AI_RUNTIME_BYPASS_SAFETY_GATE=1 /opt/ai-runtime/venv-live/bin/python -m pytest -q tests/test_mllm_mtp_routing.py -q`
- `uvx ruff check vllm_mlx/mllm_batch_generator.py vllm_mlx/mllm_scheduler.py vllm_mlx/engine/batched.py vllm_mlx/engine/base.py vllm_mlx/request.py tests/test_mllm_mtp_routing.py --select E,F,W --ignore E402,E501,E731,F811,F841`
- `black --check --target-version py312 vllm_mlx/mllm_batch_generator.py vllm_mlx/mllm_scheduler.py vllm_mlx/engine/batched.py vllm_mlx/engine/base.py vllm_mlx/request.py tests/test_mllm_mtp_routing.py`
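The accounting rule from the Why section can be illustrated with a minimal Python sketch. This is not the PR's code: `DraftCounters`, `run_step`, `draft_fn`, and `verify_fn` are hypothetical names, and in the PR the counters are carried through `MLLMScheduler` outputs onto `GenerationOutput`. What the sketch shows is that a skipped speculative step leaves both counters untouched, so "never drafted" (rate `None`) stays distinguishable from "drafted, nothing accepted" (rate `0.0`).

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class DraftCounters:
    """Hypothetical per-response MTP draft accounting (not the PR's real fields)."""

    attempted: int = 0  # steps where the draft path actually ran
    accepted: int = 0  # attempted steps whose draft was accepted

    @property
    def acceptance_rate(self) -> Optional[float]:
        # None means MTP never drafted for this response, which is
        # different from 0.0 (drafted at least once, never accepted).
        if self.attempted == 0:
            return None
        return self.accepted / self.attempted


def run_step(
    counters: DraftCounters,
    can_draft: bool,
    draft_fn: Callable[[], int],
    verify_fn: Callable[[int], bool],
) -> None:
    """One decode step using hypothetical draft/verify callables."""
    if not can_draft:
        # Speculative step skipped: this must not count as an attempt.
        return
    draft = draft_fn()
    # Record the attempt only after the draft path has actually run;
    # if draft_fn raises, the counter is left unchanged.
    counters.attempted += 1
    if verify_fn(draft):
        counters.accepted += 1


# Usage: one skipped step, then two real attempts (one accepted).
counters = DraftCounters()
run_step(counters, can_draft=False, draft_fn=lambda: 7, verify_fn=lambda t: True)
run_step(counters, can_draft=True, draft_fn=lambda: 7, verify_fn=lambda t: t == 8)
run_step(counters, can_draft=True, draft_fn=lambda: 8, verify_fn=lambda t: t == 8)
```

With the skipped step excluded, the usage above ends with 2 attempts and 1 acceptance, rather than counting the skipped step as a third, rejected attempt.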
