Add model registration and qualification handoff by Thump604 · Pull Request #425 · waybarrios/vllm-mlx

Thump604 · 2026-04-24T19:19:10Z

Stacked on #417. This keeps the first model artifact workflow PR reviewable, then adds the next handoff layer without mutating any local Ops registry.

Scope:

add vllm-mlx model register <artifact> to write a portable vllm_mlx_registration_manifest.json
include artifact inspection, acquisition/conversion source manifests, parser policy, serving defaults, feature flags, preset alias, and explicit production_ready=false
add vllm-mlx model qualify <model-id> to create or run a bench-serve qualification handoff command and optional request manifest
keep qualification as a handoff artifact, not a green-light. Production readiness still requires the standing workload contract and review of the resulting evidence

Local validation:

uv run --extra dev pytest -q tests/test_model_workflow.py tests/test_download.py
# 23 passed

uv run --extra dev black --check vllm_mlx/model_workflow.py vllm_mlx/cli.py tests/test_model_workflow.py
uvx ruff check vllm_mlx/model_workflow.py vllm_mlx/cli.py tests/test_model_workflow.py --select E,F,W --ignore E402,E501,E731,F811,F841
git diff --check
# clean

CLI smokes:

vllm-mlx model qualify qwen-test --workload /tmp/workload.json --dry-run --extra-arg=--tag --extra-arg smoke
vllm-mlx model register <tmp-artifact> --model-id qwen-test --served-model-name qwen-test --preset-alias fast-qwen --mllm --default-temperature 0.6 --default-top-p 0.95 --default-top-k 20 --default-min-p 0.0 --default-presence-penalty 0.0 --default-repetition-penalty 1.0 --default-chat-template-kwargs '{"enable_thinking":true}' --feature-flag prefix_cache
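The first smoke mixes `--extra-arg=--tag` with `--extra-arg smoke`. A minimal sketch of why the `=` form matters for values that start with a dash, assuming the flag is a repeatable argparse `append` option (the parser below is a hypothetical stand-in, not the PR's actual CLI code):

```python
import argparse

# Hypothetical stand-in parser; the real CLI wiring lives in vllm_mlx/cli.py.
parser = argparse.ArgumentParser(prog="qualify-sketch")
parser.add_argument(
    "--extra-arg",
    action="append",
    default=[],
    help="Extra argument forwarded verbatim; repeat the flag for each token.",
)


def parse_extra_args(argv):
    """Return the forwarded tokens in the order they were given."""
    return parser.parse_args(argv).extra_arg


# "--extra-arg=--tag" binds the dash-prefixed value explicitly, while
# "--extra-arg smoke" works in the space-separated form because "smoke"
# does not look like an option.
tokens = parse_extra_args(["--extra-arg=--tag", "--extra-arg", "smoke"])
```

With plain argparse, writing `--extra-arg --tag` (space-separated) would be rejected, because `--tag` is read as another option rather than as a value.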

Dependency note: model qualify --workload is intended to compose with #406's workload contract support. Without that bench-serve support, the dry-run manifest is still useful, but running the generated command with --workload depends on #406 or equivalent landing.

Thump604 · 2026-04-25T14:15:41Z

Would appreciate your review on this when you have a chance. Happy to address any feedback.

janhilgard · 2026-04-25T16:11:39Z

@Thump604 Thanks for this — the registration/qualification handoff pattern is well-structured and the frozen dataclasses are clean. A few things:

Blocker

1. Base branch (#417) is CLOSED

This PR targets feat/model-acquisition-workflow (PR #417), which has been closed. PR #423 appears to be the replacement (same title, targets main). This PR has no merge path — it needs to be retargeted to either main (with #423's changes squashed in) or to #423's branch.

Should fix

2. No --no-mllm flag

--mllm uses action="store_true" with default=None, so args.mllm is either True or None — never False. There's no way to explicitly mark a model as not MLLM. If someone re-registers a model, they can't undo the flag. Consider adding --no-mllm or switching to --mllm true/false.
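One possible shape for the tri-state, sketched with a mutually exclusive argparse group. The parser and help strings are illustrative assumptions; only the `--mllm` / `--no-mllm` names come from this thread:

```python
import argparse


def build_parser():
    # Hypothetical stand-in for the "model register" subparser.
    parser = argparse.ArgumentParser(prog="register-sketch")
    group = parser.add_mutually_exclusive_group()
    # Both flags write to the same destination. Leaving both out keeps None,
    # which a caller can read as "not specified".
    group.add_argument("--mllm", dest="mllm", action="store_true", default=None,
                       help="Mark the model as multimodal.")
    group.add_argument("--no-mllm", dest="mllm", action="store_false", default=None,
                       help="Explicitly mark the model as not multimodal.")
    return parser


def parse_mllm(argv):
    """Return True, False, or None depending on which flag was passed."""
    return build_parser().parse_args(argv).mllm
```

Passing both flags at once makes argparse exit with a usage error, so the value is never ambiguous.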

3. Missing test: register_model with minimal/default options

No test covers register_model when only the artifact path is provided (everything else defaults). This would verify model_id defaults to artifact.name, serving_defaults is {}, feature_flags is [], etc.

4. Missing test: qualification success path

test_qualify_model_runs_command_and_records_failure covers returncode=1, but the returncode=0 / status="succeeded" path is untested.

5. Missing test: NotADirectoryError path

register_model raises NotADirectoryError if the artifact is a file, but no test covers this.

6. Add --help text to arguments

Roughly ten arguments on model register and model qualify have no help= text: --model-id, --served-model-name, --preset-alias, --tool-call-parser, --reasoning-parser, all --default-* params, --repetitions, etc.

7. Assert zero-valued defaults survive _drop_none

The test passes default_min_p=0.0 and default_presence_penalty=0.0 but never asserts these appear in serving_defaults. One assertion would confirm _drop_none handles zeros correctly (it does, since it only strips None, but the test should prove it).
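For illustration, a helper with the behavior described above and the kind of assertion being asked for. `drop_none` is a hypothetical stand-in written for this sketch; the real `_drop_none` is not shown in this thread and may differ:

```python
def drop_none(values: dict) -> dict:
    """Strip only keys whose value is None; falsy values such as 0.0 stay."""
    return {key: value for key, value in values.items() if value is not None}


serving_defaults = drop_none(
    {
        "min_p": 0.0,             # falsy, but a real setting
        "presence_penalty": 0.0,  # falsy, but a real setting
        "top_k": None,            # not provided, so it is dropped
    }
)
```

A truthiness check (`if value`) would lose both zeros here, which is the regression the extra assertion guards against.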

Consider

8. artifact_path is absolute — limits portability

The PR description says "portable registration manifest" but the embedded artifact_path is machine-specific (str(artifact.resolve())). Same for source_manifests. Downstream consumers would need to resolve paths relative to the manifest location. Worth documenting this limitation.
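A sketch of the alternative hinted at here: storing the artifact path relative to the manifest and resolving it on read. Both helper names are hypothetical, and this is not what the PR does (it stores the absolute path):

```python
import os
from pathlib import Path


def to_manifest_relative(artifact: Path, manifest: Path) -> str:
    """Express the artifact path relative to the manifest's directory."""
    return os.path.relpath(artifact, start=manifest.parent)


def from_manifest_relative(stored: str, manifest: Path) -> Path:
    """Rebuild a usable path from the stored relative value."""
    return Path(os.path.normpath(manifest.parent / stored))


# Illustrative paths only; nothing here touches the filesystem.
manifest_path = Path("/data/models/qwen-test/vllm_mlx_registration_manifest.json")
artifact_dir = Path("/data/models/qwen-test")
stored = to_manifest_relative(artifact_dir, manifest_path)
restored = from_manifest_relative(stored, manifest_path)
```

The trade-off is that a relative path only stays valid while the manifest and artifact move together, so documenting the absolute-path limitation is the lighter option.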

9. No --timeout on qualify

subprocess.run has no timeout — a hung server means the CLI hangs indefinitely. For interactive use this is fine (Ctrl-C), but a --timeout option would help in CI/automated contexts.
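A minimal sketch of what a timeout could look like around `subprocess.run`. The function name and the "failed" and "timed_out" status strings are assumptions for illustration; only "succeeded" appears in this review:

```python
import subprocess
import sys


def run_with_timeout(command, timeout_s):
    """Run a command and report a status instead of hanging forever."""
    try:
        completed = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising, so nothing lingers.
        return {"status": "timed_out", "returncode": None}
    status = "succeeded" if completed.returncode == 0 else "failed"
    return {"status": status, "returncode": completed.returncode}


# A child that sleeps longer than the allowed budget stands in for a hung server.
hung = run_with_timeout([sys.executable, "-c", "import time; time.sleep(5)"], 0.5)
quick = run_with_timeout([sys.executable, "-c", "pass"], 30)
```

Recording the timeout as its own status would keep a hung run distinguishable from a benchmark that ran and failed.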

10. extra_args can silently override earlier flags

--extra-arg=--output would conflict with the --result-output that already set --output on the command. No validation or documentation about this.
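If a warning were ever wanted, a check along these lines could flag the overlap without rejecting it. The helper and the example tokens are hypothetical; the sketch treats any token starting with `--` as a flag, so it would also misread dash-prefixed values:

```python
def find_overridden_flags(base_command, extra_args):
    """Return long flags that appear both in the base command and in extra args."""

    def flag_names(tokens):
        # "--output=x" and "--output x" both count as the flag "--output".
        return {token.split("=", 1)[0] for token in tokens if token.startswith("--")}

    return sorted(flag_names(base_command) & flag_names(extra_args))


# Illustrative command tokens, not the command the PR actually generates.
base = ["bench-serve", "--output", "result.json", "--repetitions", "3"]
extras = ["--output", "other.json", "--tag", "smoke"]
conflicts = find_overridden_flags(base, extras)
```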

11. No environment metadata in qualification manifest

The qualification manifest records command, stdout, stderr but not Python version, vllm-mlx version, or MLX version. This would help reproducibility — "these numbers were produced with version X on hardware Y."
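A sketch of how such metadata might be collected with the standard library. The distribution names `vllm-mlx` and `mlx` are assumptions about how the packages are published, and the dictionary layout is illustrative rather than the manifest's real schema:

```python
import platform
from importlib import metadata


def package_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def environment_metadata(packages=("vllm-mlx", "mlx")):
    """Collect interpreter, platform, and package versions for a manifest."""
    return {
        "python_version": platform.python_version(),
        "platform": platform.platform(),
        "machine": platform.machine(),
        "packages": {name: package_version(name) for name in packages},
    }
```

Returning None for a missing package keeps the manifest writable on machines where one of the dependencies is not installed.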

12. Schema evolution

Both manifests include schema_version: 1 which is forward-looking, but _existing_manifests reads whatever JSON is present without checking the version. Fine for now, but worth noting for when version 2 arrives.
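For when version 2 does arrive, a reader-side guard could be as small as the sketch below. `load_manifest` and `SUPPORTED_SCHEMA_VERSIONS` are hypothetical names, not part of the PR:

```python
import json

# Assumption: version 1 is the only schema in circulation today.
SUPPORTED_SCHEMA_VERSIONS = {1}


def load_manifest(text):
    """Parse a manifest and refuse schema versions this reader does not know."""
    manifest = json.loads(text)
    version = manifest.get("schema_version")
    if version not in SUPPORTED_SCHEMA_VERSIONS:
        raise ValueError(f"unsupported schema_version: {version!r}")
    return manifest
```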

Minor

feature_flags: list[str] | None = None in RegistrationOptions but CLI default is [] — options.feature_flags or [] treats both None and [] the same way. Using if options.feature_flags is not None else [] would be clearer.
stdout from bench-serve is likely JSON, so the manifest has JSON-as-string inside JSON. Standard but worth noting for consumers.
production_ready=false and qualification_required=true are always hardcoded with no mechanism to update after qualification passes. This is deliberate (handoff artifact, not green-light), but consider documenting the intended workflow for how production_ready eventually gets flipped.
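On the JSON-as-string point, a consumer would need a second decode step. A small sketch, assuming the manifest keeps the captured output under a `stdout` key as described above (the helper name and the sample payload are made up for illustration):

```python
import json


def decode_stdout(manifest):
    """Decode the embedded stdout string, or return None if it is not JSON."""
    raw = manifest.get("stdout") or ""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return None


# The inner payload is a placeholder, not real bench-serve output.
example = {"stdout": json.dumps({"placeholder_metric": 1})}
decoded = decode_stdout(example)
```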

janhilgard

Blocker

Base branch (#417) is CLOSED — this PR targets feat/model-acquisition-workflow (PR #417), which has been closed. PR #423 appears to be the replacement. This PR has no merge path.

Should fix

No --no-mllm flag — args.mllm is either True or None, never False. No way to unset it.
Missing test: register_model with minimal/default options
Missing test: qualification success path (returncode=0)
Missing test: NotADirectoryError path
~10 arguments missing help= text
_drop_none handling of zero-valued defaults untested

Consider

artifact_path is absolute — limits portability despite "portable manifest" framing
No --timeout on qualify — hung server blocks CLI indefinitely
extra_args can silently override earlier flags
No environment metadata in qualification manifest (Python/vllm-mlx/MLX version)

See full review comment for details.

Thump604 · 2026-04-25T16:41:37Z

@janhilgard Verified all items. Here is where I landed on each:

Blocker (1): Correct -- #417 is closed and #423 replaces it. I will re-port #425 onto main with #423's content included. New branch, new PR referencing this one.

Should fix (2-7): All valid. Fixes:

Add --no-mllm via store_false counterpart
Add tests for: minimal register (defaults only), qualification success path (returncode=0), NotADirectoryError
Add help= text to all 10 bare arguments
Assert 0.0-valued defaults survive _drop_none in test

Consider (8-11): Agreed on 8 (will document the absolute-path limitation), 9 (add --timeout), and 11 (add Python/vllm-mlx/MLX version to qualification manifest). For 10 (extra_args override), I will add a note in help text rather than validation -- the flexibility is intentional for forward-compat with new bench-serve flags.

Will address in the re-ported PR.

janhilgard · 2026-04-25T16:45:41Z

@Thump604 Sounds good on all points. The extra_args help-text note is a reasonable approach — keeps the flexibility while making the behavior visible.

Looking forward to the re-ported PR.

…gaps

- Add --no-mllm as mutually exclusive counterpart to --mllm via argparse group, so mllm can be explicitly set to False
- Add help= text to all register/qualify/convert arguments that were missing it (~10 arguments)
- Add test: register_model with minimal defaults (only artifact_path)
- Add test: NotADirectoryError when artifact is a file
- Add test: qualification success path (returncode=0)
- Add test: _drop_none preserves 0, 0.0, and False (only drops None)

Thump604 · 2026-04-28T13:44:10Z

Rebased onto current main (now that #423 is merged — base branch blocker resolved). All review feedback from the first round is addressed:

--no-mllm flag: Added as mutually exclusive counterpart to --mllm via argparse group
Missing help text: Added to all ~10 arguments that were missing it
Test gaps filled: register_model with minimal defaults, NotADirectoryError path, qualification success path (returncode=0), _drop_none preserves 0/0.0/False
Merge conflicts: Resolved. New tests from #423, "Add model artifact workflow CLI" (convert failure, GPTQ detection), are preserved alongside the register/qualify tests

17/17 tests passing.

janhilgard

Clean addition to the model workflow. Well-tested (10 new tests covering happy path, error cases, dry-run, and subprocess results). CLI UX is solid with mutually exclusive --mllm/--no-mllm and repeatable flags. Code follows established patterns from the existing model_workflow module. LGTM.

Thump604 mentioned this pull request Apr 24, 2026

Add bench-serve workload contracts #406

Closed

Thump604 requested a review from janhilgard April 25, 2026 14:15

janhilgard requested changes Apr 25, 2026

Thump604 mentioned this pull request Apr 25, 2026

Add model artifact workflow with registration and qualification #437

Closed

6 tasks

Thump604 force-pushed the feat/model-register-qualify-workflow branch from f933a50 to 4053fe3 April 26, 2026 18:26

Thump604 changed the base branch from feat/model-acquisition-workflow to main April 26, 2026 18:26

Thump604 added 2 commits April 28, 2026 08:43

Add model registration and qualification handoff

9e75092

Thump604 force-pushed the feat/model-register-qualify-workflow branch from 4053fe3 to 293222a April 28, 2026 13:44

janhilgard approved these changes Apr 28, 2026

Thump604 marked this pull request as ready for review April 28, 2026 14:36

Thump604 merged commit 270cd0b into main Apr 28, 2026
9 checks passed

Thump604 merged 2 commits into main from feat/model-register-qualify-workflow