
Add model registration and qualification handoff#425

Merged
Thump604 merged 2 commits into main from feat/model-register-qualify-workflow
Apr 28, 2026

@Thump604
Collaborator

Stacked on #417. This keeps the first model artifact workflow PR reviewable, then adds the next handoff layer without mutating any local Ops registry.

Scope:

  • add vllm-mlx model register <artifact> to write a portable vllm_mlx_registration_manifest.json
  • include artifact inspection, acquisition/conversion source manifests, parser policy, serving defaults, feature flags, preset alias, and explicit production_ready=false
  • add vllm-mlx model qualify <model-id> to create or run a bench-serve qualification handoff command and optional request manifest
  • keep qualification as a handoff artifact, not a green-light. Production readiness still requires the standing workload contract and review of the resulting evidence

Local validation:

uv run --extra dev pytest -q tests/test_model_workflow.py tests/test_download.py
# 23 passed

uv run --extra dev black --check vllm_mlx/model_workflow.py vllm_mlx/cli.py tests/test_model_workflow.py
uvx ruff check vllm_mlx/model_workflow.py vllm_mlx/cli.py tests/test_model_workflow.py --select E,F,W --ignore E402,E501,E731,F811,F841
git diff --check
# clean

CLI smokes:

vllm-mlx model qualify qwen-test --workload /tmp/workload.json --dry-run --extra-arg=--tag --extra-arg smoke
vllm-mlx model register <tmp-artifact> --model-id qwen-test --served-model-name qwen-test --preset-alias fast-qwen --mllm --default-temperature 0.6 --default-top-p 0.95 --default-top-k 20 --default-min-p 0.0 --default-presence-penalty 0.0 --default-repetition-penalty 1.0 --default-chat-template-kwargs '{"enable_thinking":true}' --feature-flag prefix_cache
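
The `--extra-arg=--tag` spelling in the first smoke matters if the CLI is parsed with argparse, because a bare value starting with `--` is read as another option. A minimal sketch, assuming a repeatable `--extra-arg` option defined with `action="append"`; the parser below is a hypothetical stand-in, and the real definition in vllm_mlx/cli.py may differ:

```python
import argparse

# Hypothetical stand-in for the qualify subparser; only --extra-arg is modeled.
parser = argparse.ArgumentParser(prog="qualify-sketch")
parser.add_argument("--extra-arg", action="append", default=[], dest="extra_args")

# The "=" form lets a dash-prefixed value through; the space form works for plain values.
args = parser.parse_args(["--extra-arg=--tag", "--extra-arg", "smoke"])
print(args.extra_args)  # ['--tag', 'smoke']

# Without "=", argparse treats "--tag" as an unknown option, so --extra-arg
# is left without its value and parsing exits with an error.
try:
    parser.parse_args(["--extra-arg", "--tag"])
    space_form_ok = True
except SystemExit:
    space_form_ok = False
```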

Dependency note: model qualify --workload is intended to compose with #406's workload contract support. Without that bench-serve support, the dry-run manifest is still useful, but running the generated command with --workload depends on #406 or equivalent landing.

@Thump604
Collaborator Author

Would appreciate your review on this when you have a chance. Happy to address any feedback.

@janhilgard
Collaborator

@Thump604 Thanks for this — the registration/qualification handoff pattern is well-structured and the frozen dataclasses are clean. A few things:

Blocker

1. Base branch (#417) is CLOSED

This PR targets feat/model-acquisition-workflow (PR #417), which has been closed. PR #423 appears to be the replacement (same title, targets main). This PR has no merge path — it needs to be retargeted to either main (with #423's changes squashed in) or to #423's branch.

Should fix

2. No --no-mllm flag

--mllm uses action="store_true" with default=None, so args.mllm is either True or None — never False. There's no way to explicitly mark a model as not MLLM. If someone re-registers a model, they can't undo the flag. Consider adding --no-mllm or switching to --mllm true/false.
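
The tri-state problem can be reproduced with plain argparse. This is a sketch, not the code in vllm_mlx/cli.py: the `before` parser mirrors the shape described above, and the `after` parser shows one possible fix using a mutually exclusive pair that shares a destination.

```python
import argparse

# Shape described in the review: the flag can only yield True or None.
before = argparse.ArgumentParser()
before.add_argument("--mllm", action="store_true", default=None)

# One possible fix: a mutually exclusive pair writing to the same destination,
# so the value is True, False, or None (not specified).
after = argparse.ArgumentParser()
group = after.add_mutually_exclusive_group()
group.add_argument("--mllm", dest="mllm", action="store_true", default=None)
group.add_argument("--no-mllm", dest="mllm", action="store_false", default=None)

print(before.parse_args([]).mllm)            # None
print(before.parse_args(["--mllm"]).mllm)    # True
print(after.parse_args(["--no-mllm"]).mllm)  # False
print(after.parse_args([]).mllm)             # None
```

Keeping `None` as the default preserves the distinction between "not specified" and an explicit `False`, which matters when a model is re-registered.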

3. Missing test: register_model with minimal/default options

No test covers register_model when only the artifact path is provided (everything else defaults). This would verify model_id defaults to artifact.name, serving_defaults is {}, feature_flags is [], etc.

4. Missing test: qualification success path

test_qualify_model_runs_command_and_records_failure covers returncode=1, but the returncode=0 / status="succeeded" path is untested.
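
A sketch of how both return-code branches could be exercised without starting a server. `qualify_status` is a hypothetical reduction of the real qualification function, and the `"failed"` status string is an assumption; only `"succeeded"` is named above.

```python
import subprocess
from unittest import mock


def qualify_status(command):
    """Hypothetical reduction: map the subprocess return code to a status string."""
    completed = subprocess.run(command, capture_output=True, text=True)
    return "succeeded" if completed.returncode == 0 else "failed"


def fake_run(returncode):
    """Build a replacement for subprocess.run that returns a canned result."""
    result = subprocess.CompletedProcess(
        args=[], returncode=returncode, stdout="{}", stderr=""
    )
    return mock.Mock(return_value=result)


# Patch subprocess.run so no real command is executed in either case.
with mock.patch("subprocess.run", fake_run(0)):
    success = qualify_status(["bench-serve"])
with mock.patch("subprocess.run", fake_run(1)):
    failure = qualify_status(["bench-serve"])
```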

5. Missing test: NotADirectoryError path

register_model raises NotADirectoryError if the artifact is a file, but no test covers this.
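
A sketch of the missing case, using a hypothetical guard (`check_artifact_dir`) in place of `register_model`, whose actual signature is not shown in this thread:

```python
import tempfile
from pathlib import Path


def check_artifact_dir(artifact: Path) -> Path:
    """Hypothetical guard mirroring the described behavior: files are rejected."""
    if artifact.is_file():
        raise NotADirectoryError(f"artifact must be a directory: {artifact}")
    return artifact


with tempfile.TemporaryDirectory() as tmp:
    # A plain file stands in for a mistakenly passed weights file.
    file_artifact = Path(tmp) / "weights.safetensors"
    file_artifact.write_bytes(b"")
    try:
        check_artifact_dir(file_artifact)
        raised = False
    except NotADirectoryError:
        raised = True
    # The containing directory passes the guard unchanged.
    accepted = check_artifact_dir(Path(tmp)) == Path(tmp)
```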

6. Add --help text to arguments

About 10+ arguments on model register and model qualify have no help= text: --model-id, --served-model-name, --preset-alias, --tool-call-parser, --reasoning-parser, all --default-* params, --repetitions, etc.

7. Assert zero-valued defaults survive _drop_none

The test passes default_min_p=0.0 and default_presence_penalty=0.0 but never asserts these appear in serving_defaults. One assertion would confirm _drop_none handles zeros correctly (it does, since it only strips None, but the test should prove it).
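
The assertion being asked for can be sketched as follows. `drop_none` is a hypothetical stand-in for `_drop_none`, written from the behavior described above (strip only `None`); the key names are illustrative.

```python
def drop_none(values: dict) -> dict:
    """Hypothetical stand-in for _drop_none: remove only keys whose value is None."""
    return {key: value for key, value in values.items() if value is not None}


raw = {"temperature": 0.6, "min_p": 0.0, "presence_penalty": 0.0, "top_k": None}
serving_defaults = drop_none(raw)

# Zero-valued defaults must survive; only the None entry is dropped.
assert serving_defaults["min_p"] == 0.0
assert serving_defaults["presence_penalty"] == 0.0
assert "top_k" not in serving_defaults

# A truthiness filter is the regression this assertion would catch:
# it silently discards 0.0 along with None.
truthy_only = {key: value for key, value in raw.items() if value}
assert "min_p" not in truthy_only
```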

Consider

8. artifact_path is absolute — limits portability

The PR description says "portable registration manifest" but the embedded artifact_path is machine-specific (str(artifact.resolve())). Same for source_manifests. Downstream consumers would need to resolve paths relative to the manifest location. Worth documenting this limitation.
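
On the consumer side, relative resolution could look like the sketch below. It assumes a manifest that records a relative path, which is not what the PR currently writes; `resolve_artifact` is a hypothetical helper.

```python
from pathlib import Path


def resolve_artifact(manifest_path: Path, recorded: str) -> Path:
    """Resolve a recorded path; relative values are anchored at the manifest's directory."""
    candidate = Path(recorded)
    if candidate.is_absolute():
        # Absolute paths are machine-specific and are returned unchanged.
        return candidate
    return (manifest_path.parent / candidate).resolve()
```

With this scheme a manifest copied together with its artifact directory keeps working on another machine, while absolute entries only resolve where they were written.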

9. No --timeout on qualify

subprocess.run has no timeout — a hung server means the CLI hangs indefinitely. For interactive use this is fine (Ctrl-C), but a --timeout option would help in CI/automated contexts.
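
A minimal sketch of what a `--timeout` option could feed into. `run_with_timeout` and the `"timed_out"` and `"failed"` status strings are assumptions for illustration; `subprocess.run(..., timeout=...)` and `subprocess.TimeoutExpired` are standard library behavior.

```python
import subprocess
import sys
from typing import Optional


def run_with_timeout(command: list, timeout_s: Optional[float]) -> dict:
    """Run a command and report a status; timeout_s=None keeps the wait-forever behavior."""
    try:
        completed = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising, so nothing is left running.
        return {"status": "timed_out", "returncode": None}
    status = "succeeded" if completed.returncode == 0 else "failed"
    return {"status": status, "returncode": completed.returncode}


# A child that sleeps far longer than the limit stands in for a hung server.
hung = run_with_timeout([sys.executable, "-c", "import time; time.sleep(30)"], 0.5)
quick = run_with_timeout([sys.executable, "-c", "pass"], 30)
```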

10. extra_args can silently override earlier flags

--extra-arg=--output would conflict with the --result-output that already set --output on the command. No validation or documentation about this.
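
The override is silent because, under argparse's default `store` action, the last occurrence of an option wins. The sketch below assumes bench-serve parses `--output` that way; the parser is a hypothetical stand-in, not the real bench-serve CLI.

```python
import argparse

# Hypothetical stand-in for the bench-serve parser.
bench = argparse.ArgumentParser(prog="bench-serve-sketch")
bench.add_argument("--output")

command_args = ["--output", "result.json"]  # derived from --result-output
extra_args = ["--output", "other.json"]     # appended from repeated --extra-arg

# No error and no warning: the later value replaces the earlier one.
parsed = bench.parse_args(command_args + extra_args)
print(parsed.output)  # other.json
```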

11. No environment metadata in qualification manifest

The qualification manifest records command, stdout, stderr but not Python version, vllm-mlx version, or MLX version. This would help reproducibility — "these numbers were produced with version X on hardware Y."
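
One way to collect this with the standard library is sketched below. The field names and the distribution names `vllm-mlx` and `mlx` are assumptions; a missing package is recorded as `None` rather than failing the run.

```python
import platform
from importlib import metadata


def package_version(distribution: str):
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


def environment_metadata() -> dict:
    """Collect interpreter, platform, and package versions for the manifest."""
    return {
        "python_version": platform.python_version(),
        "platform": platform.platform(),
        "machine": platform.machine(),
        "vllm_mlx_version": package_version("vllm-mlx"),  # assumed distribution name
        "mlx_version": package_version("mlx"),            # assumed distribution name
    }


env = environment_metadata()
```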

12. Schema evolution

Both manifests include schema_version: 1 which is forward-looking, but _existing_manifests reads whatever JSON is present without checking the version. Fine for now, but worth noting for when version 2 arrives.
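
When version 2 arrives, a reader-side check could be as small as the sketch below. `load_manifest` and `SUPPORTED_SCHEMA_VERSIONS` are hypothetical names; `_existing_manifests` itself is not shown in this thread.

```python
import json

SUPPORTED_SCHEMA_VERSIONS = {1}


def load_manifest(text: str) -> dict:
    """Parse a manifest and reject schema versions this reader does not understand."""
    manifest = json.loads(text)
    version = manifest.get("schema_version")
    if version not in SUPPORTED_SCHEMA_VERSIONS:
        raise ValueError(f"unsupported schema_version: {version!r}")
    return manifest


current = load_manifest('{"schema_version": 1, "production_ready": false}')
try:
    load_manifest('{"schema_version": 2}')
    rejected = False
except ValueError:
    rejected = True
```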

Minor

  • feature_flags: list[str] | None = None in RegistrationOptions but CLI default is []. options.feature_flags or [] treats both None and [] the same way. Using if options.feature_flags is not None else [] would be clearer.
  • stdout from bench-serve is likely JSON, so the manifest has JSON-as-string inside JSON. Standard but worth noting for consumers.
  • production_ready=false and qualification_required=true are always hardcoded with no mechanism to update after qualification passes. This is deliberate (handoff artifact, not green-light), but consider documenting the intended workflow for how production_ready eventually gets flipped.
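
For consumers, the JSON-as-string point above means decoding twice. A minimal sketch with placeholder data; the `stdout` key comes from the review, while the inner `requests` field is invented for illustration:

```python
import json

# Placeholder manifest: stdout holds bench-serve output as a JSON-encoded string.
manifest_text = json.dumps(
    {"status": "succeeded", "stdout": json.dumps({"requests": 3})}
)

manifest = json.loads(manifest_text)            # first decode: the manifest itself
bench_result = json.loads(manifest["stdout"])   # second decode: the embedded output
```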

Collaborator

@janhilgard janhilgard left a comment

Blocker

  1. Base branch (#417) is CLOSED — this PR targets feat/model-acquisition-workflow (PR #417), which has been closed. PR #423 appears to be the replacement. This PR has no merge path.

Should fix

  1. No --no-mllm flag: args.mllm is either True or None, never False. No way to unset it.
  2. Missing test: register_model with minimal/default options
  3. Missing test: qualification success path (returncode=0)
  4. Missing test: NotADirectoryError path
  5. ~10 arguments missing help= text
  6. _drop_none handling of zero-valued defaults untested

Consider

  1. artifact_path is absolute — limits portability despite "portable manifest" framing
  2. No --timeout on qualify — hung server blocks CLI indefinitely
  3. extra_args can silently override earlier flags
  4. No environment metadata in qualification manifest (Python/vllm-mlx/MLX version)

See full review comment for details.

@Thump604
Collaborator Author

@janhilgard Verified all items. Here is where I landed on each:

Blocker (1): Correct -- #417 is closed and #423 replaces it. I will re-port #425 onto main with #423's content included. New branch, new PR referencing this one.

Should fix (2-7): All valid. Fixes:

  • Add --no-mllm via store_false counterpart
  • Add tests for: minimal register (defaults only), qualification success path (returncode=0), NotADirectoryError
  • Add help= text to all 10 bare arguments
  • Assert 0.0-valued defaults survive _drop_none in test

Consider (8-11): Agreed on 8 (will document the absolute-path limitation), 9 (add --timeout), and 11 (add Python/vllm-mlx/MLX version to qualification manifest). For 10 (extra_args override), I will add a note in help text rather than validation -- the flexibility is intentional for forward-compat with new bench-serve flags.

Will address in the re-ported PR.

@janhilgard
Collaborator

@Thump604 Sounds good on all points. The extra_args help-text note is a reasonable approach — keeps the flexibility while making the behavior visible.

Looking forward to the re-ported PR.

@Thump604 Thump604 force-pushed the feat/model-register-qualify-workflow branch from f933a50 to 4053fe3 on April 26, 2026 18:26
@Thump604 Thump604 changed the base branch from feat/model-acquisition-workflow to main on April 26, 2026 18:26
…gaps

- Add --no-mllm as mutually exclusive counterpart to --mllm via
  argparse group, so mllm can be explicitly set to False
- Add help= text to all register/qualify/convert arguments that
  were missing it (~10 arguments)
- Add test: register_model with minimal defaults (only artifact_path)
- Add test: NotADirectoryError when artifact is a file
- Add test: qualification success path (returncode=0)
- Add test: _drop_none preserves 0, 0.0, and False (only drops None)
@Thump604 Thump604 force-pushed the feat/model-register-qualify-workflow branch from 4053fe3 to 293222a on April 28, 2026 13:44
@Thump604
Collaborator Author

Rebased onto current main (now that #423 is merged — base branch blocker resolved). All review feedback from the first round is addressed:

  • --no-mllm flag: Added as mutually exclusive counterpart to --mllm via argparse group
  • Missing help text: Added to all ~10 arguments that were missing it
  • Test gaps filled: register_model with minimal defaults, NotADirectoryError path, qualification success path (returncode=0), _drop_none preserves 0/0.0/False
  • Merge conflicts: Resolved. New tests from #423 (Add model artifact workflow CLI), covering convert failure and GPTQ detection, are preserved alongside the register/qualify tests

17/17 tests passing.

Collaborator

@janhilgard janhilgard left a comment

Clean addition to the model workflow. Well-tested (10 new tests covering happy path, error cases, dry-run, and subprocess results). CLI UX is solid with mutually exclusive --mllm/--no-mllm and repeatable flags. Code follows established patterns from the existing model_workflow module. LGTM.

@Thump604 Thump604 marked this pull request as ready for review April 28, 2026 14:36
@Thump604 Thump604 merged commit 270cd0b into main Apr 28, 2026
9 checks passed