
Conversation

ckadner (Collaborator) commented Sep 5, 2025

Description

  • Add config file for models and runtime parameters
  • Use configuration file in documentation
  • Add validation code to compare requested model and runtime parameters with supported configurations
  • Log warning when requested configuration is not supported (a sketch of this flow follows the example below)
    Example: WARNING 09-05 18:46:43 [runtime_config_validator.py:107] The requested configuration is not supported for model 'ibm-ai-platform/micro-g3.3-8b-instruct-1b': RuntimeConfiguration(platform=, cb=True, tp_size=1, max_model_len=128, max_num_seqs=2, num_blocks=0, warmup_shapes=None)
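
For illustration, the validation flow described above might look roughly like the following sketch; the field names follow the RuntimeConfiguration shown in the example log line, while the types, defaults, and matching logic are assumptions rather than the actual code in runtime_config_validator.py.

import logging
from dataclasses import dataclass

logger = logging.getLogger("runtime_config_validator")

@dataclass(frozen=True)
class RuntimeConfiguration:
    platform: str = ""
    cb: bool = False
    tp_size: int = 1
    max_model_len: int = 0
    max_num_seqs: int = 0
    num_blocks: int = 0
    warmup_shapes: tuple | None = None

def validate_runtime_config(model: str, requested: RuntimeConfiguration,
                            supported: dict[str, list[RuntimeConfiguration]]) -> bool:
    # Return True if the requested configuration matches one of the supported
    # configurations for this model; otherwise log a warning and return False.
    if requested in supported.get(model, []):
        return True
    logger.warning("The requested configuration is not supported for model %r: %s",
                   model, requested)
    return False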

TODO:

  • code cleanup
  • review/revise model-config YAML file structure
  • add a YAML field to ignore testing models/configurations for tiny model unit tests
  • what to use for num_blocks (cpu, gpu ...override)?
  • revise config validation logic and messaging
    • 2-stage config matching ... top-level fields first, set containment for warmup_shapes second (see the sketch after this list)
  • update configs after release (candidate) testing
  • remove option to error out on unknown configuration
  • match models by config if they are mounted locally
  • integrate model/runtime configurations into tests (⚗️ draft supported model tests #435)
  • get_warmup_shapes_from_envs() does not yield the same result as platform.py:cls._warmup_shapes
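
A rough illustration of the two-stage matching idea from the TODO item above, assuming configurations are plain dicts and warmup shapes are hashable tuples; the function name and structure are hypothetical:

def config_matches(requested: dict, supported: dict) -> bool:
    # Stage 1: all top-level fields (everything except warmup_shapes) must match.
    top_level = {k: v for k, v in supported.items() if k != "warmup_shapes"}
    if any(requested.get(k) != v for k, v in top_level.items()):
        return False
    # Stage 2: the requested warmup shapes must be a subset of the supported shapes.
    requested_shapes = set(requested.get("warmup_shapes") or [])
    supported_shapes = set(supported.get("warmup_shapes") or [])
    return requested_shapes <= supported_shapes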

Review suggestions:

I wonder if it's feasible to test the warmup shapes like this. Maybe we could do something like:

  • in the known configuration file, [only keep the] upper bound
  • Validate that the prompts are multiples of 64
  • Validate that prompt + new_tokens <= max_model_len
  • Validate that the batch size is <= a tested upper bound.
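
A rough sketch of these suggested checks, assuming a warmup shape of (prompt_length, new_tokens, batch_size) and illustrative bound names; not the actual validator:

def check_warmup_shape(prompt_length: int, new_tokens: int, batch_size: int,
                       max_model_len: int, max_prompt_length: int,
                       max_batch_size: int) -> list[str]:
    # Collect human-readable reasons why the requested shape falls outside
    # the tested bounds; an empty list means the shape is acceptable.
    errors = []
    if prompt_length % 64 != 0:
        errors.append(f"prompt length {prompt_length} is not a multiple of 64")
    if prompt_length > max_prompt_length:
        errors.append(f"prompt length {prompt_length} exceeds the tested upper bound {max_prompt_length}")
    if prompt_length + new_tokens > max_model_len:
        errors.append(f"prompt + new_tokens ({prompt_length + new_tokens}) exceeds max_model_len {max_model_len}")
    if batch_size > max_batch_size:
        errors.append(f"batch size {batch_size} exceeds the tested upper bound {max_batch_size}")
    return errors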

Related Issues

#435

ckadner marked this pull request as draft September 5, 2025 19:01
vllm-project deleted a comment from github-actions bot Sep 5, 2025
ckadner marked this pull request as draft September 30, 2025 19:48
ckadner marked this pull request as ready for review September 30, 2025 21:26
ckadner changed the title from "WIP: Manage supported model configurations" to "Manage supported model configurations" Sep 30, 2025
ckadner (Collaborator, Author) commented Oct 6, 2025

Hi @maxdebayser, I added validation code and unit tests (sketched after the list below) for:

  • in the known configuration file, [only keep the] upper bound
  • test [requested warmup_shapes against] upper bound for prompt length, batch size and max_new_tokens
  • sum of prompt + max_new_tokens is smaller than the max_model_len [of supported configs]
  • prompt size is a multiple of 64
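
(For illustration, such unit tests might look roughly like the following hypothetical pytest sketch; it exercises the illustrative check_warmup_shape from the earlier sketch, imported from a made-up module, with made-up bounds.)

import pytest

from runtime_config_checks import check_warmup_shape  # hypothetical module holding the earlier sketch

@pytest.mark.parametrize("prompt_length, new_tokens, batch_size, ok", [
    (64, 20, 1, True),    # within all bounds, prompt is a multiple of 64
    (65, 20, 1, False),   # prompt length is not a multiple of 64
    (128, 10, 1, False),  # prompt exceeds the tested prompt bound of 64
    (64, 20, 8, False),   # batch size exceeds the tested upper bound of 4
])
def test_warmup_shape_validation(prompt_length, new_tokens, batch_size, ok):
    errors = check_warmup_shape(prompt_length, new_tokens, batch_size,
                                max_model_len=128, max_prompt_length=64,
                                max_batch_size=4)
    assert (not errors) == ok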

Kindly take another look? Thank you! 🙏🏻

maxdebayser (Collaborator) commented:

@ckadner, I was having trouble explaining my thoughts as review comments, so I put them in code form: ckadner#19.

maxdebayser (Collaborator) commented:

@ckadner, my assumptions aren't correct. Please disregard some of my previous comments about upper bounds.

maxdebayser (Collaborator) left a comment:

I've left a small suggestion, but otherwise it LGTM

ckadner (Collaborator, Author) commented Oct 8, 2025

One last item to do:


matching_models = [
    model for model, config in (known_model_configs or {}).items()
    if config.items() <= requested_config.items()
]
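
(This works because dict items views support set-style comparison: config.items() <= requested_config.items() holds exactly when every key/value pair of the known config also appears in the requested configuration.)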
A collaborator replied:

nice!

maxdebayser (Collaborator) left a comment:

The verification part of the code looks good to me and I like some of the clever tricks used for matching configurations. I have just one question: what is the motivation for maintaining the pre-downloaded config.json files? Is it to avoid downloads during testing or perhaps to make sure that the configurations are not updated remotely?

Because if you do something like

from transformers import AutoConfig
c = AutoConfig.from_pretrained("peiyi9979/math-shepherd-mistral-7b-prm")

the config.json will be downloaded automatically and cached in the local Hugging Face cache, and it will also use the config.json of models that have already been downloaded.

ckadner (Collaborator, Author) commented Oct 14, 2025

[...] question: what is the motivation for maintaining the pre-downloaded config.json files? Is it to avoid downloads during testing or perhaps to make sure that the configurations are not updated remotely?

I wanted to have one unit test which verifies that we can consistently match models by their config. So, in that unit test, I need to instantiate a model config for each of our supported models and then use those to verify that my runtime configuration validation code can reliably "guess" the correct model, which is needed to actually verify the runtime configurations.

Since we run our unit tests in "offline" mode in our GitHub Actions tests, we need to have those configs available before test execution.

We could add a pre-step in our GHA test workflow to download those configs (from HF or GHA cache) instead of keeping them in our unit tests folder. But @joerunde had already set that precedent of keeping 2 config.json files, and I like following precedent :-)
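
(For illustration, matching a locally available model by its config.json could look roughly like this; the helper name, the field-containment check, and the known_configs structure are assumptions, while PretrainedConfig.from_json_file reads the file without any network access.)

from pathlib import Path
from transformers import PretrainedConfig

def guess_model_from_config(config_path: Path, known_configs: dict[str, dict]) -> str | None:
    # Load the local config.json (no download) and return the first known model
    # whose stored config fields all match; None if there is no match.
    config = PretrainedConfig.from_json_file(str(config_path)).to_dict()
    for model_name, known in known_configs.items():
        if all(config.get(k) == v for k, v in known.items()):
            return model_name
    return None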

maxdebayser (Collaborator) left a comment:

Thanks, @ckadner, that makes sense to me.

ckadner merged commit f6a83ce into vllm-project:main on Oct 15, 2025 (19 checks passed)