tdoublep
Member

@tdoublep tdoublep commented Mar 25, 2025

I need to do a demo using V0 online serving today, and I noticed that it currently does not work with the latest aiu-vllm-dev image:

Trying to deploy the following:

python3 -m vllm.entrypoints.openai.api_server --model /models/llama-194m/ --max-model-len=2048 --block-size=128

produces:

[SpyreWorker] load model...
ERROR 03-25 09:16:21 [engine.py:448] type object 'SpyrePlatform' has no attribute 'spyre_warmup_shapes'
ERROR 03-25 09:16:21 [engine.py:448] Traceback (most recent call last):
ERROR 03-25 09:16:21 [engine.py:448]   File "/opt/vllm/lib64/python3.11/site-packages/vllm/engine/multiprocessing/engine.py", line 436, in run_mp_engine
ERROR 03-25 09:16:21 [engine.py:448]     engine = MQLLMEngine.from_vllm_config(
ERROR 03-25 09:16:21 [engine.py:448]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-25 09:16:21 [engine.py:448]   File "/opt/vllm/lib64/python3.11/site-packages/vllm/engine/multiprocessing/engine.py", line 128, in from_vllm_config
ERROR 03-25 09:16:21 [engine.py:448]     return cls(
ERROR 03-25 09:16:21 [engine.py:448]            ^^^^
ERROR 03-25 09:16:21 [engine.py:448]   File "/opt/vllm/lib64/python3.11/site-packages/vllm/engine/multiprocessing/engine.py", line 82, in __init__
ERROR 03-25 09:16:21 [engine.py:448]     self.engine = LLMEngine(*args, **kwargs)
ERROR 03-25 09:16:21 [engine.py:448]                   ^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-25 09:16:21 [engine.py:448]   File "/opt/vllm/lib64/python3.11/site-packages/vllm/engine/llm_engine.py", line 280, in __init__
ERROR 03-25 09:16:21 [engine.py:448]     self.model_executor = executor_class(vllm_config=vllm_config, )
ERROR 03-25 09:16:21 [engine.py:448]                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-25 09:16:21 [engine.py:448]   File "/opt/vllm/lib64/python3.11/site-packages/vllm/executor/executor_base.py", line 52, in __init__
ERROR 03-25 09:16:21 [engine.py:448]     self._init_executor()
ERROR 03-25 09:16:21 [engine.py:448]   File "/opt/vllm/lib64/python3.11/site-packages/vllm/executor/uniproc_executor.py", line 47, in _init_executor
ERROR 03-25 09:16:21 [engine.py:448]     self.collective_rpc("load_model")
ERROR 03-25 09:16:21 [engine.py:448]   File "/opt/vllm/lib64/python3.11/site-packages/vllm/executor/uniproc_executor.py", line 56, in collective_rpc
ERROR 03-25 09:16:21 [engine.py:448]     answer = run_method(self.driver_worker, method, args, kwargs)
ERROR 03-25 09:16:21 [engine.py:448]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-25 09:16:21 [engine.py:448]   File "/opt/vllm/lib64/python3.11/site-packages/vllm/utils.py", line 2216, in run_method
ERROR 03-25 09:16:21 [engine.py:448]     return func(*args, **kwargs)
ERROR 03-25 09:16:21 [engine.py:448]            ^^^^^^^^^^^^^^^^^^^^^
ERROR 03-25 09:16:21 [engine.py:448]   File "/opt/vllm/lib64/python3.11/site-packages/vllm_spyre/worker/spyre_worker.py", line 149, in load_model
ERROR 03-25 09:16:21 [engine.py:448]     spyre_warmup_shapes = current_platform.get_warmup_shapes()
ERROR 03-25 09:16:21 [engine.py:448]                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-25 09:16:21 [engine.py:448]   File "/opt/vllm/lib64/python3.11/site-packages/vllm_spyre/platform.py", line 167, in get_warmup_shapes
ERROR 03-25 09:16:21 [engine.py:448]     return cls.spyre_warmup_shapes
ERROR 03-25 09:16:21 [engine.py:448]            ^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-25 09:16:21 [engine.py:448] AttributeError: type object 'SpyrePlatform' has no attribute 'spyre_warmup_shapes'

It looks like the engine process is getting forked after we parse the warmup shapes in the main process, so the shapes don't exist in the engine process.

The platform doesn't seem like the best place to store "state". Why don't we just use the config instead? I understand it is a little nasty, since the plugin can't "change" the config class, but it works very nicely.

@github-actions

👋 Hi! Thank you for contributing to vLLM support on Spyre.
Just a reminder: make sure that your code passes all the linting checks; otherwise, your PR can't be merged. To do so, first install the linting requirements, then run format.sh and commit the changes:

pip install -r requirements-lint.txt
bash format.sh

Now you are good to go 🚀

Signed-off-by: Thomas Parnell <[email protected]>
@tdoublep tdoublep force-pushed the tpa-fix-v0-warmup branch from 9973a7c to 5b95b15 March 25, 2025 11:10
@maxdebayser
Collaborator

So the warmup shapes are being parsed in the main server process while the first worker process that also contains the engine in V0 is already up and running?

Signed-off-by: Yannick Schnider <[email protected]>
Signed-off-by: Yannick Schnider <[email protected]>
Collaborator

@yannicks1 yannicks1 left a comment

I just fixed formatting and added the same logic to V1 classes. LGTM!

@tjohnson31415
Collaborator

So the warmup shapes are being parsed in the main server process while the first worker process that also contains the engine in V0 is already up and running?

No, the worker process comes up after the main server process parses the warmup shapes, but the shapes aren't re-parsed in the worker. This problem is related to VLLM_WORKER_MULTIPROC_METHOD. For the initial V1 integration, @joerunde added code to force VLLM_WORKER_MULTIPROC_METHOD=fork:
https://github.com/vllm-project/vllm-spyre/pull/6/files#diff-16ac04c4e75668ccde20cd2cfb82fa496d5d98dbe169ffede15652dc60a16066R54-R62
(this code could be removed now in this PR)

With the default of spawn, the spyre_warmup_shapes attribute added to the SpyrePlatform class did not exist in the spawned worker (the worker process doesn't call set_warmup_shapes). With fork, modifications to the class persist in the worker.
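The fork vs. spawn difference can be reproduced without vLLM. The sketch below is a minimal, POSIX-only illustration under stated assumptions: `Platform` is a hypothetical stand-in for `SpyrePlatform` that keeps its shapes in a class attribute, and `(64, 20, 4)` is a made-up shape. A `fork`ed child starts from a copy of the parent's memory and sees the attribute; a fresh interpreter (started here with `subprocess` to approximate what `spawn` does when it re-imports modules) only re-runs the class source and never sees it.

```python
import multiprocessing as mp
import subprocess
import sys

# Hypothetical stand-in for the plugin's platform module. The source is kept in a
# string so the same definition can be re-run in a fresh interpreter below.
PLATFORM_SOURCE = """
class Platform:
    @classmethod
    def set_warmup_shapes(cls, shapes):
        cls.warmup_shapes = shapes

    @classmethod
    def get_warmup_shapes(cls):
        return getattr(cls, "warmup_shapes", None)
"""

_namespace: dict = {"__name__": "platform_sketch"}
exec(PLATFORM_SOURCE, _namespace)  # "import" the module once in the parent
Platform = _namespace["Platform"]


def _report(queue) -> None:
    # Runs inside the child process and reports what the class looks like there.
    queue.put(Platform.get_warmup_shapes())


def seen_by_forked_child():
    """fork: the child starts from a copy of the parent's memory."""
    ctx = mp.get_context("fork")
    queue = ctx.Queue()
    proc = ctx.Process(target=_report, args=(queue,))
    proc.start()
    shapes = queue.get(timeout=30)  # read before join() so the child can flush and exit
    proc.join()
    return shapes


def seen_by_fresh_interpreter() -> str:
    """spawn-like: a new interpreter only re-runs the module source."""
    code = PLATFORM_SOURCE + "\nprint(Platform.get_warmup_shapes())\n"
    result = subprocess.run(
        [sys.executable, "-c", code], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


if __name__ == "__main__":
    Platform.set_warmup_shapes([(64, 20, 4)])  # runs only in the parent
    print(seen_by_forked_child())       # [(64, 20, 4)]
    print(seen_by_fresh_interpreter())  # None
```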

Moving the warmup shapes to the scheduler_config works because the vllm_config is serialized and passed to the worker during initialization, regardless of spawn vs. fork.
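The serialization point can be checked directly with `pickle`, which `spawn` uses to hand process arguments to the child. In this sketch, `fake_spyre_platform` is a hypothetical throwaway module registered only so pickle can find the class by name, and a `SimpleNamespace` stands in for the scheduler config. A pickled class contains only a reference (module plus name), so a runtime class attribute is not transferred, whereas a pickled instance carries its attributes with it.

```python
import pickle
import sys
import types

# Hypothetical module standing in for the plugin's platform module. Registering it
# in sys.modules lets pickle look the class up by "module.qualname", as it would
# for a class imported from an installed package.
_platform_module = types.ModuleType("fake_spyre_platform")
exec("class Platform:\n    pass\n", _platform_module.__dict__)
sys.modules["fake_spyre_platform"] = _platform_module
Platform = _platform_module.Platform


def payload_mentions_shapes(obj) -> bool:
    """Pickle obj (spawn sends process arguments as a pickle) and look for the attribute name."""
    return b"warmup_shapes" in pickle.dumps(obj)


if __name__ == "__main__":
    Platform.warmup_shapes = [(64, 20, 4)]                      # class attribute set at runtime
    config = types.SimpleNamespace(warmup_shapes=[(64, 20, 4)])  # hypothetical config object

    # Classes are pickled by reference: only "fake_spyre_platform" and "Platform".
    print(payload_mentions_shapes(Platform))  # False
    # Instances are pickled by value, so their attributes travel with them.
    print(payload_mentions_shapes(config))    # True
    print(pickle.loads(pickle.dumps(config)).warmup_shapes)  # [(64, 20, 4)]
```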

@tdoublep
Member Author

@tjohnson31415 thank you for the correction and detailed explanation! I wrongly assumed that the default V0 behaviour was fork too. I'll quickly try it now on main using fork.

Either way, I still think we should consider moving the shapes into the config. In fact, I wonder why we are using environment variables at all. Wouldn't we rather pass them as proper arguments? If vLLM does not provide a way for plugins to add their own custom args/config, then perhaps this is something we could change upstream.

@tdoublep
Member Author

Hmm, I get the same error even if I set:

export VLLM_WORKER_MULTIPROC_METHOD=fork

@tjohnson31415
Collaborator

Hmm, yeah, then this may be something else, not the serialization problem that I'm familiar with 🤔

@joerunde
Collaborator

If vLLM does not provide a way for plugins to add their own custom args/config then perhaps this is something we could change upstream?

+1 on bringing that up upstream. I'd rather not have to do the entrypoint hijacking that the vllm tgis adapter does to add CLI args.

For this PR though, do we need to store the warmup shapes on a config object at all? It seems like they can be rebuilt in each worker from the environment variables, and I'd prefer doing that now for robustness until we work out some upstream changes for plugins to be able to add their own arguments and config properly.
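Rebuilding the shapes in each worker could look roughly like the sketch below. Because environment variables are inherited by child processes under both `fork` and `spawn`, every worker can parse them itself instead of relying on state set in the parent. The variable names, the comma-separated format, and the `prompt_length`/`new_tokens`/`batch_size` keys are assumptions for illustration and should be checked against the plugin's actual settings; `warmup_shapes_from_env` is a hypothetical helper.

```python
import os
from typing import Mapping, Optional

# Assumed variable names and format (comma-separated integers); adjust to match
# the plugin's real settings.
PROMPT_LENS_VAR = "VLLM_SPYRE_WARMUP_PROMPT_LENS"
NEW_TOKENS_VAR = "VLLM_SPYRE_WARMUP_NEW_TOKENS"
BATCH_SIZES_VAR = "VLLM_SPYRE_WARMUP_BATCH_SIZES"


def _parse_int_list(value: str) -> list[int]:
    """Turn "64, 128" into [64, 128], skipping empty entries."""
    return [int(item) for item in value.split(",") if item.strip()]


def warmup_shapes_from_env(environ: Optional[Mapping[str, str]] = None) -> list[dict]:
    """Rebuild the warmup shapes from the environment (hypothetical helper).

    The environment is inherited by child processes under both fork and spawn,
    so each worker can call this itself instead of reading state set in the parent.
    """
    env = os.environ if environ is None else environ
    prompt_lens = _parse_int_list(env.get(PROMPT_LENS_VAR, ""))
    new_tokens = _parse_int_list(env.get(NEW_TOKENS_VAR, ""))
    batch_sizes = _parse_int_list(env.get(BATCH_SIZES_VAR, ""))
    if not (len(prompt_lens) == len(new_tokens) == len(batch_sizes)):
        raise ValueError(
            "warmup prompt lengths, new token counts and batch sizes "
            "must have the same number of entries"
        )
    # One shape per position: (prompt_lens[i], new_tokens[i], batch_sizes[i]).
    return [
        {"prompt_length": p, "new_tokens": n, "batch_size": b}
        for p, n, b in zip(prompt_lens, new_tokens, batch_sizes)
    ]
```

Parsing in every worker duplicates a little work, but it keeps each process self-sufficient until plugins have a proper way to register their own arguments and config.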

@joerunde
Collaborator

Also.... we should probably have at least a single test that runs the openai server so we can catch these problems too!
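A minimal server smoke test along these lines might look like the sketch below. It starts the same entrypoint and flags used in the failing deployment, fails fast if the process exits during startup (as it did with the `AttributeError` above), and then checks two endpoints. It assumes the server exposes `GET /health` and `GET /v1/models`, as vLLM's OpenAI-compatible server does; the helper names are hypothetical.

```python
import subprocess
import sys
import time
import urllib.error
import urllib.request
from typing import Optional


def build_server_command(model: str, port: int, max_model_len: int = 2048,
                         block_size: int = 128) -> list[str]:
    # Same entrypoint and flags as the deployment in this PR, plus an explicit port.
    return [
        sys.executable, "-m", "vllm.entrypoints.openai.api_server",
        "--model", model,
        f"--max-model-len={max_model_len}",
        f"--block-size={block_size}",
        "--port", str(port),
    ]


def wait_until_healthy(base_url: str, proc: Optional[subprocess.Popen] = None,
                       timeout_s: float = 300.0, interval_s: float = 1.0) -> bool:
    """Poll GET {base_url}/health; return False if proc exits or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if proc is not None and proc.poll() is not None:
            return False  # the server process died during startup
        try:
            with urllib.request.urlopen(f"{base_url}/health", timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, ConnectionError, TimeoutError):
            pass  # not accepting requests yet
        time.sleep(interval_s)
    return False


def run_server_smoke_test(model: str, port: int = 8000) -> None:
    """Start the server, wait for it to become healthy, then list the served models."""
    proc = subprocess.Popen(build_server_command(model, port))
    base_url = f"http://localhost:{port}"
    try:
        if not wait_until_healthy(base_url, proc=proc):
            raise RuntimeError("server exited or never became healthy")
        with urllib.request.urlopen(f"{base_url}/v1/models", timeout=10) as resp:
            if resp.status != 200:
                raise RuntimeError(f"/v1/models returned {resp.status}")
    finally:
        proc.terminate()
        proc.wait(timeout=60)
```

In CI, this could be wrapped in a pytest test that calls `run_server_smoke_test` with a small model such as the `/models/llama-194m/` path used above.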

@tdoublep
Member Author

Also.... we should probably have at least a single test that runs the openai server so we can catch these problems too!

Agreed. @dpatel-ops and I had discussed this before.

@tdoublep
Member Author

For this PR though, do we need to store the warmup shapes on a config object at all? It seems like they can be rebuilt in each worker from the environment variables, and I'd prefer doing that now for robustness until we work out some upstream changes for plugins to be able to add their own arguments and config properly.

@joerunde Yeah, we can do it like that for now. I had an earlier attempt that did exactly that; I just felt that storing them in the config was cleaner since they are kept in "one place". It's not a huge difference though, tbh.

@yannicks1 yannicks1 mentioned this pull request Mar 26, 2025
@joerunde joerunde mentioned this pull request Mar 26, 2025
@joerunde
Collaborator

Added a followup issue to correctly handle cli args and configs here: #51

Member Author

@tdoublep tdoublep left a comment

LGTM - thanks!

@joerunde joerunde merged commit 15c0e52 into main Mar 26, 2025
9 checks passed
@joerunde joerunde deleted the tpa-fix-v0-warmup branch March 26, 2025 18:07
joerunde pushed a commit that referenced this pull request Jun 25, 2025
### [v0] replace current_platform with SpyrePlatform

PR #47 missed replacing current_platform with SpyrePlatform in the v0
model runner. I don't think this is an issue or related to the recent
failures of v0 static batching on AIU Spyre; I'm just adding it here for
completeness.

Signed-off-by: Yannick Schnider <[email protected]>