
Conversation

@joerunde (Collaborator) commented on Mar 17, 2025

I noticed that we weren't seeing some of the logger.warning messages that I had put into the v1 code, because we never initialized the logging package correctly for the plugin.

This PR reconfigures the logging package on import and copies the vLLM logging config for vllm_spyre, so that all the usual logging configuration options still work. For example, VLLM_LOGGING_LEVEL=WARNING lowers the level to WARNING for the vllm_spyre logs too. This lets users configure logging the same way they do for other hardware backends.
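
For reference, the setup is roughly this shape — a minimal sketch of a plugin-side logger module mirroring vLLM's config. The names and module layout here are illustrative, not the exact vllm_spyre implementation:

# A sketch of plugin logging setup modeled on vLLM's logger module.
import logging
import os
import sys

# Honor the same environment variable vLLM uses, so one knob
# controls the level for both packages.
_LOG_LEVEL = os.getenv("VLLM_LOGGING_LEVEL", "INFO").upper()

# Match vLLM's output: level, timestamp, [file.py:lineno], message.
_FORMAT = "%(levelname)s %(asctime)s [%(filename)s:%(lineno)d] %(message)s"
_DATE_FORMAT = "%m-%d %H:%M:%S"


def init_logger(name: str) -> logging.Logger:
    """Return a logger for a vllm_spyre module, configured like vLLM's."""
    logger = logging.getLogger(name)
    logger.setLevel(_LOG_LEVEL)
    if not logger.handlers:
        handler = logging.StreamHandler(sys.stdout)
        handler.setFormatter(logging.Formatter(_FORMAT, _DATE_FORMAT))
        logger.addHandler(handler)
    # Don't double-log through the root logger.
    logger.propagate = False
    return logger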

I also went ahead and ported some of the prints in the v1 codepath to use the logger. Each message now includes the file and line number it came from, so we no longer need to maintain manual prefixes like [SchedulerConfig] to mark where a log originated. An example snippet (a sketch of the conversion itself follows it):

INFO 03-17 18:51:10 [spyre_worker.py:200] load model...
WARNING 03-17 18:51:10 [config.py:3600] Current VLLM config is not set.
INFO 03-17 18:51:10 [platform.py:114] VLLM_SPYRE_WARMUP_PROMPT_LENS = [64]
INFO 03-17 18:51:10 [platform.py:115] VLLM_SPYRE_WARMUP_NEW_TOKENS = [20]
INFO 03-17 18:51:10 [platform.py:116] VLLM_SPYRE_WARMUP_BATCH_SIZES = [1]
WARNING 03-17 18:51:10 [topk_topp_sampler.py:46] FlashInfer is not available. Falling back to the PyTorch-native implementation of top-p & top-k sampling. For the best performance, please install FlashInfer.
ignoring module=WordEmbedding when distributing module
INFO 03-17 18:51:10 [spyre.py:200] NOTICE: Adjusting torch._dynamo.config.cache_size_limit from 8 to 256 to accommodate prompt size of 64 and decode tokens of 20
INFO 03-17 18:51:10 [spyre_worker.py:215] load model took 0.237s
INFO 03-17 18:51:10 [kv_cache_utils.py:537] GPU KV cache size: 1,073,741,824 tokens
INFO 03-17 18:51:10 [kv_cache_utils.py:540] Maximum concurrency for 4,096 tokens per request: 262144.00x
INFO 03-17 18:51:10 [spyre_worker.py:53] Start warming up 1 different prompt/decode/batchsize-shape combinations.
INFO 03-17 18:51:10 [spyre_worker.py:68] Warmup 1/1 prompt/decode/batchsize-shape combinations...
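
The print-to-logger conversion looks roughly like this (a sketch; prompt_lens and the vllm_spyre.logger module path are illustrative assumptions, not the exact code from this PR):

# Before: a bare print with a manually maintained prefix
print("[SchedulerConfig] VLLM_SPYRE_WARMUP_PROMPT_LENS =", prompt_lens)

# After: the logger adds the file name and line number automatically
from vllm_spyre.logger import init_logger  # assumed module path

logger = init_logger(__name__)
logger.info("VLLM_SPYRE_WARMUP_PROMPT_LENS = %s", prompt_lens)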

Signed-off-by: Joe Runde <[email protected]>

👋 Hi! Thank you for contributing to vLLM support on Spyre.
Just a reminder: make sure your code passes all the linting checks, otherwise your PR can't be merged. To do so, first install the linting requirements, then run format.sh and commit the changes:

pip install -r requirements-lint.txt
bash format.sh

Now you are good to go 🚀

@tdoublep (Member) left a comment

LGTM - thanks!

@joerunde merged commit f81c4b9 into vllm-project:main on Mar 18, 2025
9 checks passed
@joerunde deleted the use-logger branch on March 18, 2025 at 15:28