
Conversation

@joerunde (Collaborator) commented on Mar 17, 2025

I noticed that we weren't seeing some of the logger.warning messages that I had put into the v1 code, because we never initialized the logging package correctly for the plugin.

This PR reconfigures the logging package on import and copies the vLLM logging config for vllm_spyre, so that all the usual logging configuration options still work. For example, VLLM_LOGGING_LEVEL=WARNING lowers the level to WARNING for the vllm_spyre logs too. This lets users configure logging the same way they do for other hardware backends.
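
For reference, the setup is roughly this shape — a minimal sketch of a plugin-side logger module mirroring vLLM's config. The names and module layout here are illustrative, not the exact vllm_spyre implementation:

# A sketch of plugin logging setup modeled on vLLM's logger module.
import logging
import os
import sys

# Honor the same environment variable vLLM uses, so one knob
# controls the level for both packages.
_LOG_LEVEL = os.getenv("VLLM_LOGGING_LEVEL", "INFO").upper()

# Match vLLM's output: level, timestamp, [file.py:lineno], message.
_FORMAT = "%(levelname)s %(asctime)s [%(filename)s:%(lineno)d] %(message)s"
_DATE_FORMAT = "%m-%d %H:%M:%S"


def init_logger(name: str) -> logging.Logger:
    """Return a logger for a vllm_spyre module, configured like vLLM's."""
    logger = logging.getLogger(name)
    logger.setLevel(_LOG_LEVEL)
    if not logger.handlers:
        handler = logging.StreamHandler(sys.stdout)
        handler.setFormatter(logging.Formatter(_FORMAT, _DATE_FORMAT))
        logger.addHandler(handler)
    # Don't double-log through the root logger.
    logger.propagate = False
    return logger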

I also went ahead and ported some of the prints in the v1 codepath to use the logger. Each message now includes the file and line number it came from, so we no longer need to maintain manual prefixes like [SchedulerConfig] to mark where a log originated. An example snippet (a sketch of the conversion itself follows it):

INFO 03-17 18:51:10 [spyre_worker.py:200] load model...
WARNING 03-17 18:51:10 [config.py:3600] Current VLLM config is not set.
INFO 03-17 18:51:10 [platform.py:114] VLLM_SPYRE_WARMUP_PROMPT_LENS = [64]
INFO 03-17 18:51:10 [platform.py:115] VLLM_SPYRE_WARMUP_NEW_TOKENS = [20]
INFO 03-17 18:51:10 [platform.py:116] VLLM_SPYRE_WARMUP_BATCH_SIZES = [1]
WARNING 03-17 18:51:10 [topk_topp_sampler.py:46] FlashInfer is not available. Falling back to the PyTorch-native implementation of top-p & top-k sampling. For the best performance, please install FlashInfer.
ignoring module=WordEmbedding when distributing module
INFO 03-17 18:51:10 [spyre.py:200] NOTICE: Adjusting torch._dynamo.config.cache_size_limit from 8 to 256 to accommodate prompt size of 64 and decode tokens of 20
INFO 03-17 18:51:10 [spyre_worker.py:215] load model took 0.237s
INFO 03-17 18:51:10 [kv_cache_utils.py:537] GPU KV cache size: 1,073,741,824 tokens
INFO 03-17 18:51:10 [kv_cache_utils.py:540] Maximum concurrency for 4,096 tokens per request: 262144.00x
INFO 03-17 18:51:10 [spyre_worker.py:53] Start warming up 1 different prompt/decode/batchsize-shape combinations.
INFO 03-17 18:51:10 [spyre_worker.py:68] Warmup 1/1 prompt/decode/batchsize-shape combinations...
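
The print-to-logger conversion looks roughly like this (a sketch; prompt_lens and the vllm_spyre.logger module path are illustrative assumptions, not the exact code from this PR):

# Before: a bare print with a manually maintained prefix
print("[SchedulerConfig] VLLM_SPYRE_WARMUP_PROMPT_LENS =", prompt_lens)

# After: the logger adds the file name and line number automatically
from vllm_spyre.logger import init_logger  # assumed module path

logger = init_logger(__name__)
logger.info("VLLM_SPYRE_WARMUP_PROMPT_LENS = %s", prompt_lens)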

Signed-off-by: Joe Runde <[email protected]>

👋 Hi! Thank you for contributing to vLLM support on Spyre.
Just a reminder: make sure your code passes all the linting checks, otherwise your PR can't be merged. To do so, first install the linting requirements, then run format.sh and commit the changes:

pip install -r requirements-lint.txt
bash format.sh

Now you are good to go 🚀

@tdoublep (Member) left a comment

LGTM - thanks!

@joerunde merged commit f81c4b9 into vllm-project:main on Mar 18, 2025
9 checks passed
@joerunde deleted the use-logger branch on March 18, 2025 at 15:28