@AnishPahilajani AnishPahilajani commented Sep 26, 2025

Description

This PR ensures that threading-related environment variables (e.g. OPENBLAS_NUM_THREADS, MKL_NUM_THREADS, etc.) are only set if they are not already defined by the user.


👋 Hi! Thank you for contributing to vLLM support on Spyre.
Just a reminder: Make sure that your code passes all the linting checks, otherwise your PR won't be able to be merged. To do so, first install the linting requirements, then run format.sh and commit the changes. This can be done with uv directly:

uv sync --frozen --group lint --active --inexact

Or this can be done with pip:

uv pip compile --group lint > requirements-lint.txt
pip install -r requirements-lint.txt
bash format.sh

Now you are good to go 🚀

@yannicks1
Collaborator

Hey @AnishPahilajani, please sign off your commit correctly and run our linter (I assume it will fail, as I see a double space after the `if`).


     for env in THREADING_ENVS:
-        os.environ[env] = str(cpus_per_worker)
+        if  not os.environ.get(env):
Collaborator

double space?

Author

fixed

@tjohnson31415
Collaborator

tjohnson31415 commented Sep 26, 2025

The VLLM_SPYRE_UPDATE_THREAD_CONFIG feature flag configures whether or not any of the threading env vars are overridden. We have it on by default, but you can set VLLM_SPYRE_UPDATE_THREAD_CONFIG=0 to tell it not to override anything.

(We actually have a case where a script in an image sets OMP_NUM_THREADS automatically, but it can't take the tensor-parallelism of vLLM into account. So I'd prefer to just have the feature flag instead of the feature flag + override if not set)
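The flag-only behavior described in this comment can be sketched roughly as follows. This is a minimal illustration, not the actual vllm-spyre code: the helper name `apply_thread_config`, the exact contents of `THREADING_ENVS`, and the way the flag value is parsed are all assumptions.

```python
import os

# Assumed list for illustration; the PR description names OPENBLAS_NUM_THREADS
# and MKL_NUM_THREADS, and the discussion mentions OMP_NUM_THREADS.
THREADING_ENVS = ["OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS"]


def apply_thread_config(cpus_per_worker, environ=os.environ):
    """Hypothetical helper: override every threading var unless the flag is 0."""
    # Defaulting to "1" mirrors "on by default"; the real parsing may differ.
    if environ.get("VLLM_SPYRE_UPDATE_THREAD_CONFIG", "1") == "0":
        return False  # user opted out, so nothing is touched
    for env in THREADING_ENVS:
        # Unconditional override, even if a script in the image already set it.
        environ[env] = str(cpus_per_worker)
    return True
```

With this shape there are only two modes: vllm-spyre owns all of the threading variables, or it owns none of them.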

@AnishPahilajani
Author

AnishPahilajani commented Sep 26, 2025

I understand the idea behind using only the VLLM_SPYRE_UPDATE_THREAD_CONFIG flag to control whether threading environment variables are overridden.

On our Power 11 systems, some threading env vars (like OMP_NUM_THREADS) are set by scripts in the image, while others are unset. If we override all vars unconditionally, it could break existing configurations. That's why I added the conditional check (`if not os.environ.get(env):`). It ensures we only set vars that aren't already defined, while still respecting the feature flag.

I also expect that scenario (script setting OMP_NUM_THREADS) to still work correctly.
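For reference, the set-only-if-unset check proposed here behaves roughly like the sketch below. The function name `set_unset_thread_envs` and the contents of `THREADING_ENVS` are assumptions made for illustration; only the `if not os.environ.get(env):` condition comes from the PR itself.

```python
import os

# Assumed list for illustration, not copied from vllm-spyre.
THREADING_ENVS = ["OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS"]


def set_unset_thread_envs(cpus_per_worker, environ=os.environ):
    """Hypothetical helper: set only missing (or empty) vars, return their names."""
    changed = []
    for env in THREADING_ENVS:
        # .get() returns None for a missing var and "" for an empty one;
        # both are falsy, so an empty value is treated the same as unset.
        if not environ.get(env):
            environ[env] = str(cpus_per_worker)
            changed.append(env)
    return changed
```

One detail worth noting is that a variable exported as an empty string counts as unset under this check, so it would be overwritten.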

@tjohnson31415
Collaborator

tjohnson31415 commented Sep 26, 2025

On the other hand, the "existing configuration" can be sub-optimal for or break vllm-spyre. What if you want to override the system script's OMP_NUM_THREADS setting based on vLLM's configuration?
If VLLM_SPYRE_UPDATE_THREAD_CONFIG=1 only sets unset env vars, it is not possible to have vllm-spyre override the system settings (which I presume are static and not based on the launched worker count that vLLM knows about).

My thought with VLLM_SPYRE_UPDATE_THREAD_CONFIG is that, if you know what you are doing with threading and want to control it, you'd have full control, including optimizing for the specific vLLM worker configuration you are using.
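To make the worker-count argument concrete, here is a small sketch of a per-worker thread budget. The even integer split is an assumption chosen for illustration; the thread does not state the formula vllm-spyre actually uses, and the function name is hypothetical.

```python
def cpus_per_worker(total_cpus: int, worker_count: int) -> int:
    """Hypothetical split: divide the CPUs evenly, never going below one thread."""
    if worker_count < 1:
        raise ValueError("worker_count must be at least 1")
    return max(1, total_cpus // worker_count)


# A static OMP_NUM_THREADS=32 on a 32-core host is fine for one worker,
# but with 4 workers it would ask for 4 * 32 = 128 threads on 32 cores.
static_threads_total = 4 * 32
scaled_threads_total = 4 * cpus_per_worker(32, 4)  # 4 * 8 = 32
```

A static value set by an image script cannot make this adjustment, because it is fixed before the worker count is known.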

tjohnson31415 added a commit that referenced this pull request Oct 3, 2025
# Description

A couple of improvements related to setting threading based on cpu count:
- add `VLLM_SPYRE_NUM_CPUS` configuration to set the cpu count and skip the detection steps
  - this is useful if detection is not working but you want just 1 configuration to set and to still have vllm-spyre scale by the worker count
- adds `psutil` as another way to auto-detect CPUs by counting "physical" cores only instead of logical cores
  - in multi-threaded CPU bottlenecks, using logical cores may be inefficient
  - cpu detection is meant to be best-effort so I didn't add psutil as a dependency (though it does currently come in through the accelerate sub-dependency of the `fp8` extras package)

## Related Issues

#483

---------

Signed-off-by: Travis Johnson <[email protected]>
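The detection order described in the commit message above could look something like this sketch. It is not the code from the referenced commit: the function name `detect_cpu_count`, the assumption that `VLLM_SPYRE_NUM_CPUS` holds a plain integer, and the final fallback to `os.cpu_count()` are illustrative guesses.

```python
import os


def detect_cpu_count(environ=os.environ):
    """Hypothetical best-effort count: override, then physical cores, then logical."""
    override = environ.get("VLLM_SPYRE_NUM_CPUS")
    if override:
        # Assumed to be a plain integer; an explicit value skips detection.
        return int(override)
    try:
        import psutil  # optional, so a missing install must not be fatal

        physical = psutil.cpu_count(logical=False)  # may return None
        if physical:
            return physical
    except ImportError:
        pass
    # Logical core count from the standard library; may also be None.
    return os.cpu_count()
```

Importing `psutil` inside the `try` block keeps it optional, which matches the stated intent of not adding it as a hard dependency.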
@joerunde
Collaborator

joerunde commented Oct 6, 2025

closing as we have an alternative implemented in #487

@joerunde joerunde closed this Oct 6, 2025