maxdebayser
Collaborator

Description

The pooler is always run in the transformers code, even when its outputs aren't used. In our case, we instantiate the pooler outside of the transformers code so that we can use the vLLM code instead. In a small test that I'm using, the total time goes from 1.8s to 1.2s.

Signed-off-by: Max de Bayser <[email protected]>
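
To make the idea above concrete, here is a minimal toy sketch, not the code from this PR. `TinyEncoder`, `external_pool`, and `embed` are hypothetical stand-ins rather than transformers or vLLM classes. The flag name `add_pooling_layer` is borrowed from the constructor argument that some Hugging Face model classes such as `BertModel` accept; whether this PR uses that exact mechanism is an assumption.

```python
# Toy illustration only: TinyEncoder and external_pool are hypothetical
# stand-ins, not transformers or vLLM classes.
from typing import List, Optional, Tuple

Hidden = List[List[float]]  # one vector per token


class TinyEncoder:
    """Encoder with an optional built-in pooler, controlled by a flag."""

    def __init__(self, add_pooling_layer: bool = True) -> None:
        self.add_pooling_layer = add_pooling_layer
        self.pooler_calls = 0  # counts how often the built-in pooler ran

    def _builtin_pool(self, hidden: Hidden) -> List[float]:
        self.pooler_calls += 1
        return hidden[0]  # first-token pooling, as a placeholder

    def forward(self, token_ids: List[int]) -> Tuple[Hidden, Optional[List[float]]]:
        hidden = [[float(t), float(t) * 2.0] for t in token_ids]
        # The built-in pooler only runs when the flag is set.
        pooled = self._builtin_pool(hidden) if self.add_pooling_layer else None
        return hidden, pooled


def external_pool(hidden: Hidden) -> List[float]:
    """Mean pooling applied by the caller, outside the encoder."""
    n = len(hidden)
    return [sum(col) / n for col in zip(*hidden)]


def embed(encoder: TinyEncoder, token_ids: List[int]) -> List[float]:
    # The encoder's own pooled output is ignored; only the external pooler counts.
    hidden, _unused = encoder.forward(token_ids)
    return external_pool(hidden)


wasteful = TinyEncoder(add_pooling_layer=True)
lean = TinyEncoder(add_pooling_layer=False)
same_result = embed(wasteful, [1, 2, 3]) == embed(lean, [1, 2, 3])
```

Both encoders produce the same embedding because the caller never reads the built-in pooled output; the only difference is that `lean` never runs its internal pooler.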
👋 Hi! Thank you for contributing to vLLM support on Spyre.
Just a reminder: make sure that your code passes all the linting checks, otherwise your PR can't be merged. To do so, first install the linting requirements, then run format.sh and commit the changes. This can be done with uv directly:

uv sync --frozen --group lint --active --inexact

Or this can be done with pip:

uv pip compile --group lint > requirements-lint.txt
pip install -r requirements-lint.txt
bash format.sh

Now you are good to go 🚀

@maxdebayser maxdebayser merged commit 97ed5b0 into main Sep 25, 2025
19 checks passed
@maxdebayser maxdebayser deleted the optimize_pooler branch September 25, 2025 17:19