
Conversation

@prashantgupta24 (Collaborator) commented Aug 19, 2025

Description

For the model ibm-ai-platform/micro-g3.3-8b-instruct-1b, test_openai_serving with max_num_seqs 2, TP 4, and CB seems to be failing with:

numValidElems: 139984597168000 larger than attn mask vector size: 256
dataformat_src_: IEEE_INT64

Some notes:

Earlier we only tested BS 2 with:

  1. test_openai_serving, which tested SB with TP 1, 2, and 4, and
  2. test_openai_serving_cb, which tested CB but without TP.

Both passed separately, but TP 4 with CB and BS 2 for test_spyre_online is failing (while TP 2 with CB and BS 2 passes).

The full model also passes:

tests/e2e/test_spyre_online.py::test_openai_serving[max_model_len(256)-max_num_seqs(2)-cb-sendnn-TP(4)-ibm-granite/granite-3.3-8b-instruct] PASSED
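
For reference, the test IDs above encode a configuration matrix (max_model_len, max_num_seqs, batching mode, backend, TP size, model). Below is a minimal, hypothetical sketch of how such a matrix could be expressed with pytest parametrization; the parameter names, values, and function signature are illustrative assumptions, not the actual code in tests/e2e/test_spyre_online.py.

import pytest

# Hypothetical sketch only -- not the repository's actual test code.
@pytest.mark.parametrize("max_model_len", [256])
@pytest.mark.parametrize("max_num_seqs", [2])
@pytest.mark.parametrize("cb", [True, False])  # continuous vs. static batching
@pytest.mark.parametrize("tp_size", [1, 2, 4])
@pytest.mark.parametrize(
    "model",
    [
        "ibm-ai-platform/micro-g3.3-8b-instruct-1b",
        "ibm-granite/granite-3.3-8b-instruct",
    ],
)
def test_openai_serving(model, tp_size, cb, max_num_seqs, max_model_len):
    # Spin up an OpenAI-compatible server with this configuration and check
    # completions. The failure reported above corresponds to cb=True,
    # tp_size=4, max_num_seqs=2 on the micro model; tp_size=2 with the same
    # settings passes, as does tp_size=4 with the full granite-3.3-8b model.
    ...
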

@github-actions

👋 Hi! Thank you for contributing to vLLM support on Spyre.
Just a reminder: make sure your code passes all the linting checks, otherwise your PR can't be merged. To do so, first install the linting requirements, then run format.sh and commit the changes. This can be done with uv directly:

uv sync --frozen --group lint --active --inexact

Or this can be done with pip:

uv pip compile --group lint > requirements-lint.txt
pip install -r requirements-lint.txt
bash format.sh

Now you are good to go 🚀

@prashantgupta24 prashantgupta24 enabled auto-merge (squash) August 19, 2025 15:52
@github-actions github-actions bot added the ready label Aug 19, 2025
@joerunde (Collaborator) left a comment


F

@prashantgupta24 prashantgupta24 merged commit 470a049 into main Aug 19, 2025
23 of 31 checks passed
@prashantgupta24 prashantgupta24 deleted the max-seqs-4 branch August 19, 2025 16:12