Amd blocking test collection #165

Alexei-V-Ivanov-AMD · 2025-09-15T23:21:20Z

Adds functionality to decide if a specific AMD test is blocking.

Implemented through an additional "grade" property on each test definition in test-pipeline.yaml. For example:

label: Core Test # 22min
timeout_in_minutes: 35
mirror_hardwares: [amdexperimental]
grade: Blocking
....

makes this "AMD: Core Test" blocking.
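As a rough illustration of the idea, the sketch below shows how a generator could read the "grade" property and decide whether the mirrored AMD step may fail the build. It is not the code from this PR: the function name `build_amd_steps`, the `AmdStep` type, the second and third sample tests, and the mapping of "non-blocking" onto Buildkite's `soft_fail` step attribute are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class AmdStep:
    """Hypothetical rendered step; soft_fail=True means a failure does not block."""
    label: str
    soft_fail: bool


def build_amd_steps(tests: list[dict[str, Any]]) -> list[AmdStep]:
    """Mirror tests onto AMD; only tests graded "Blocking" can fail the build."""
    steps = []
    for test in tests:
        # Skip tests that are not mirrored to AMD hardware at all.
        if "amdexperimental" not in test.get("mirror_hardwares", []):
            continue
        # Assumption: a missing or different grade means non-blocking.
        blocking = test.get("grade") == "Blocking"
        steps.append(AmdStep(label=f"AMD: {test['label']}", soft_fail=not blocking))
    return steps


# Test definitions as they might look after parsing test-pipeline.yaml.
example_tests = [
    {
        "label": "Core Test",
        "timeout_in_minutes": 35,
        "mirror_hardwares": ["amdexperimental"],
        "grade": "Blocking",
    },
    # Hypothetical entries, added only to exercise the other branches.
    {"label": "Hypothetical Ungraded Test", "mirror_hardwares": ["amdexperimental"]},
    {"label": "Hypothetical Unmirrored Test"},
]

steps = build_amd_steps(example_tests)
for step in steps:
    print(step)
```

With this layout, changing a test's grade means editing the test definition itself, which is the property the rest of the thread debates.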

Signed-off-by: Alexei V. Ivanov <[email protected]>

khluu

Can we not add a field for this on test-pipeline.yaml?
It means that whenever you need to switch the "grade" of any test, you'd need to make a PR on vLLM, which leads to many issues: more CI runs, inconsistent blocking state across builds on fork branches, etc.
I'd prefer to keep the logic that decides whether a test is blocking in this repo.

Alexei-V-Ivanov-AMD · 2025-09-16T21:29:17Z

@khluu

> Can we not add a field for this on test-pipeline.yaml?

I don't really see how one could modify the parser for test-pipeline.yaml, which generates the final CI test script from the test definitions in that file, so that it gets the test-specific "grade" parameter from somewhere else.

Besides, keeping the test definitions, including the "grade" parameter, entirely in one place (test-pipeline.yaml) improves the modularity of the code. Modularity, in turn, makes further development easier and the code more reusable.

khluu · 2025-09-16T21:41:50Z

> @khluu
>
> > Can we not add a field for this on test-pipeline.yaml?
>
> I don't really see how one could modify the parser for test-pipeline.yaml, which generates the final CI test script from the test definitions in that file, so that it gets the test-specific "grade" parameter from somewhere else.

Can you just do something like this? https://github.com/vllm-project/ci-infra/blob/main/buildkite/test-template-ci.j2#L477?

> Besides, keeping the test definitions, including the "grade" parameter, entirely in one place (test-pipeline.yaml) improves the modularity of the code. Modularity, in turn, makes further development easier and the code more reusable.

This is valid, but given how often tests have become failing or flaky, both in the past and now, I think it's better to keep it here so we can intervene in the CI pipeline immediately without changing code in the main repo.
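A minimal sketch of the alternative proposed here, assuming the set of blocking AMD tests lives in the CI repo rather than in test-pipeline.yaml. The names `AMD_BLOCKING_LABELS`, `is_amd_blocking`, and `render_amd_step` are hypothetical, the contents of the linked template line are not reproduced, and using Buildkite's `soft_fail` attribute for non-blocking steps is an assumption.

```python
# Hypothetical allowlist kept in the CI repo. Editing it changes which AMD
# tests block without touching test-pipeline.yaml in the main repo.
AMD_BLOCKING_LABELS = frozenset({"Core Test"})


def is_amd_blocking(label: str, blocking_labels: frozenset = AMD_BLOCKING_LABELS) -> bool:
    """Return True if the test with this label should block on AMD."""
    return label.strip() in blocking_labels


def render_amd_step(test: dict) -> dict:
    """Build a step mapping for the AMD mirror of one test definition."""
    blocking = is_amd_blocking(test["label"])
    return {
        "label": f"AMD: {test['label']}",
        # Assumption: non-blocking tests are rendered as soft-failing steps.
        "soft_fail": not blocking,
    }


core = render_amd_step({"label": "Core Test", "mirror_hardwares": ["amdexperimental"]})
other = render_amd_step({"label": "Hypothetical Other Test"})
print(core)
print(other)
```

The design choice is about where the switch lives: demoting a flaky test becomes a one-line change in the CI repo, at the cost of matching tests by label across two repositories.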

Alexei-V-Ivanov-AMD · 2025-09-16T21:54:56Z

@khluu

> Can you just do something like this? https://github.com/vllm-project/ci-infra/blob/main/buildkite/test-template-ci.j2#L477?

That would create a cumbersome situation in which functionality hardcoded in one repo (ci-infra) depends on the specific content of a file (test-pipeline.yaml) in another repo.

The current implementation at the line you cite (L477) is cumbersome and suffers from the same drawbacks as mentioned above. It was originally put in place as a "quick turn-around" way to experiment with different CI infrastructure mappings. Ultimately, though, we believe that the label used to dispatch a given test to a specific CI agent pool also belongs in the test definition, which is compactly captured in test-pipeline.yaml.
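The cross-repo coupling described above can be made concrete with a small check. This is only a sketch under assumptions: it supposes a label-based allowlist of blocking tests kept in ci-infra (nothing in the thread confirms such a list exists in that form), and `stale_blocking_labels` is a hypothetical helper. It reports allowlist entries that no longer match any test label, which is what would happen silently after a rename in test-pipeline.yaml.

```python
def stale_blocking_labels(blocking_labels: set, tests: list) -> set:
    """Return allowlist entries that match no test label in the pipeline file."""
    defined = {test["label"] for test in tests}
    return set(blocking_labels) - defined


# Hypothetical state after "Core Test" was renamed in the main repo only.
pipeline_tests = [{"label": "Core Tests"}, {"label": "Hypothetical Other Test"}]
allowlist = {"Core Test"}

stale = stale_blocking_labels(allowlist, pipeline_tests)
if stale:
    print(f"Allowlist entries with no matching test: {sorted(stale)}")
```

A check like this would have to run wherever both files are available, which is itself a symptom of the dependency between the two repositories.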

khluu · 2025-09-17T21:56:09Z

@Alexei-V-Ivanov-AMD I don't think we can accommodate these changes to test-pipeline.yaml now, as:

- It's meant to define our CI tests, not hardware tests.
- Any future change to test-pipeline.yaml would trigger all tests in CI to run, which is costly for us.

I understand this is not convenient, but we are in the process of migrating to GitHub Actions and will redesign the separation and interface between our native CI tests and hardware tests then, so that it's less cumbersome for you. For now, please keep any changes and mirroring logic for AMD tests in this repo.

Alexei-V-Ivanov-AMD added 12 commits September 10, 2025 12:10

Update test-template-ci.j2 · 111956f
Update test-template-ci.j2 · f829f33
Update test-template-ci.j2 · 189dae6
Update test-template-ci.j2 · cac2c54
Update test-template-ci.j2 · bd828a7
Update test-template-ci.j2 · 8b56912
Update test-template-ci.j2 · 9c80fbe
Update test-template-ci.j2 · c5aac66
Update test-template-ci.j2 · b393488
Update test-template-ci.j2 · ae30511
Update test-template-ci.j2 · 04adae1
Update test-template-ci.j2 · 78cba3c

Signed-off-by: Alexei V. Ivanov <[email protected]> (on each commit)

khluu requested changes Sep 16, 2025
