Skip to content

Conversation

Alexei-V-Ivanov-AMD
Copy link
Collaborator

Adds functionality to decide if a specific AMD test is blocking.

Implemented trough an additional "grade" property for each test definition in the test-pipeline.yaml. E.g.:

  • label: Core Test # 22min
    timeout_in_minutes: 35
    mirror_hardwares: [amdexperimental]
    grade: Blocking
    ....

    makes this "AMD: Core Test" blocking.

    Signed-off-by: Alexei V. Ivanov [email protected]

Signed-off-by: Alexei V. Ivanov <[email protected]>
 Signed-off-by: Alexei V. Ivanov <[email protected]>
Signed-off-by: Alexei V. Ivanov <[email protected]>
Signed-off-by: Alexei V. Ivanov <[email protected]>
Signed-off-by: Alexei V. Ivanov <[email protected]>
Signed-off-by: Alexei V. Ivanov <[email protected]>
Signed-off-by: Alexei V. Ivanov <[email protected]>
Signed-off-by: Alexei V. Ivanov <[email protected]>
Signed-off-by: Alexei V. Ivanov <[email protected]>
Signed-off-by: Alexei V. Ivanov <[email protected]>
Signed-off-by: Alexei V. Ivanov <[email protected]>
Signed-off-by: Alexei V. Ivanov <[email protected]>
Copy link
Collaborator

@khluu khluu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we not add a field for this on test-pipeline.yaml?
It means whenever you need to switch the "grade" of any test, you'd need to make a PR on vLLM, which leads to many issues: more CI runs, inconsistent blocking state across builds on fork branch, etc..
I prefer to keep the logic to filter whether test is blocking or not on this repo

@Alexei-V-Ivanov-AMD
Copy link
Collaborator Author

@khluu

Can we not add a field for this on test-pipeline.yaml?

I don't really see how one can modify a parser for the test-pipeline.yaml, which generates the ultimate testing script for the CI after the test definitions in the test-pipeline.yaml, such that it gets the test-specific "grade" parameter from somewhere else.

Besides, keeping the test definitions, including the "grade" parameter completely in one place (test-pipeline.yaml) improves modularity of the code. Modularity, in its turn, contributes to the ease of further development and eventual re-usability.

@khluu
Copy link
Collaborator

khluu commented Sep 16, 2025

@khluu

Can we not add a field for this on test-pipeline.yaml?

I don't really see how one can modify a parser for the test-pipeline.yaml, which generates the ultimate testing script for the CI after the test definitions in the test-pipeline.yaml, such that it gets the test-specific "grade" parameter from somewhere else.

Can you just do something like this? https://github.com/vllm-project/ci-infra/blob/main/buildkite/test-template-ci.j2#L477?

Besides, keeping the test definitions, including the "grade" parameter completely in one place (test-pipeline.yaml) improves modularity of the code. Modularity, in its turn, contributes to the ease of further development and eventual re-usability.

This is valid, but given how often tests can become failing/flaky in the past/present, I think the it's better to keep it in here so we can intervene the CI pipeline immediately without changing code on the main repo.

@Alexei-V-Ivanov-AMD
Copy link
Collaborator Author

Alexei-V-Ivanov-AMD commented Sep 16, 2025

@khluu

Can you just do something like this? https://github.com/vllm-project/ci-infra/blob/main/buildkite/test-template-ci.j2#L477?

It will really create a cumbersome situation when a part of functionality hardcoded in one repo (ci-infra) will depend on the specific content of the file in another repo (test-pipeline.yaml).

The present implementation method of the line you're citing (L477) is cumbersome and suffers from the same drawbacks as mentioned above. The implementation of the line L477 was originally put in place as a "quick turn-around" way to experiment with different CI infrastructure mappings. But ultimately, we believe that the label with which we dispatch a given test to the specific CI agent pool also belongs to the test definition that is compactly captured in the file test-pipeline.yaml.

@khluu
Copy link
Collaborator

khluu commented Sep 17, 2025

@Alexei-V-Ivanov-AMD I don't think we can accommodate these changes to test-pipeline.yaml now, as:

  1. It's meant to define our CI tests, not for hardware tests
  2. Any future change to test-pipeline.yaml would trigger all tests in CI to run, which is costly for us.

I understand this is not convenient, but we are in the process of migrating to Github Actions and will redesign the separation & interface of our native CI tests and hardware tests better then so it's less cumbersome for you. For now, please keep any changes & mirroring logic for AMD tests in this repo.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants