optimize: Introduce automated flaky test tracking like OpenSearch #7545
base: 2.x
Conversation
Codecov Report
✅ All modified and coverable lines are covered by tests.

Additional details and impacted files:

@@             Coverage Diff              @@
##                2.x     #7545     +/-   ##
============================================
+ Coverage     60.63%    60.65%    +0.01%
  Complexity      658       658
============================================
  Files          1308      1308
  Lines         49446     49446
  Branches       5811      5811
============================================
+ Hits          29983     29992        +9
+ Misses        16801     16796        -5
+ Partials       2662      2658        -4
Could you please explain how it works?
OK, I will do it later. This PR is temporarily closed.
@YongGoose this is my fork repo, see the 2.x branch.
Click here to see the file changes, @YongGoose.
It would be great if we could see a bit more information in the issue.
In a workflow, what types of runs are retried when they fail? Additionally, I think it would be nice to have a label for the issue.
Also, would it be possible to share this PR on DingTalk?
Given the flaky tests, I want to know how to find the PRs in which they occurred. Would we use a web crawler for that?
Instead of the PR, the URL of the action where the issue occurred would also be fine.
It parses the surefire-reports XML files, currently extracting only class names.
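For context, a minimal sketch of the kind of parsing described here, assuming Maven Surefire's standard report layout (TEST-*.xml files containing <testcase> elements); the directory path and function name below are illustrative and not taken from the PR itself:

```python
# Illustrative sketch only; not the actual parse_failed_tests.py from this PR.
import xml.etree.ElementTree as ET
from pathlib import Path

def parse_surefire_reports(report_dir):
    """Collect per-test status from Maven Surefire XML reports.

    Returns a dict mapping "ClassName.testMethod" -> "passed" | "failed" | "skipped".
    """
    results = {}
    for xml_file in Path(report_dir).glob("**/TEST-*.xml"):
        tree = ET.parse(xml_file)
        for testcase in tree.getroot().iter("testcase"):
            test_id = f'{testcase.get("classname")}.{testcase.get("name")}'
            if testcase.find("failure") is not None or testcase.find("error") is not None:
                results[test_id] = "failed"
            elif testcase.find("skipped") is not None:
                results[test_id] = "skipped"
            else:
                results[test_id] = "passed"
    return results

if __name__ == "__main__":
    # Example usage with a hypothetical local report directory.
    print(parse_surefire_reports("target/surefire-reports"))
```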
To start with, it would be great if we could just output the class names. For a smoother review process, it would also be helpful if you could clean up the code and resolve the CI failures.
cc @YongGoose
I only changed changes.md, but the CI failed. The previous commit was still successful.
I reran the test.
I'd appreciate it if you could create some sub-issues outlining the planned next steps after the PR gets merged.
Pull Request Overview
This PR introduces automated flaky test detection to the CI/CD pipeline, similar to OpenSearch's approach. The system automatically triggers after a build fails initially but succeeds on rerun, identifying tests that exhibit flaky behavior.
- Adds workflow to detect flaky tests by comparing test reports from failed and successful build attempts
- Integrates test report uploading in the main build workflow to capture surefire/failsafe reports
- Creates automated GitHub issues when flaky tests are identified
Reviewed Changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.
File | Description
---|---
.github/workflows/detect-flaky-test.yml | New workflow that downloads test reports from both build attempts and identifies flaky tests
.github/workflows/build.yml | Modified to upload test reports as artifacts for flaky test analysis
.github/scripts/parse_failed_tests.py | Python script to parse XML test reports and identify tests that failed in the first run but passed in the second
changes/en-us/2.x.md | Added changelog entry for the flaky test detection feature
changes/zh-cn/2.x.md | Added Chinese changelog entry for the flaky test detection feature
Comments suppressed due to low confidence (3)
.github/workflows/detect-flaky-test.yml:71
- The actions/setup-python@v2 action is deprecated. Use actions/setup-python@v4 or later for better security and performance.
uses: actions/setup-python@v2
.github/workflows/detect-flaky-test.yml:82
- The actions/github-script@v6 action is outdated. Use actions/github-script@v7 or later for improved functionality and security.
uses: actions/github-script@v6
.github/workflows/detect-flaky-test.yml:69
- The environment variable I_RUN_ATTEMPT is set but never used in this workflow. This appears to be copied from the build workflow but serves no purpose here.
# step 3
flaky_tests = []
for test_id, status_1 in results_1.items():
    status_2 = results_2.get(test_id, "passed")
Copilot AI (Aug 6, 2025)
Assuming a test is 'passed' when it's not found in the second run may be incorrect. A missing test could indicate it was skipped or not executed, which should be handled differently than a passed test.
Suggested change:
-    status_2 = results_2.get(test_id, "passed")
+    if test_id not in results_2:
+        # Test missing in second run; cannot determine if flaky, skip
+        continue
+    status_2 = results_2[test_id]
Ⅰ. Describe what this PR did
1. Automatically trigger detect-flaky-test.yml after the "build" fails and the "Rerun build" succeeds.
2. Download the test reports from the first and second builds: run-1-surefire-reports-${{ matrix.java }} and run-2-surefire-reports-${{ matrix.java }}.
3. Run the Python script parse_failed_tests.py (see the sketch after this list).
4. If flaky tests are found, automatically create an issue listing the unstable test names (format: ClassName.testMethod).
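A simplified sketch of the detection step, building on the results_1/results_2 comparison quoted in the review above and adopting the reviewer's suggestion to skip tests that are missing from the second run; the function names and sample data are illustrative assumptions, not the exact contents of parse_failed_tests.py:

```python
# Illustrative sketch of the flaky-test comparison; not the exact parse_failed_tests.py.

def find_flaky_tests(results_1, results_2):
    """Return tests that failed in the first run but passed in the rerun."""
    flaky_tests = []
    for test_id, status_1 in results_1.items():
        if test_id not in results_2:
            # Test missing in the second run; cannot tell whether it is flaky.
            continue
        if status_1 == "failed" and results_2[test_id] == "passed":
            flaky_tests.append(test_id)
    return flaky_tests

def format_issue_body(flaky_tests):
    """Render the list as Markdown for an automatically created issue."""
    lines = ["The following tests failed on the first run but passed on rerun:", ""]
    lines += [f"- {test_id}" for test_id in flaky_tests]  # ClassName.testMethod entries
    return "\n".join(lines)

if __name__ == "__main__":
    # Hypothetical sample data in the "ClassName.testMethod" format.
    run_1 = {"com.example.FooTest.testBar": "failed", "com.example.BazTest.testQux": "passed"}
    run_2 = {"com.example.FooTest.testBar": "passed", "com.example.BazTest.testQux": "passed"}
    print(format_issue_body(find_flaky_tests(run_1, run_2)))
```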
Ⅱ. Does this pull request fix one issue?
fixes #7448
Ⅲ. Why don't you add test cases (unit test/integration test)?
Ⅳ. Describe how to verify it
Ⅴ. Special notes for reviews