
Conversation

@OmCheeLin (Contributor) commented Jul 17, 2025

  • I have registered the PR changes.

Ⅰ. Describe what this PR did

  1. Automatically trigger detect-flaky-test.yml after the "build" fails and the "Rerun build" succeeds.

  2. Download the test reports from the first and second builds:

    • First build report: run-1-surefire-reports-${{ matrix.java }}
    • Second build report: run-2-surefire-reports-${{ matrix.java }}
  3. Run the Python script parse_failed_tests.py (a sketch of its comparison logic is shown after this list):

    • Compare the test reports from the first and second builds.
    • Identify tests that failed in the first run but passed in the second (i.e., flaky tests).
    • Output the results as a JSON list and pass them to the next steps.
  4. If flaky tests are found, automatically create an issue listing the unstable test names (format: ClassName.testMethod).
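For illustration, a minimal sketch of the comparison step described above, assuming standard surefire TEST-*.xml report files in the two downloaded artifact directories; the directory names, CLI arguments, and plain JSON output here are simplified placeholders, not the exact script in this PR:

import glob
import json
import os
import sys
import xml.etree.ElementTree as ET

def collect_results(report_dir):
    # Map "ClassName.testMethod" -> "failed"/"passed" from surefire TEST-*.xml files.
    results = {}
    for path in glob.glob(os.path.join(report_dir, "**", "TEST-*.xml"), recursive=True):
        for case in ET.parse(path).getroot().iter("testcase"):
            test_id = f"{case.get('classname')}.{case.get('name')}"
            failed = case.find("failure") is not None or case.find("error") is not None
            results[test_id] = "failed" if failed else "passed"
    return results

if __name__ == "__main__":
    # Artifact directories from build attempt 1 and attempt 2 (illustrative defaults).
    results_1 = collect_results(sys.argv[1] if len(sys.argv) > 1 else "run-1-surefire-reports")
    results_2 = collect_results(sys.argv[2] if len(sys.argv) > 2 else "run-2-surefire-reports")

    # A test is considered flaky if it failed in the first run but passed in the second.
    flaky_tests = [
        test_id
        for test_id, status_1 in results_1.items()
        if status_1 == "failed" and results_2.get(test_id) == "passed"
    ]

    # Emit a JSON list so a later workflow step can turn it into an issue body.
    print(json.dumps(flaky_tests))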

Ⅱ. Does this pull request fix one issue?

fixes #7448

Ⅲ. Why don't you add test cases (unit test/integration test)?

Ⅳ. Describe how to verify it

Ⅴ. Special notes for reviews

codecov bot commented Jul 17, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 60.65%. Comparing base (d78267a) to head (3eb316c).
⚠️ Report is 8 commits behind head on 2.x.

Additional details and impacted files
@@             Coverage Diff              @@
##                2.x    #7545      +/-   ##
============================================
+ Coverage     60.63%   60.65%   +0.01%     
  Complexity      658      658              
============================================
  Files          1308     1308              
  Lines         49446    49446              
  Branches       5811     5811              
============================================
+ Hits          29983    29992       +9     
+ Misses        16801    16796       -5     
+ Partials       2662     2658       -4     

see 5 files with indirect coverage changes



@YongGoose (Member)

@OmCheeLin

Could you please explain how it works?
Also, it would be great if you could show an example using your own forked repository.
I'll give you feedback after reviewing the example.

@OmCheeLin (Contributor, Author)

@OmCheeLin

Could you please explain how it works? Also, it would be great if you could show an example using your own forked repository. I'll give you feedback after reviewing the example.

OK, I will do it later. This PR is temporarily closed.

OmCheeLin closed this Jul 17, 2025
OmCheeLin reopened this Jul 17, 2025
@OmCheeLin (Contributor, Author)

  1. To speed up the CI test process, I made a slight modification to build.yml so that it only runs the tests under seata-common.
  2. I added a FlakyTest under seata-common, which fails on the first build and succeeds on the second (a generic sketch of this behavior is shown after this list).
  3. Note: I modified the workflow directly on the 2.x branch, because if it is changed on another branch, some workflow files will not use the latest version during actual CI runs.
  4. On the Actions page, after the build runs twice, detect-flaky-test is triggered automatically.
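The actual FlakyTest is a JUnit test under seata-common; purely as a hypothetical, language-neutral illustration of the same idea, a pytest-style test could key off the GITHUB_RUN_ATTEMPT variable that GitHub Actions sets, failing on the first run attempt and passing once the job is rerun:

import os

def test_simulated_flaky_behavior():
    # GITHUB_RUN_ATTEMPT is provided by GitHub Actions, starts at 1, and
    # increments each time the run is retried; outside CI it defaults to "1".
    run_attempt = int(os.environ.get("GITHUB_RUN_ATTEMPT", "1"))
    # Fails on the first attempt, passes on any rerun of the same workflow run.
    assert run_attempt > 1, "simulated flaky failure on the first attempt"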

@YongGoose this is my forked repo; see the 2.x branch:
https://github.com/OmCheeLin/incubator-seata

@OmCheeLin (Contributor, Author)


Click here to see the file changes @YongGoose

@YongGoose (Member)

@OmCheeLin

It would be great if we could see a bit more information in the issue.

@YongGoose (Member)

In a workflow, what types of runs are retried when they fail?
Also, does the workflow automatically retry if it fails?

Additionally, I think it would be nice to have a label for the issue.
Would you be able to suggest a name for the label?
I’ll take care of creating it myself.

@YongGoose (Member)

Also, would it be possible to share this PR on DingTalk?
I believe this feature could be very useful, so it would be great to get feedback from more developers.

@OmCheeLin (Contributor, Author)

@OmCheeLin

It would be great if we could see a bit more information in the issue.

Given the flaky tests, I want to know how to find which PRs they occurred in. Would that require a web crawler?

@YongGoose (Member)

Given the flaky tests, I want to know how to find which PRs they occurred in. Would that require a web crawler?

Instead of the PR, the URL of the Actions run where the failure occurred would also be fine.
Would you be able to check what kind of information can be retrieved when creating an issue through GitHub Actions?

@OmCheeLin (Contributor, Author)

Given the flaky tests, I want to know how to find which PRs they occurred in. Would that require a web crawler?

Instead of the PR, the URL of the Actions run where the failure occurred would also be fine. Would you be able to check what kind of information can be retrieved when creating an issue through GitHub Actions?

It parses the surefire report XML files; currently only the class names are available.
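Regarding the run URL: within a job, the default environment variables GITHUB_SERVER_URL, GITHUB_REPOSITORY, and GITHUB_RUN_ID can be combined into a link to the run that detected the flaky tests and placed in the issue body. The workflow itself creates the issue via actions/github-script; the snippet below is only a rough Python illustration of the same idea through the REST API, with the FLAKY_TESTS variable, the issue title, and the "flaky-test" label as placeholders:

import json
import os
import urllib.request

# Default variables provided by GitHub Actions; together they identify the run that found the flaky tests.
run_url = (
    f"{os.environ['GITHUB_SERVER_URL']}/{os.environ['GITHUB_REPOSITORY']}"
    f"/actions/runs/{os.environ['GITHUB_RUN_ID']}"
)

# Hypothetical: the JSON list produced by parse_failed_tests.py, passed in via an env variable.
flaky_tests = json.loads(os.environ.get("FLAKY_TESTS", "[]"))

body = "Flaky tests detected:\n\n" + "\n".join(f"- {t}" for t in flaky_tests)
body += f"\n\nDetected in workflow run: {run_url}"

request = urllib.request.Request(
    f"https://api.github.com/repos/{os.environ['GITHUB_REPOSITORY']}/issues",
    data=json.dumps({
        "title": "Flaky tests detected",
        "body": body,
        "labels": ["flaky-test"],  # placeholder label name, to be decided by the maintainers
    }).encode(),
    headers={
        "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
        "Accept": "application/vnd.github+json",
    },
    method="POST",
)
with urllib.request.urlopen(request) as response:
    print(json.load(response)["html_url"])  # URL of the created issue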

@YongGoose (Member)

@OmCheeLin

To start with, it would be great if we could just output the class names.
We can consider upgrading the information provided through a separate PR later on.

For a smoother review process, it would also be helpful if you could clean up the code and resolve CI failures.

@OmCheeLin (Contributor, Author)

@YongGoose cc

@OmCheeLin (Contributor, Author)

I only changed changes.md, but the CI failed. The previous commit was still successful.
Are there flaky tests?

@YongGoose (Member)

I only changed changes.md, but the CI failed. The previous commit was still successful.

Are there flaky tests?

I reran the tests.
Let's see.

YongGoose requested a review from funky-eyes, July 31, 2025 00:39
@YongGoose (Member)

@OmCheeLin

I’d appreciate it if you could create some sub-issues outlining the planned next steps after the PR gets merged.

YongGoose requested a review from Copilot, August 6, 2025 03:07
Copilot AI left a comment


Pull Request Overview

This PR introduces automated flaky test detection to the CI/CD pipeline, similar to OpenSearch's approach. The system automatically triggers after a build fails initially but succeeds on rerun, identifying tests that exhibit flaky behavior.

  • Adds workflow to detect flaky tests by comparing test reports from failed and successful build attempts
  • Integrates test report uploading in the main build workflow to capture surefire/failsafe reports
  • Creates automated GitHub issues when flaky tests are identified

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

Summary per file:
  • .github/workflows/detect-flaky-test.yml: New workflow that downloads test reports from both build attempts and identifies flaky tests
  • .github/workflows/build.yml: Modified to upload test reports as artifacts for flaky test analysis
  • .github/scripts/parse_failed_tests.py: Python script to parse XML test reports and identify tests that failed in the first run but passed in the second
  • changes/en-us/2.x.md: Added changelog entry for the flaky test detection feature
  • changes/zh-cn/2.x.md: Added Chinese changelog entry for the flaky test detection feature
Comments suppressed due to low confidence (3)

.github/workflows/detect-flaky-test.yml:71

  • The actions/setup-python@v2 action is deprecated. Use actions/setup-python@v4 or later for better security and performance.
        uses: actions/setup-python@v2

.github/workflows/detect-flaky-test.yml:82

  • The actions/github-script@v6 action is outdated. Use actions/github-script@v7 or later for improved functionality and security.
        uses: actions/github-script@v6

.github/workflows/detect-flaky-test.yml:69

  • The environment variable I_RUN_ATTEMPT is set but never used in this workflow. This appears to be copied from the build workflow but serves no purpose here.
      # step 3


flaky_tests = []
for test_id, status_1 in results_1.items():
    status_2 = results_2.get(test_id, "passed")
Copilot AI commented Aug 6, 2025

Assuming a test is 'passed' when it's not found in the second run may be incorrect. A missing test could indicate it was skipped or not executed, which should be handled differently than a passed test.

Suggested change:
-    status_2 = results_2.get(test_id, "passed")
+    if test_id not in results_2:
+        # Test missing in second run; cannot determine if flaky, skip
+        continue
+    status_2 = results_2[test_id]




Successfully merging this pull request may close this issue: Introduce automated flaky test tracking like OpenSearch (#7448)