nadavleva commented Oct 12, 2025

This PR updates the VolumeReplicationGroupRecipe Controller test to wait for VRG deletion to complete, addressing flakiness observed in the test.

Fixes #2294

nirs (Member) left a comment

@nadavleva thanks for contributing!

  vrgDelete := func() {
      Expect(k8sClient.Delete(ctx, vrg)).To(Succeed())
-     Eventually(vrgGet).Should(MatchError(k8serrors.NewNotFound(
+     Eventually(vrgGet, time.Second*10, time.Millisecond*100).Should(MatchError(k8serrors.NewNotFound(

This may work, but it is not clear from the function signature:

Eventually(actualOrCtx any, args ...any) AsyncAssertion

We should use:

Eventually(vrgGet).WithTimeout(10*time.Second).Should(MatchError(k8serrors.NewNotFound(

It would also help to add a comment explaining why we need a larger timeout for this test.

I would not bother with the polling interval to keep the code simpler.

Can you share test results with this timeout, proving that the flakiness is caused by the short timeout?

A good example would be to run this test 100 times with this timeout and measure how much time we waited (min, max, avg).
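
A rough sketch of how such measurements could be collected, using only the Go standard library. The names `waitStats`, `measure`, and `summarize` are hypothetical and do not exist in the repository; the idea is to wrap whatever blocks on deletion in `measure` and to summarize the durations after all runs.

```go
package main

import (
	"fmt"
	"time"
)

// waitStats summarizes how long a set of waits took.
type waitStats struct {
	Count         int
	Min, Max, Avg time.Duration
}

// String formats the summary for a log line.
func (s waitStats) String() string {
	return fmt.Sprintf("runs=%d min=%v max=%v avg=%v", s.Count, s.Min, s.Max, s.Avg)
}

// measure runs wait and returns how long it blocked.
func measure(wait func()) time.Duration {
	start := time.Now()
	wait()

	return time.Since(start)
}

// summarize computes min, max, and average over the recorded waits.
// It returns the zero value when no waits were recorded.
func summarize(waits ...time.Duration) waitStats {
	if len(waits) == 0 {
		return waitStats{}
	}

	stats := waitStats{Count: len(waits), Min: waits[0], Max: waits[0]}

	var total time.Duration

	for _, w := range waits {
		if w < stats.Min {
			stats.Min = w
		}

		if w > stats.Max {
			stats.Max = w
		}

		total += w
	}

	stats.Avg = total / time.Duration(len(waits))

	return stats
}
```

In the test, the `Eventually` call that waits for deletion could be wrapped as `measure(func() { ... })`, with each result appended to a slice and `summarize(waits...)` logged once all runs have finished.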

We have a lot of flaky tests. I wonder if waiting only 1 second for deletion is also causing other tests to fail. We can increase the default timeout to test this:
https://pkg.go.dev/github.com/onsi/gomega#SetDefaultEventuallyTimeout

nadavleva (Author) replied

I will run some tests to verify the behavior and gather more information on how long deletion takes.
I encountered additional tests that are flaky due to timeouts, e.g. #2235.

nirs changed the title from "Fix flaky VRG test by waiting for deletion to complete (issue #2294)" to "Fix flaky VRG test by waiting for deletion to complete" on Oct 12, 2025
nirs (Member) commented Oct 12, 2025

@nadavleva I modified the title and commit message to be consistent with our standards:

  • No need to mention the issue in the PR title
  • To link the PR with the issue, add a "Fixes #NNN" line at the end of the PR message.

nirs (Member) commented Oct 13, 2025

The first run was good with this change, but since this is a flaky test, one run does not mean much. I will trigger more runs in the next few days.

The best way to validate this is to run only this test locally with a high count and better logging showing how much time we waited for deletion.

nadavleva (Author) commented Oct 13, 2025

Hey @nirs,
It seems that a different test fails, at drplacementcontrol_controller_test.go:1420.
I keep encountering flaky tests that pass only intermittently or need several reruns in GitHub Actions to pass.

nirs (Member) commented Oct 14, 2025

> Hey @nirs It seems that a different test fails at drplacementcontrol_controller_test.go:1420

This is fine, we cannot fix all flaky tests at once. The good thing is that the test you changed did not fail, but of course 2 runs are not enough to rule out flakiness.

Development

Successfully merging this pull request may close these issues.

Flaky Test VolumeReplicationGroupRecipe Controller test
