roachtest: adding defensive code in ceph/reef test #148756

Merged: 1 commit into cockroachdb:master on Jun 26, 2025

sravotto
Contributor

@sravotto sravotto commented Jun 24, 2025

We have seen sporadic failures in the Ceph tests, caused by failures to create users in the Ceph Object Gateway.

To address this, we are adding code that checks the gateway is up by submitting a read-only request before attempting to add the user.

Epic: none

Fixes: #148731

Release note: None

@cockroach-teamcity
Member

This change is Reviewable

@sravotto sravotto added the backport-24.3.x, backport-25.1.x, and backport-25.2.x labels Jun 24, 2025
@sravotto sravotto marked this pull request as ready for review June 24, 2025 20:29
@sravotto sravotto requested review from a team, sumeerbhola, jeffswenson, glennfawcett and BramGruneir and removed request for a team, sumeerbhola and glennfawcett June 24, 2025 20:29
Member

@BramGruneir BramGruneir left a comment

:lgtm:

With some minor tweaks.

Reviewed 1 of 1 files at r1, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @jeffswenson)


pkg/cmd/roachtest/tests/s3_microceph.go line 178 at r1 (raw file):

	cmd := `sudo radosgw-admin user list`
	var err error
	for i := 0; i < 10; i++ {

Do we have a standard retry loop that we use with other tests? There must be some prior art here. If not, this is good enough.


pkg/cmd/roachtest/tests/s3_microceph.go line 180 at r1 (raw file):

	for i := 0; i < 10; i++ {
		// Sleep for few seconds, then try the command.
		time.Sleep(2 * time.Second)

Why not put the sleep after the command fails?
It might be worth writing the err to a log as an info with a bit of context.

Contributor Author

@sravotto sravotto left a comment

Updated based on comments.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @BramGruneir and @jeffswenson)


pkg/cmd/roachtest/tests/s3_microceph.go line 178 at r1 (raw file):

Previously, BramGruneir (Bram Gruneir) wrote…

Do we have a standard retry loop that we use with other tests? There must be some prior art here. If not, this is good enough.

Done


pkg/cmd/roachtest/tests/s3_microceph.go line 180 at r1 (raw file):

Previously, BramGruneir (Bram Gruneir) wrote…

Why not put the sleep after the command fails?
It might be worth writing the err to a log as an info with a bit of context.

Reusing RunE retry loop.

Member

@BramGruneir BramGruneir left a comment

:lgtm:

Reviewed 1 of 1 files at r2, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @jeffswenson)

@sravotto
Contributor Author

TFTR

bors r+

@craig
Contributor

craig bot commented Jun 26, 2025

@craig craig bot merged commit 553aa73 into cockroachdb:master Jun 26, 2025
22 checks passed

blathers-crl bot commented Jun 26, 2025

Based on the specified backports for this PR, I applied new labels to the following linked issue(s). Please adjust the labels as needed to match the branches actually affected by the issue(s), including adding any known older branches.


Issue #148731: branch-release-24.3, branch-release-25.1, branch-release-25.2.


🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

Labels
backport-24.3.x, backport-25.1.x, backport-25.2.x, target-release-25.4.0
Development

Successfully merging this pull request may close these issues.

roachtest: backup/ceph/reef failed
3 participants