Detect network flakiness before applying the Salt Formula #394

Open
smithfarm opened this issue Sep 12, 2020 · 1 comment

Comments

@smithfarm
Contributor

smithfarm commented Sep 12, 2020

ceph-salt apply is known to fail in odd ways when run in an environment with poor network connectivity. These failures can be especially vexing when the network connections are flaky, i.e., they succeed on some attempts and fail on others. In such cases, a user might reasonably conclude that the failure is due to a bug in ceph-salt.

For example:

  • when an external time server is configured, and connectivity with that external time server is flaky, ceph-salt apply can fail.
  • when the container image path points to a remote registry, and connectivity with that registry is flaky, ceph-salt apply can fail.
  • when ceph-salt attempts to use zypper to install packages on nodes, and connectivity with remote zypper repos is flaky, ceph-salt apply can fail.

It would be nice if we could detect network flakiness before starting to apply the Salt Formula. The purpose of this ticket is to collect ideas for how to do that.

@smithfarm smithfarm changed the title from "Problem downloading ses7 container image with ceph-salt 15.2.11" to "ceph-salt apply fails in odd ways when network connectivity is bad" Sep 12, 2020
@smithfarm smithfarm changed the title from "ceph-salt apply fails in odd ways when network connectivity is bad" to ""ceph-salt apply" fails in odd ways when network connectivity is bad" Sep 12, 2020
@smithfarm
Contributor Author

smithfarm commented Sep 12, 2020

Idea: ping a remote server for a short time (e.g., 30 seconds) and measure packet loss.
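
A minimal sketch of that idea in Python, assuming the system `ping` command is available and prints the usual "N% packet loss" summary line (as iputils and BSD ping do). The function names `parse_packet_loss` and `measure_packet_loss` are made up for illustration and are not part of ceph-salt:

```python
import re
import subprocess

# Matches the summary line printed by ping, e.g. "3.33333% packet loss".
_LOSS_RE = re.compile(r"(\d+(?:\.\d+)?)% packet loss")


def parse_packet_loss(ping_output: str) -> float:
    """Return the packet-loss percentage found in ping's summary output."""
    match = _LOSS_RE.search(ping_output)
    if match is None:
        raise ValueError("no packet-loss summary found in ping output")
    return float(match.group(1))


def measure_packet_loss(host: str, count: int = 30) -> float:
    """Send `count` probes, one per second, and return the loss percentage."""
    # -c: number of probes to send; -i: seconds to wait between probes.
    result = subprocess.run(
        ["ping", "-c", str(count), "-i", "1", host],
        capture_output=True,
        text=True,
        check=False,  # ping exits non-zero on loss; we still want the summary
    )
    return parse_packet_loss(result.stdout)
```

With the defaults this takes roughly 30 seconds per host. What loss percentage should count as "flaky" is an open question; if the host cannot be resolved at all, ping prints no summary and the sketch raises `ValueError`, which a caller could treat as a connectivity failure in its own right.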

CAVEAT: it is possible to configure ceph-salt in such a way that it does not initiate any communication with remote servers:

  • time server is local
  • container image path points to local registry
  • zypper repos are local

In such a case, it would be wrong to try to ping a remote server. So this test should first check how the environment is configured and only ping remote servers if "communication with remote servers" is detected in the configuration.
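
One possible shape for that configuration check, sketched in Python. It assumes the configured time server, container image path, and repo URLs have already been collected into a list of strings (how to read them from ceph-salt is not shown here), and it uses a heuristic that is purely an assumption: a host counts as "local" if it resolves to a private or loopback address. The helper names are hypothetical:

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def extract_host(endpoint: str) -> str:
    """Pull the host out of a URL, an image path, or a bare host name."""
    # Image paths such as "registry.example.com/ns/image:tag" have no scheme,
    # so prefix "//" to make urlsplit treat the first component as the host.
    target = endpoint if "://" in endpoint else "//" + endpoint
    return urlsplit(target).hostname or endpoint


def is_remote(host: str, resolve=socket.gethostbyname) -> bool:
    """Assumption: private or loopback addresses are local, the rest remote."""
    address = ipaddress.ip_address(resolve(host))
    return not (address.is_private or address.is_loopback)


def remote_hosts(endpoints, resolve=socket.gethostbyname):
    """Return the sorted, de-duplicated hosts that appear to be remote."""
    hosts = {extract_host(endpoint) for endpoint in endpoints}
    return sorted(host for host in hosts if is_remote(host, resolve))
```

If `remote_hosts(...)` returns an empty list, the ping test would be skipped entirely; otherwise only the returned hosts would be probed. The `resolve` parameter exists so the lookup can be replaced in tests, and a site with publicly routable internal addresses would need a different definition of "local".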

@smithfarm smithfarm changed the title from ""ceph-salt apply" fails in odd ways when network connectivity is bad" to "Detect network flakiness before applying the Salt Formula" Sep 12, 2020