NodeKiller seems to be not working in 100 node 1.17 / master performance tests #1005
Comments
Looks like it fails transiently in 1.16: some of the ssh calls succeed and some do not within a single run, e.g. OK - BAD - |
I checked 3 runs of the 1.17 test and the problem doesn't occur there. It seems to be a 1.16-specific thing. Maybe different gcloud versions are used in 1.16 and 1.17? |
I'd try upgrading the gcloud version in the 1.16 test to see whether it helps. |
/assign |
kubernetes/test-infra#16103 doesn't seem to be helping, let's revert it. I took a deeper look and have a new theory now. It looks like in 1.17 runs there are no logs from the chaosmonkey components. I believe that the errors we see in 1.16 are actually expected: they are returned for the reboot command, which terminates the ssh connection. We don't see them in 1.17 because chaosmonkey doesn't work properly there for some reason. The thing that stands out is that in 1.17 we have this commit and we don't have it in 1.16. I'd suggest adding more logging to nodes.go in the master branch to see what is going on with chaosmonkey there. |
/good-first-issue |
@mm4tt: Please ensure the request meets the requirements listed here. If this request no longer meets these requirements, the label can be removed In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
FTR, these are the chaosmonkey files that we could instrument better - https://github.com/kubernetes/perf-tests/tree/eb4fffb50d3caee11a57262b46286f051d9337fb/clusterloader2/pkg/chaos |
This reverts commit f00041e. It didn't help, see kubernetes/perf-tests#1005
/assign |
There are two different issues:
|
Fixed. |
/close |
@jprzychodzen: Closing this issue. In response to this:
/reopen |
@mm4tt: Reopened this issue. In response to this:
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with Send feedback to sig-testing, kubernetes/test-infra and/or fejta. |
/remove-lifecycle stale |
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with Send feedback to sig-testing, kubernetes/test-infra and/or fejta. |
/remove-lifecycle stale |
Original debugging done by @jkaniuk: