Yesterday, the Hetzner Cloud API had an outage, and it appears that the docker machine driver did not handle it well.
You can see that from 2024-11-13 17:00:00 to 2024-11-14 08:00:00, the number of requests to /server_types, /images and /locations was unexpectedly high. The number of requests for single actions was also very high.
This leads to rate limiting while waiting for servers to be created.
Maybe cache the calls to /locations, /server_types and /images; those shouldn't change that often. Unless you are checking for server type availability?
Bad jokes aside, sorry this caused you headaches. I'll look into getting exponential back-off implemented soon. Regarding error handling in general, I am somewhat torn as to what the best approach is. We do have an explicit retry with a set timeout, which was implemented as a feature request. The default behaviour is to fail fast, as it always was, but that could be changed in a major version bump. When using the CLI, fail-fast is what I would expect, but I do see the issue with some applications that talk docker-machine RPC, such as Rancher, going into a request storm in fail-fast mode.
As for the caching, I do get the point of them being stable. However, I cannot really be sure in which environment the driver is running. Granted, vanilla docker-machine would be useless without a writeable home directory. But given its RPC nature, it could be run with any kind of restrictions, so long as one takes care that it can access the provided SSH key files.
I see a few possible improvements:
Watch*
API is deprecated).