Watch pods times out/stops recognizing new notices after ~30 mins #512
Comments
https://github.com/yangl900/knet is very interesting; I still need to absorb the info there. Note that keeping a connection open for many minutes with no activity is likely to cause a "version too old" error if it does get disconnected. Consider also adding kubeclient support for opting in to watch bookmarks, which reduce this issue.
We're seeing this as well. When the actual connection gets closed, the watch command exits with an error (as documented), which we catch and use to restart the listener. But sometimes the listener stops getting notices without actually exiting, leaving the process more or less stuck. We're going to add a timer-based kill on our side as a workaround, but it seems kubeclient has a condition it needs to account for.
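A minimal sketch of what such a timer-based kill could look like, assuming the watch object responds to `each` and `finish` (as the object returned by `client.watch_pods` does). The helper name `consume_with_watchdog` and its parameters are hypothetical, not part of kubeclient:

```ruby
# Hypothetical helper, not part of kubeclient.
# Runs `stream.each` in a worker thread and gives up on the stream when no
# notice has arrived for `idle_timeout` seconds.
# Returns :idle_timeout if the watchdog fired, :closed if the stream ended
# on its own. An exception raised inside `stream.each` is re-raised here.
def consume_with_watchdog(stream, idle_timeout:, poll_interval: 0.05)
  lock = Mutex.new
  last_seen = Process.clock_gettime(Process::CLOCK_MONOTONIC)

  worker = Thread.new do
    # The exception is re-raised by `join` below, so skip the stderr report.
    Thread.current.report_on_exception = false
    stream.each do |notice|
      lock.synchronize { last_seen = Process.clock_gettime(Process::CLOCK_MONOTONIC) }
      yield notice
    end
  end

  # Thread#join(limit) returns nil on timeout and the thread once it is done.
  until worker.join(poll_interval)
    idle = lock.synchronize { Process.clock_gettime(Process::CLOCK_MONOTONIC) - last_seen }
    next if idle < idle_timeout

    stream.finish # ask the stream to close its connection
    worker.kill   # make sure a stuck read cannot keep the thread alive
    return :idle_timeout
  end
  :closed
end
```

With kubeclient this might be called as `consume_with_watchdog(client.watch_pods, idle_timeout: 600) { |notice| handle(notice) }`, where `handle` and the 600-second threshold are placeholders. A quiet cluster will trip the watchdog too, so the caller should treat `:idle_timeout` as a cue to restart the watch rather than as an error.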
To configure the keep-alive timeout, instantiate Kubeclient::Client with the `:keep_alive_timeout` keyword argument, which defaults to 60 (seconds). Fixes issue ManageIQ#512.
client.watch_pods times out or stops recognizing new notices if no watch event occurs for about 30 minutes, even with timeouts set (open=60, read=nil) when creating the kubeclient.
The watch_pods call is wrapped in an infinite loop, so if it simply exits or raises an exception, a mechanism is in place to sleep and then restart the watch. In this case, however, we never break out of watch_pods: it silently times out and does not act on any new notices it should receive.
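For reference, a restart loop of the kind described might look roughly like the sketch below. The name `supervise_watch`, the `max_restarts` bound (there only so the loop can be exercised in a test) and the `logger` parameter are all hypothetical. Note that it only reacts when the block returns or raises, which is exactly why a watch that stalls silently is never restarted:

```ruby
# Hypothetical supervisor loop, not taken from the reporter's code.
# Calls the block (which should open and consume one watch) and re-runs it
# whenever it returns or raises, sleeping `restart_delay` seconds in between.
# Returns the number of times the block finished (normally or with an error).
def supervise_watch(restart_delay: 5, max_restarts: nil, logger: $stderr)
  restarts = 0
  loop do
    begin
      yield
      logger.puts "watch returned; restarting"
    rescue StandardError => e
      logger.puts "watch failed: #{e.class}: #{e.message}; restarting"
    end
    restarts += 1
    break if max_restarts && restarts >= max_restarts
    sleep restart_delay
  end
  restarts
end
```

In production the block would wrap something like `client.watch_pods.each { |notice| ... }` and `max_restarts` would stay `nil` so the loop runs indefinitely.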
We might need to set TCP keepalive (keep_alive_timeout) in the http_options in kubeclient.rb to avoid this, replicating the behavior that client-go provides, as described here: https://github.com/yangl900/knet
"The k8s client-go by default turns on TCP Keepalive, and the client side will send an ACK packet to API server every 30s. With this, even though the SLB default timeout is 4 minutes, the TCP connection will never be idle and so will never be reset."
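To make the quoted behavior concrete, here is a rough sketch of enabling TCP keepalive on a plain Ruby socket using only the standard library. It illustrates the OS-level option, not how kubeclient or its HTTP layer would wire it in. The helper name `enable_tcp_keepalive` is hypothetical, and the 30-second values simply mirror the quote above:

```ruby
require "socket"

# Hypothetical helper: turns on TCP keepalive for `sock` and, where the
# platform exposes the tuning options, sets the probe timing.
#   idle:     seconds of inactivity before the first probe
#   interval: seconds between subsequent probes
#   count:    unanswered probes before the connection is dropped
def enable_tcp_keepalive(sock, idle: 30, interval: 30, count: 3)
  sock.setsockopt(Socket::SOL_SOCKET, Socket::SO_KEEPALIVE, true)

  # These constants exist on Linux but not on every platform, so skip any
  # that Ruby's Socket class does not define.
  { TCP_KEEPIDLE: idle, TCP_KEEPINTVL: interval, TCP_KEEPCNT: count }.each do |name, value|
    next unless Socket.const_defined?(name)

    sock.setsockopt(Socket::IPPROTO_TCP, Socket.const_get(name), value)
  end
  sock
end
```

With probes sent every 30 seconds, an intermediary whose idle timeout is measured in minutes should keep seeing traffic on the connection, which is the effect the quote attributes to client-go.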
Any other ideas as to why this could be occurring?