watching stops without any notification #273
Comments
@grosser when you say "just stops" do you mean .each { block... } returns at some point, or keeps running but never calls the block? Which kubernetes version are you connecting to? And which ruby are you running? Don't know if it's related, but I ran oc get pod --watch yesterday and noticed it just exited after some time.
idk if the block stops or it just idles ... I'll try to find out ... if that's the problem then it's easy to fix :D
confirmed that the block just stops ... so having a reconnect, by default or as an option, would be nice ... atm I'm just doing …
Actually I am not sure we want to add this to the lib. How would you know that there was a disconnection? (Maybe some message was lost, etc.) I would keep this as is.
then we should call that out in the readme ... "watcher will stop when it is disconnected, run it in a loop to make sure it stays running" or so
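Something like this minimal sketch could illustrate that README note (assuming `client` is a `Kubeclient::Client`; that `.each` simply returns when the server drops the connection is the observed behavior from this thread, not documented API):

```ruby
# Minimal restart-loop sketch (assumes `client` is a Kubeclient::Client).
# When the API server drops the watch, .each returns and the outer loop
# opens a fresh connection.
loop do
  client.watch_pods.each do |notice|
    puts "#{notice.type} #{notice.object.metadata.name}"
  end
end
```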
Definitely 👍
I wrote a service which watches namespaces, for the purpose of emitting a continuous stream of events so a person can see (the next day) when their pod restarted and for what reason. My complaint here is that yes, my watch loop is in an infinite loop, but when the loop starts again you end up getting the same data you previously emitted, because the iterations overlap. This stinks. To work around it, I'm generating a hash and dropping the "already read" events. Anyone have a better idea?
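Roughly, that dedupe workaround looks like the sketch below (not the actual service code; `client`, `watch_namespaces` and the `seen` set are illustrative assumptions):

```ruby
require 'set'

# Rough sketch of dropping events already emitted before a reconnect.
# `seen` grows unbounded here; a real service would prune old entries.
seen = Set.new
loop do
  client.watch_namespaces.each do |notice|
    key = "#{notice.object.metadata.uid}/#{notice.object.metadata.resourceVersion}"
    next if seen.include?(key) # replayed event from before the reconnect
    seen << key
    puts "#{notice.type} #{notice.object.metadata.name}"
  end
end
```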
```ruby
loop do
  ignore_until = Time.now.to_f + 0.5 # re-processing happens in the first 0.4s
  kuber_client.watch_endpoints.each do |e|
    next if Time.now.to_f < ignore_until
    puts e
  end
end
```
I've seen this happening with curl after a little over an hour in kubernetes v1.7.6+a08f5eeb62.
This suggests it is caused by a k8s issue.
Thanks @moolitayer, we're okay because if the thread exits it will be restarted, and currently we don't process the initial list of pods. This is a little concerning when we move to using watches as the primary mechanism for refresh, because then we will be processing the entire initial set every hour, which is basically the same scheduled full refresh we do today 😟
Hey, the logs are below, but long story short, the latest resourceVersion is … The only way to get a watcher started then is to use 0, which returns you ALL the objects. You'll then need to filter the returned objects to be >= 44924022. It's quite shit really, as you're potentially returning a lot of objects from the k8s API, especially when the connection times out so frequently (seemingly every 90 seconds or so for CRDs in particular).
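Roughly what that looks like (a sketch only; whether watch_* accepts a resource_version option depends on your kubeclient release, and 44924022 is just the example value from above):

```ruby
# Sketch of "watch from resourceVersion 0, then drop objects already processed".
# The threshold is the example value from this comment; comparing versions as
# integers is exactly what later comments advise against, but it is the workaround in use.
last_seen = 44_924_022

client.watch_pods(resource_version: '0').each do |notice|
  version = notice.object.metadata.resourceVersion.to_i
  next if version < last_seen # replayed object from the initial dump
  last_seen = version
  puts "#{notice.type} #{notice.object.metadata.name}"
end
```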
reading kubernetes/kubernetes#55230 it looks like you can specify a …
Does anyone know if the resourceVersion ever rolls back over? At the moment quite a bit of my de-dupe code is basically … If that value didn't just continually increment then eventually my app would break.
The answers in this discussion: kubernetes/website#6540 (comment) explain some things well.
By default it is shared by many collections, but it might be separate; for example, Events have an independent version by default. Do not assume any relation between versions of different collections.
So this presently works AFAIK but k8s devs are very insistent that versions are opaque strings, which might not even be integers in future, and clients shouldn't compare them for order, only equality :-(
You can also do a list request (…
Nah, my bad, that would be too convenient to be true 😉 The API docs say: …
So there is a slight difference between 0 and omitted, but both give you old data...
FYI, code I'm currently using to get cheap restarts and no re-plays on restart:

```ruby
# frozen_string_literal: true

module Watching
  def initialize(*)
    super
    @started = latest_resource_version
  end

  private

  def watch_resource(type)
    loop do
      get_json(type, watch: true, resource_version: @started) do |notice|
        if notice[:type] == "ERROR" && notice.dig(:object, :code) == 410 # version was too old, watch will stop
          @started = latest_resource_version
        else
          @started = notice.dig(:object, :metadata, :resourceVersion)
          yield notice
        end
      end
    end
  end

  # get something cheap to get latest resourceVersion from the List object that is returned
  def latest_resource_version
    get_json(:namespaces, limit: 1, raw: true).dig(:metadata, :resourceVersion)
  end
end
```
Is there no way to accomplish this "persistent" connection without using a loop? 😞
@grosser isn't there a race condition here when the latest resourceVersion changes in between, and therefore you miss updates?
yes, but that would happen very rarely (only when things break down)
Already saw multiple times that the watcher just stops ... without crashing / notifying ...
Idk how to reproduce it, but it happens regularly ... and it does not happen for kube-proxy, so either there is a bug in this library or kube-proxy's Go code has some smart disconnect handling
atm using below and calling .restart every x minutes
idk how to fix/debug this further but wanted to raise awareness.
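Not the snippet referenced above, but a hedged sketch of the same periodic-restart idea, using kubeclient's watcher.finish (rather than the .restart mentioned above) to make .each return so the loop can reconnect:

```ruby
# Hedged illustration only, not the elided snippet above: a background thread
# finishes the watch every few minutes, which makes .each return so the outer
# loop opens a fresh connection.
loop do
  watcher = kuber_client.watch_pods
  restarter = Thread.new do
    sleep 5 * 60    # force a reconnect every 5 minutes
    watcher.finish  # closes the connection, so .each below returns
  end
  watcher.each { |notice| puts "#{notice.type} #{notice.object.metadata.name}" }
  restarter.kill    # the watch ended on its own; cancel the pending restart
end
```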