Graceful shutdown #66
Open
jo-asplin-met-no wants to merge 21 commits into trunk from graceful_shutdown
+154
−43
8a43508 Added initial code for graceful shutdown (Issue #45) [jo-asplin-met-no]
ffabee8 Typo (a bit pedantic, but still) [jo-asplin-met-no]
cb0d6cc Refactored shutdown_signal to common package [jo-asplin-met-no]
df49f79 Moved shutdown_signal from common to util [jo-asplin-met-no]
8b01706 Removed superfluous signals [jo-asplin-met-no]
bb46d2f Assumed Unix + added comments [jo-asplin-met-no]
3eff24a Gracefully shut down kvkafka reader [jo-asplin-met-no]
d8ca2b6 Improved signal catching and task joining [jo-asplin-met-no]
dfa7bfb Detected shutdown signal during end-to-end testing [jo-asplin-met-no]
ad49595 Don't generally await/join signal catcher task [jo-asplin-met-no]
6d050b4 Another go at cancelling kvkafka reader from caught signal [jo-asplin-met-no]
239b7c1 Remove `zero_to_none` function [Lun4m]
dc1a4c8 Create timeseries if label does not exist instead of returning Err [Lun4m]
7b46af6 Merge branch 'trunk' into graceful_shutdown [jo-asplin-met-no]
a63842e Fixed lint issue [jo-asplin-met-no]
d4ef671 Fixed comment [jo-asplin-met-no]
eb9bd05 Used block to have attribute apply to multiple statements [jo-asplin-met-no]
ce4fee5 Fixed typos [jo-asplin-met-no]
a42a035 Yet another go at cancelling kafka reader, but essentially back to or… [jo-asplin-met-no]
decdd6b Added debug printout [jo-asplin-met-no]
0c21b6c Yet another go at cancelling kafka reader [jo-asplin-met-no]
Now it looks good! Just wanted to mention that another option with cancellation tokens (which I've never used before, by the way) is to skip the `select` altogether.
I guess it's a matter of preference, and it doesn't do exactly what the `select` code does (in particular, it doesn't play well with the `Err` branch of the match statement), but I find it easier to reason about.
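A minimal sketch of the "skip the `select`" idea, assuming the reader is a plain loop around a synchronous poll. It uses only the standard library: the `AtomicBool` is a stand-in for the cancellation token, and `read_until_cancelled` and the `poll` closure are hypothetical names, not code from this PR. It also shows why the `Err` branch is awkward: the flag is only re-checked after the back-off sleep has finished.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Polls until `cancelled` is set, checking the flag once per iteration.
/// `poll` is a hypothetical stand-in for a synchronous consumer poll.
/// Returns the number of polls performed.
fn read_until_cancelled<F>(cancelled: &AtomicBool, mut poll: F) -> usize
where
    F: FnMut() -> Result<Vec<String>, String>,
{
    let mut polls = 0;
    while !cancelled.load(Ordering::SeqCst) {
        polls += 1;
        match poll() {
            Ok(messages) => {
                for _msg in messages {
                    // Handle the message here.
                }
            }
            Err(_e) => {
                // Back off before retrying. Cancellation is not noticed
                // until this sleep has finished (10 ms here, for illustration).
                thread::sleep(Duration::from_millis(10));
            }
        }
    }
    polls
}

fn main() {
    let cancelled = Arc::new(AtomicBool::new(false));
    let flag = Arc::clone(&cancelled);

    // Stand-in for the signal catcher: cancel after a short delay.
    let canceller = thread::spawn(move || {
        thread::sleep(Duration::from_millis(50));
        flag.store(true, Ordering::SeqCst);
    });

    let polls = read_until_cancelled(&cancelled, || {
        thread::sleep(Duration::from_millis(5)); // pretend to block briefly
        Ok(Vec::new())
    });
    canceller.join().unwrap();
    println!("stopped after {polls} polls");
}
```

The shutdown latency of a loop like this is bounded by the longest single step between flag checks, so a long back-off or a long blocking poll directly delays shutdown.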
Yes, I forgot that our kafka library is sync. In that case I think this is equivalent to the `select`, and probably the way to go. It has another problem though: in the pathological case where the kafka queue is empty when we cancel, this will hang indefinitely (or at least for the 90 seconds until systemd sends `SIGKILL`...). It's also a problem that we're doing blocking IO on a non-blocking task.
We can find a way around this, but I'm starting to think it would make more sense to switch to `rdkafka`, which supports async.
Why indefinitely? If the queue is empty I'd expect `consumer.poll()` to return an error (?), after which we wait 5 seconds and check if the token was cancelled before polling again.
But maybe it's not so simple, and I agree we should probably switch to `rdkafka`.
This is the bug you mentioned in the other comment? That we are simply using `spawn` instead of `spawn_blocking`?
I looked at the source and we're actually both wrong: it does block, but not indefinitely. The internal kafka client lets you set a `fetch_max_wait_time`; the default isn't listed in the docs, but it's in the source as 100ms. There's no error returned from `fetch_messages`, so I assume it just returns an empty vector if it times out.
100ms is short on a human scale, so it's not a problem in terms of hanging indefinitely. It is an issue in terms of tokio though, because the guideline from tokio devs is that you shouldn't go more than 10-100us between await points, and this is at least 1000x more than that.
Nope. What I was talking about is that at the moment we commit offsets without waiting for the associated DB inserts to complete. Related to that, the DB inserts being on a separate task means they aren't covered by the graceful shutdown, but a solution to the first problem will probably also solve the second.
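To make the ordering concern concrete, here is a rough synchronous sketch in which the offset for a message set is committed only after every insert for that set has succeeded. All names (`MessageSet`, `MemStore`, `MemCommitter`, `process_set`) are hypothetical in-memory stand-ins rather than this project's types or the kafka library's API, and the real code is async.

```rust
/// Hypothetical batch of consumed messages plus the offset to commit for it.
struct MessageSet {
    last_offset: u64,
    payloads: Vec<String>,
}

/// Stand-in for the database; `fail_on` makes one payload fail, for illustration.
#[derive(Default)]
struct MemStore {
    rows: Vec<String>,
    fail_on: Option<String>,
}

impl MemStore {
    fn insert(&mut self, payload: &str) -> Result<(), String> {
        if self.fail_on.as_deref() == Some(payload) {
            return Err(format!("insert failed for {payload}"));
        }
        self.rows.push(payload.to_string());
        Ok(())
    }
}

/// Stand-in for the consumer's offset bookkeeping.
#[derive(Default)]
struct MemCommitter {
    committed: Vec<u64>,
}

impl MemCommitter {
    fn commit(&mut self, offset: u64) {
        self.committed.push(offset);
    }
}

/// Inserts every payload in the set, then commits the set's offset.
/// If any insert fails, the function returns early and nothing is committed.
fn process_set(
    set: &MessageSet,
    store: &mut MemStore,
    committer: &mut MemCommitter,
) -> Result<(), String> {
    for payload in &set.payloads {
        store.insert(payload)?;
    }
    committer.commit(set.last_offset);
    Ok(())
}

fn main() {
    let mut store = MemStore::default();
    let mut committer = MemCommitter::default();
    let set = MessageSet {
        last_offset: 41,
        payloads: vec!["a".to_string(), "b".to_string()],
    };
    process_set(&set, &mut store, &mut committer).unwrap();
    println!("rows={:?} committed={:?}", store.rows, committer.committed);
}
```

With this ordering, a failure or shutdown between insert and commit leaves the offset uncommitted, so the set would be redelivered. That in turn suggests the inserts would need to tolerate duplicates.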
I think it would be covered indirectly by the sender being dropped? But oof, good catch. It probably makes more sense to parallelize over message sets instead of single messages.
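The "covered indirectly by the sender being dropped" behaviour can be illustrated with the standard library's channel. This is a sketch under the assumption that the reader hands work to the insert task over a channel; `drain_until_closed` is a hypothetical name, and `std::sync::mpsc` stands in for whatever channel the project actually uses.

```rust
use std::sync::mpsc;
use std::thread;

/// Receives items until the channel is closed, i.e. until every sender has
/// been dropped and the buffered items have been drained.
fn drain_until_closed(rx: mpsc::Receiver<u32>) -> Vec<u32> {
    let mut seen = Vec::new();
    for item in rx {
        seen.push(item);
    }
    seen
}

fn main() {
    let (tx, rx) = mpsc::channel();

    // Stand-in for the task doing the DB inserts.
    let worker = thread::spawn(move || drain_until_closed(rx));

    // Stand-in for the reader: send a few items, then shut down.
    for i in 0..3 {
        tx.send(i).unwrap();
    }
    drop(tx); // closing the channel is what lets the worker finish

    // Joining the worker is what actually waits for the remaining work.
    let seen = worker.join().unwrap();
    println!("{seen:?}");
}
```

Note that dropping the sender only lets the receiving side run to completion. Shutdown still has to wait for (join) that side, otherwise queued items can be lost when the process exits.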