[nexus] Create a longer-lived connection from Nexus -> Sled Agent #8149
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -90,7 +90,20 @@ impl super::Nexus { | |
.db_datastore | ||
.zpool_get_sled_if_in_service(&opctx, bundle.zpool_id.into()) | ||
.await?; | ||
let client = self.sled_client(&sled_id).await?; | ||
|
||
let short_timeout = std::time::Duration::from_secs(60); | ||
let long_timeout = std::time::Duration::from_secs(3600); | ||
let client = nexus_networking::default_reqwest_client_builder() | ||
// Continuing to read from the sled agent should happen relatively | ||
// quickly. | ||
.read_timeout(short_timeout) | ||
// However, the bundle itself may be large. As long as we're | ||
// continuing to make progress (see: read_timeout) we should be | ||
// willing to keep transferring the bundle for a while longer. | ||
.timeout(long_timeout) | ||
Review comment: This may still be too short for a truly large support bundle, say 50 GiB. At a download speed of 10 MiB/s, that would take roughly 85 minutes to complete.
Reply: I hear you - I think range requests are how I want to solve this in the limit; I'm not sure I really want to allow connections to stay open in Nexus for longer than an hour. WDYT - should we still try to do this PR as-is? Do you think we should aim for a higher number? Or should we accept this limitation until range requests are fully plumbed through?
Reply: My initial concern was a customer being completely blocked by this timeout, but we could extract individual files via the API pretty easily, so that's less of an issue. An hour seems like a reasonable limit until we can get range requests in.
||
.build() | ||
.expect("Failed to build reqwest Client"); | ||
let client = self.sled_client_ext(&sled_id, client).await?; | ||
|
||
// TODO: Use "range"? | ||
|
||
|
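The estimate in the inline thread on `.timeout(long_timeout)` can be checked with a short calculation. This is only a sketch: `transfer_time` and the constants are hypothetical names, the 50 GiB size and 10 MiB/s rate are the figures quoted in the thread, and a steady transfer rate is assumed.

```rust
use std::time::Duration;

const MIB: u64 = 1024 * 1024;
const GIB: u64 = 1024 * MIB;

/// Hypothetical helper: time to move `size_bytes` at a steady `bytes_per_sec`.
/// Rounds up so that a partial final second still counts.
fn transfer_time(size_bytes: u64, bytes_per_sec: u64) -> Duration {
    Duration::from_secs(size_bytes.div_ceil(bytes_per_sec))
}

/// Hypothetical helper: largest transfer that completes within `limit`
/// at a steady `bytes_per_sec`.
fn max_size_within(limit: Duration, bytes_per_sec: u64) -> u64 {
    limit.as_secs() * bytes_per_sec
}

fn main() {
    // Same value as `long_timeout` in the diff.
    let long_timeout = Duration::from_secs(3600);

    let needed = transfer_time(50 * GIB, 10 * MIB);
    println!("needed: {} s (~{} min)", needed.as_secs(), needed.as_secs() / 60);
    println!("exceeds long_timeout: {}", needed > long_timeout);

    let ceiling = max_size_within(long_timeout, 10 * MIB);
    println!("largest bundle within the hour: {:.1} GiB", ceiling as f64 / GIB as f64);
}
```

Under these assumptions the 50 GiB bundle needs 5120 seconds (about 85 minutes), and the one-hour limit caps a 10 MiB/s transfer at roughly 35 GiB.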
Should these be in the config file or something? It seems like it could, potentially, be useful to change them in the field if we see the long timeout hit...
I'm not opposed to this, but I only wonder: is plumbing them into nexus config enough to make this modifiable? Seems like callers would still need to change the config among all nexuses, which would either require storing this value in the DB or deploying a new configuration via reconfigurator, right?
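To make the config-file suggestion concrete, here is a minimal sketch of what overridable timeouts could look like inside a single process. Everything in it is an assumption for illustration: `BundleTransferTimeouts` and `with_overrides` are hypothetical names, not part of this PR or of the Nexus config, and the defaults simply mirror the 60-second and 3600-second values in the diff.

```rust
use std::time::Duration;

/// Hypothetical settings type; not part of the PR or of the Nexus config.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct BundleTransferTimeouts {
    /// Maximum gap between successful reads from the sled agent.
    read_timeout: Duration,
    /// Maximum duration of the whole transfer.
    total_timeout: Duration,
}

impl Default for BundleTransferTimeouts {
    /// Defaults mirror `short_timeout` and `long_timeout` in the diff.
    fn default() -> Self {
        Self {
            read_timeout: Duration::from_secs(60),
            total_timeout: Duration::from_secs(3600),
        }
    }
}

impl BundleTransferTimeouts {
    /// Apply optional overrides (in seconds) on top of the defaults and
    /// reject combinations where the read timeout exceeds the total timeout.
    fn with_overrides(
        read_secs: Option<u64>,
        total_secs: Option<u64>,
    ) -> Result<Self, String> {
        let defaults = Self::default();
        let timeouts = Self {
            read_timeout: read_secs
                .map(Duration::from_secs)
                .unwrap_or(defaults.read_timeout),
            total_timeout: total_secs
                .map(Duration::from_secs)
                .unwrap_or(defaults.total_timeout),
        };
        if timeouts.read_timeout > timeouts.total_timeout {
            return Err(format!(
                "read timeout ({:?}) exceeds total timeout ({:?})",
                timeouts.read_timeout, timeouts.total_timeout
            ));
        }
        Ok(timeouts)
    }
}

fn main() {
    // No overrides: falls back to the values hard-coded in the diff.
    println!("{:?}", BundleTransferTimeouts::with_overrides(None, None));
    // Raising only the total timeout to two hours.
    println!("{:?}", BundleTransferTimeouts::with_overrides(None, Some(7200)));
    // An inconsistent override is rejected.
    println!("{:?}", BundleTransferTimeouts::with_overrides(Some(7200), None));
}
```

This covers only the plumbing inside one Nexus process. It does not address the concern raised above about getting a changed value to every Nexus instance, which would still need the database or a reconfigurator-deployed configuration.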