Skip to content

[nexus] Create a longer-lived connection from Nexus -> Sled Agent #8149

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
May 16, 2025

Conversation

smklein
Copy link
Collaborator

@smklein smklein commented May 13, 2025

Fixes #8144

// However, the bundle itself may be large. As long as we're
// continuing to make progress (see: read_timeout) we should be
// willing to keep transferring the bundle for a while longer.
.timeout(long_timeout)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may still be too short for a truly large support bundle, say 50 GiB. At a download speed of 10 MiB/s, that would take roughly 85 minutes to complete.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hear you - I think range requests is how I want to solve this in the limit; I'm not sure I really want to allow connections to stay open in Nexus for longer than an hour.

WDYT - should we still try to do this PR as-is? Do you think we should aim for a higher number? Or should we accept this limitation until range requests are fully plumbed through?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My initial concern was a customer being completely blocked by this timeout, but we could extract individual files via the API pretty easily, so that's less of an issue. An hour seems like a reasonable limit until we can get range requests in.

Comment on lines +94 to +95
let short_timeout = std::time::Duration::from_secs(60);
let long_timeout = std::time::Duration::from_secs(3600);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should these be in the config file or something? It seems like it could, potentially, be useful to change them in the field if we see the long timeout hit...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not opposed to this, but I only wonder: is plumbing them into nexus config enough to make this modifiable? Seems like callers would still need to change the config among all nexuses, which would either require storing this value in the DB or deploying a new configuration via reconfigurator, right?

Comment on lines 28 to 35
pub fn default_reqwest_client() -> reqwest::Client {
let dur = std::time::Duration::from_secs(60);
reqwest::ClientBuilder::new()
.connect_timeout(dur)
.timeout(dur)
.build()
.expect("Failed to build reqwest Client")
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's a thought: what if we had a function that returned the client builder with the default values, which this (and other functions in this module) would call to get a builder? That way, code that wants to set additional builder options could use that , and call additional methods on the "base" builder, but getting the "default" settings for anything it doesn't care to override.

Currently, we have the "short" timeout for connections/requests defined as 60s both here and in src/app/support_bundle.rs. If we wanted to add other "default client settings" here, we'd have to remember to update support_bundle.rs as well. On the other hand, if we had a "default builder" function that support_bundle.rs called and then added its own additional configurations to, we wouldn't have to remember to add new default settings in two or more places.

Just a thought.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, updated to using the default builder

@smklein smklein merged commit 0313f70 into main May 16, 2025
16 checks passed
@smklein smklein deleted the longer-sb-timeout branch May 16, 2025 16:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support bundle downloads timeout after 60 seconds
3 participants