9999years
Contributor

We've seen a bunch of (transient) failures with our build cache when downloading artifacts. This adds retry logic for BatchReadBlobs requests similar to the RemoteExecutionClient::new_retry logic:

    pub async fn new_retry(re_config: &RemoteExecutionConfig) -> buck2_error::Result<Self> {
        // Loop happens times-1 times at most
        for i in 1..re_config.connection_retries {
            match Self::new(re_config).await {
                Ok(v) => return Ok(v),
                Err(e) => {
                    let e: buck2_error::Error = e.into();
                    if e.find_typed_context::<RemoteExecutionError>().is_none() {
                        // If we cannot connect to RE due to some non-RE error, we should not retry
                        // And should just return the error immediately as it's unlikely to be flakey
                        return Err(e.into());
                    }
                    tracing::warn!(
                        "Failed to connect to RE, retrying after sleeping {} seconds: {:#?}",
                        i,
                        e
                    );
                    tokio::time::sleep(Duration::from_secs(i as u64)).await;
                }
            }
        }
        Self::new(re_config).await
    }
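
For readers unfamiliar with the pattern, here is a minimal, synchronous sketch of the same idea: retry only errors classified as transient, and sleep a linearly growing delay between attempts. The helper `retry_with_linear_backoff` and its parameters are hypothetical names chosen for illustration. They are not part of buck2 and this is not the code in this PR. It uses only the standard library so it can be run in isolation.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical helper (not a buck2 API): runs `op` up to `max_attempts`
/// times in total. After the n-th failed attempt it sleeps `step * n`,
/// mirroring the linear 1s, 2s, 3s... delays in `new_retry`.
/// Errors for which `is_retryable` returns false are returned immediately.
pub fn retry_with_linear_backoff<T, E>(
    max_attempts: u32,
    step: Duration,
    mut op: impl FnMut() -> Result<T, E>,
    is_retryable: impl Fn(&E) -> bool,
) -> Result<T, E> {
    let mut attempt: u32 = 1;
    loop {
        match op() {
            Ok(v) => return Ok(v),
            // Give up when attempts are exhausted or the error is not transient.
            Err(e) if attempt >= max_attempts || !is_retryable(&e) => return Err(e),
            Err(_) => {
                sleep(step * attempt);
                attempt += 1;
            }
        }
    }
}
```

In an async context the `sleep` call would be replaced by an awaitable timer such as `tokio::time::sleep`, as in the snippet above. The predicate plays the role of the `find_typed_context::<RemoteExecutionError>()` check.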

In the future, we'll likely want more sophisticated tuning of backoff parameters (minimum & maximum delays, jitter, exponential backoff, etc.). I'm not able to test this at the moment as the failures are flaky.
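
As a rough illustration of what those tuning parameters could look like, the sketch below computes a capped exponential delay with jitter. `BackoffConfig`, its fields, and `delay` are hypothetical names, not anything in buck2 or this PR. The jitter is passed in as a percentage, which is an assumption made to keep the example deterministic and dependency-free; a real implementation would likely draw it from a random number generator.

```rust
use std::time::Duration;

/// Hypothetical backoff settings (illustrative only, not a buck2 type).
pub struct BackoffConfig {
    pub min_delay: Duration,
    pub max_delay: Duration,
}

impl BackoffConfig {
    /// Delay before retry number `attempt` (0-based).
    /// The ceiling doubles each attempt, starting at `min_delay` and capped
    /// at `max_delay`. `jitter_percent` (clamped to 0..=100) scales the
    /// ceiling; the caller is assumed to supply a random value.
    pub fn delay(&self, attempt: u32, jitter_percent: u32) -> Duration {
        // 2^attempt, saturating instead of overflowing for large attempts.
        let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
        let ceiling = self.min_delay.saturating_mul(factor).min(self.max_delay);
        ceiling.saturating_mul(jitter_percent.min(100)) / 100
    }
}
```

With `min_delay` of 100 ms and `max_delay` of 2 s, the ceilings grow as 100 ms, 200 ms, 400 ms, 800 ms, 1.6 s and then stay at 2 s. Scaling by a random percentage spreads out retries from many clients so they do not all hit the cache at the same moment.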

@meta-cla meta-cla bot added the CLA Signed label Aug 13, 2025
@facebook-github-bot
Contributor

@facebook-github-bot has imported this pull request. If you are a Meta employee, you can view this in D80221251. (Because this pull request was imported automatically, there will not be any future comments.)

@9999years 9999years force-pushed the wiggles/dux-3130-buck2-buildbuddy-remote-cache-transport-errors branch 3 times, most recently from 51d7c36 to a17c26c Compare August 18, 2025 19:51
@9999years 9999years marked this pull request as ready for review August 18, 2025 23:08
@9999years 9999years force-pushed the wiggles/dux-3130-buck2-buildbuddy-remote-cache-transport-errors branch from a17c26c to 10629ed Compare September 9, 2025 19:04