I've observed extremely long gaps in executions of task batches. Based on the logs, these gaps appear to be the result of ReplicationThrottleHelper submitting requests to add or remove the replication throttle rate configurations and then waiting for the change to be reflected on the broker. Specifically, there is a 10 second gap between each broker's configuration change. This can lead to minutes of time doing nothing, even on small 6 or 12 node clusters.
ReplicationThrottleHelper waits for each change by calling CruiseControlMetricsUtils.retry, specifically the overload that uses the default backoff configuration of `scale=5 seconds` and `base=2`.
I assume that the first describe request doesn't return the expected configurations for some reason, resulting in the retry loop triggering the first 10 second backoff.
10 seconds seems like an excessive amount of time to wait for the first retry, so it would be nice if the retry loop started with a much smaller scale, somewhere on the order of a few milliseconds. Because the exponential backoff has no cap, starting with a small scale is somewhat necessary to prevent the backoff from becoming unreasonably large after only one or two retries.
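To make the numbers concrete, here is a minimal sketch of the backoff schedule. It assumes the delay before the n-th retry is scale * base^n, which is inferred from the behavior described above (scale=5 seconds, base=2, first backoff of 10 seconds) rather than copied from CruiseControlMetricsUtils. The class `BackoffSketch`, both methods, and the cap parameter are hypothetical and only illustrate the idea of a small starting scale with an upper bound.

```java
class BackoffSketch {

    // Delay before the n-th retry (n starts at 1), ASSUMING delay = scale * base^n.
    // This matches the reported 10 s first backoff for scale=5000 ms, base=2.
    static long delayMs(long scaleMs, int base, int retry) {
        long delay = scaleMs;
        for (int i = 0; i < retry; i++) {
            delay *= base;
        }
        return delay;
    }

    // Hypothetical capped variant: same growth, but never above maxDelayMs.
    static long cappedDelayMs(long scaleMs, int base, int retry, long maxDelayMs) {
        long delay = scaleMs;
        for (int i = 0; i < retry; i++) {
            // Stop before multiplying if the next step would exceed the cap
            // (this also avoids overflow for large retry counts).
            if (delay > maxDelayMs / base) {
                return maxDelayMs;
            }
            delay *= base;
        }
        return Math.min(delay, maxDelayMs);
    }

    public static void main(String[] args) {
        // Current defaults described in this issue: 10 s, 20 s, 40 s, 80 s, 160 s.
        for (int retry = 1; retry <= 5; retry++) {
            System.out.println("default retry " + retry + ": "
                    + delayMs(5000L, 2, retry) + " ms");
        }
        // Illustrative alternative: 50 ms scale with a 1 s cap (values are assumptions).
        for (int retry = 1; retry <= 5; retry++) {
            System.out.println("capped retry " + retry + ": "
                    + cappedDelayMs(50L, 2, retry, 1000L) + " ms");
        }
    }
}
```

Under these assumptions, a single failed describe per broker costs 10 seconds with the defaults, so a 6 node cluster would spend about a minute waiting per throttle change, while the capped variant would start at 100 ms and never wait more than 1 second between attempts.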