Long Time_queued_by_rc Duration Even When Resource Group Has Sufficient Resources #1855

@AndreMouche

Problem Description

The client does not cache tokens, even when the resource group has tokens available, so it must acquire a token for every request. This can cause long latency when retries occur: each retry re-acquires the token through OnRequestWait, even though the resource group has sufficient quota. The delays from token acquisition (lock contention, PD communication latency, request queuing) accumulate across retries and significantly increase total latency.

Current Behavior: Token Acquisition on Every Request

In SendRequest, the client waits for a token before sending each request:

func (r interceptedClient) SendRequest(ctx context.Context, addr string, req *tikvrpc.Request, timeout time.Duration) (resp *tikvrpc.Response, err error) {
    var ruDetails *util.RUDetails
    resourceGroupName, resourceControlInterceptor, reqInfo := getResourceControlInfo(ctx, req)
    if resourceControlInterceptor != nil {
        consumption, penalty, waitDuration, priority, err := resourceControlInterceptor.OnRequestWait(ctx, resourceGroupName, reqInfo)
        if err != nil {
            return nil, err
        }
        req.GetResourceControlContext().Penalty = penalty
        // override request priority with resource group priority if it's not set.
        // Get the priority at tikv side has some performance issue, so we pass it
        // at client side. See: https://github.com/tikv/tikv/issues/15994 for more details.
        if req.GetResourceControlContext().OverridePriority == 0 {
            req.GetResourceControlContext().OverridePriority = uint64(priority)
        }
        if val := ctx.Value(util.RUDetailsCtxKey); val != nil {
            ruDetails = val.(*util.RUDetails)
            ruDetails.Update(consumption, waitDuration)
        }
    }
    if ctxInterceptor := interceptor.GetRPCInterceptorFromCtx(ctx); ctxInterceptor == nil {
        resp, err = r.Client.SendRequest(ctx, addr, req, timeout)
    } else {
        resp, err = ctxInterceptor.Wrap(func(target string, req *tikvrpc.Request) (*tikvrpc.Response, error) {
            return r.Client.SendRequest(ctx, target, req, timeout)
        })(addr, req)
    }
    if resourceControlInterceptor != nil && resp != nil {
        respInfo := resourcecontrol.MakeResponseInfo(resp)
        consumption, waitDuration, err := resourceControlInterceptor.OnResponseWait(ctx, resourceGroupName, reqInfo, respInfo)
        if err != nil {
            return nil, err
        }
        if ruDetails != nil {
            ruDetails.Update(consumption, waitDuration)
        }
    }
    return resp, err
}

The key part is the OnRequestWait call, which runs on every invocation:

func (r interceptedClient) SendRequest(ctx context.Context, addr string, req *tikvrpc.Request, timeout time.Duration) (resp *tikvrpc.Response, err error) {
    // ...
    if resourceControlInterceptor != nil {
        // OnRequestWait is called on every invocation to acquire a token
        consumption, penalty, waitDuration, priority, err := resourceControlInterceptor.OnRequestWait(ctx, resourceGroupName, reqInfo)
        // ...
    }
    // ...
}

This can cause latency to accumulate in retry scenarios:

  • Each retry requires waiting for token acquisition (OnRequestWait)
  • Even with sufficient tokens, each acquisition may have delays:
    • Lock contention
    • PD communication latency
    • Concurrent request queuing
  • These delays accumulate during retries, causing significant total latency increase
