Open
Description
Cortex only retries fetching a block from a store gateway when the request returns an error; see:
cortex/pkg/querier/blocks_store_queryable.go
Lines 604 to 609 in dd4240d

    if err != nil {
        if isRetryableError(err) {
            level.Warn(spanLog).Log("err", errors.Wrapf(err, "failed to fetch series from %s due to retryable error", c.RemoteAddress()))
            return nil
        }
        return errors.Wrapf(err, "failed to fetch series from %s", c.RemoteAddress())

cortex/pkg/querier/blocks_store_queryable.go
Line 503 in dd4240d

    for attempt := 1; attempt <= maxFetchSeriesAttempts; attempt++ {

This means that if a single store gateway is merely slow and does not return an error, the request is never retried and the query will eventually time out.
This scenario can happen for multiple reasons, such as a network partition between the store gateway and the storage, or a slow disk.
In those cases we could:
- Try to fetch from at least 2 store-gateways in parallel, or
- Have some mechanism for a store-gateway to advertise that it cannot handle requests (set itself to unhealthy?)