Describe the bug
We periodically see consistency check failures when making queries with tenant federation enabled. The blocks it reports as not queried do not belong to the tenant that actually holds the data.
When this happens, we get messages like this in the query frontend logs:
[pod/v1-cortex-query-frontend-cfc6c9d69-t8w9j/query-frontend] level=debug ts=2023-05-26T17:46:54.654382213Z caller=results_cache.go:374 traceID=5a06f1b765aa8748 msg="handle miss" start=1683676800000 spanID=692181cda7283b7f
[pod/v1-cortex-query-frontend-cfc6c9d69-t8w9j/query-frontend] level=error ts=2023-05-26T17:46:54.759380243Z caller=retry.go:79 traceID=5a06f1b765aa8748 msg="error processing request" try=0 err="rpc error: code = Code(500) desc = {\"status\":\"error\",\"errorType\":\"internal\",\"error\":\"expanding series: error querying tenant_id fake: consistency check failed because some blocks were not queried: 01H02N308MZ51VVG77WAZXWHRR\"}"
[pod/v1-cortex-query-frontend-cfc6c9d69-t8w9j/query-frontend] level=error ts=2023-05-26T17:46:54.901137753Z caller=retry.go:79 traceID=5a06f1b765aa8748 msg="error processing request" try=1 err="rpc error: code = Code(500) desc = {\"status\":\"error\",\"errorType\":\"internal\",\"error\":\"expanding series: error querying tenant_id fake: consistency check failed because some blocks were not queried: 01H02N308MZ51VVG77WAZXWHRR\"}"
[pod/v1-cortex-query-frontend-cfc6c9d69-t8w9j/query-frontend] level=error ts=2023-05-26T17:46:54.954194856Z caller=retry.go:79 traceID=5a06f1b765aa8748 msg="error processing request" try=2 err="rpc error: code = Code(500) desc = {\"status\":\"error\",\"errorType\":\"internal\",\"error\":\"expanding series: error querying tenant_id fake: consistency check failed because some blocks were not queried: 01H02N308MZ51VVG77WAZXWHRR\"}"
[pod/v1-cortex-query-frontend-cfc6c9d69-t8w9j/query-frontend] level=error ts=2023-05-26T17:46:55.018502766Z caller=retry.go:79 traceID=5a06f1b765aa8748 msg="error processing request" try=3 err="rpc error: code = Code(500) desc = {\"status\":\"error\",\"errorType\":\"internal\",\"error\":\"expanding series: error querying tenant_id fake: consistency check failed because some blocks were not queried: 01H02N308MZ51VVG77WAZXWHRR\"}"
[pod/v1-cortex-query-frontend-cfc6c9d69-t8w9j/query-frontend] level=error ts=2023-05-26T17:46:55.401956233Z caller=retry.go:79 traceID=5a06f1b765aa8748 msg="error processing request" try=4 err="rpc error: code = Code(500) desc = {\"status\":\"error\",\"errorType\":\"internal\",\"error\":\"expanding series: error querying tenant_id fake: consistency check failed because some blocks were not queried: 01H02N308MZ51VVG77WAZXWHRR\"}"
[pod/v1-cortex-query-frontend-cfc6c9d69-t8w9j/query-frontend] level=warn ts=2023-05-26T17:46:55.402152993Z caller=logging.go:86 traceID=5a06f1b765aa8748 msg="GET /prometheus/api/v1/query_range?query=customer:rts_BWbits:sum&start=1683676800&end=1683763200&step=300 (500) 747.968778ms Response: \"{\\\"status\\\":\\\"error\\\",\\\"errorType\\\":\\\"internal\\\",\\\"error\\\":\\\"expanding series: error querying tenant_id fake: consistency check failed because some blocks were not queried: 01H02N308MZ51VVG77WAZXWHRR\\\"}\" ws: false; Accept: */*; Connection: close; User-Agent: curl/7.88.1; X-Scope-Orgid: rts|fake; "
By contrast, a successful query looks like this:
[pod/v1-cortex-query-frontend-cfc6c9d69-wh2ph/query-frontend] level=debug ts=2023-05-26T17:48:52.834813942Z caller=results_cache.go:374 org_id=rts traceID=20a4ac98a88c3d61 msg="handle miss" start=1683676800000 spanID=53ca2a15ef6f8322
[pod/v1-cortex-query-frontend-cfc6c9d69-wh2ph/query-frontend] level=debug ts=2023-05-26T17:48:52.939481535Z caller=logging.go:76 traceID=20a4ac98a88c3d61 msg="GET /prometheus/api/v1/query_range?query=customer:rts_BWbits:sum&start=1683676800&end=1683763200&step=300 (200) 105.020042ms"
To Reproduce
Steps to reproduce the behavior:
- Start Cortex 1.14.1
- Perform a tenant-federated query (an example request is sketched below)
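For reference, the failing request from the warn log above can be replayed with a plain curl call against the query frontend. The host below is a placeholder, and this sketch assumes tenant federation is already enabled on the cluster (e.g. via the -tenant-federation.enabled flag or the tenant_federation config block):

```sh
# Federated range query against the query frontend; both tenants are passed
# in X-Scope-OrgID separated by "|" (values taken from the warn log above).
# <query-frontend-host> is a placeholder for the actual service address.
curl -s \
  -H 'X-Scope-OrgID: rts|fake' \
  'http://<query-frontend-host>/prometheus/api/v1/query_range?query=customer:rts_BWbits:sum&start=1683676800&end=1683763200&step=300'
```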
Expected behavior
I'd expect the federated query to succeed without these consistency check errors. I'm not sure what else to say.
Environment:
- Infrastructure: Kubernetes
- Deployment tool: Helm
Additional Context
Store gateway logs: https://gist.github.com/blovett/84b08f2608f3cccf2cf4865c485720db
I also included logs above, but if there are more I can provide that would help troubleshoot this, please let me know.