Description
Agent version
7.75.0
Bug Report
When a DatadogMetric query returns no data (e.g., no matching timeseries exist for the queried time window), the Cluster Agent's External Metrics Provider wraps the error as HTTP 500 Internal Server Error via apierr.NewInternalError(). This makes it impossible for downstream consumers (KEDA, custom HPA controllers) to distinguish between "no data available for this query" and "the server is broken."
Since v2.19.0, KEDA supports metricUnavailableValue (FillValue) when the external metrics API returns HTTP 422 Unprocessable Entity. However, because the Cluster Agent always returns HTTP 500, this feature never takes effect when the scaler goes through the Cluster Agent.
Possible root cause in code:
- When a query returns no series, queryDatadogExternal() in pkg/util/kubernetes/autoscalers/datadogexternal.go marks the point as invalid with the error:
no serie was found for this query in API Response, check Cluster Agent logs for QueryIndex errors
- This sets DatadogMetricInternal.Valid = false and stores the error in DatadogMetricInternal.Error
- When KEDA (or any HPA controller) calls GetExternalMetric(), ToExternalMetricFormat returns this error because Valid == false
- GetExternalMetric then unconditionally wraps all errors with apierr.NewInternalError(err), which always produces HTTP 500:
    convErr := apierr.NewInternalError(err)
    if convErr != nil {
        err = convErr
    }
Expected behavior:
"No data available" errors should return HTTP 422 (Unprocessable Entity) — consistent with how the Datadog API itself responds to queries with no matching data. Server errors (auth failures, connectivity issues, etc.) should remain HTTP 500.
This would allow KEDA's metricUnavailableValue / FillValue to work correctly, and would be consistent with the Datadog API's own behavior.
Related upstream issues:
- KEDA #7246 — discusses this exact Cluster Agent → 500 wrapping problem
- KEDA #7384 — added 422 FillValue handling (works for direct API, but not Cluster Agent)
Reproduction Steps
- Deploy Datadog Cluster Agent with external metrics provider enabled
- Create a DatadogMetric with a query that returns no data:
apiVersion: datadoghq.com/v1alpha1
kind: DatadogMetric
metadata:
  name: test-metric
  namespace: default
spec:
  query: "sum:trace.some.metric{env:test,resource_name:nonexistent*}.as_rate()"
- Create a KEDA ScaledObject referencing this metric with metricUnavailableValue:
triggers:
  - type: datadog
    metadata:
      useClusterAgent: "true"
      datadogMetric: "default-test-metric"
      targetValue: "5"
      metricUnavailableValue: "2"
- Observe that the DatadogMetric enters error state:
$ kubectl get datadogmetric test-metric -o jsonpath='{.status}'
# Shows: Error condition "Unable to fetch data from Datadog"
# Message: "no serie was found for this query..."
- Check KEDA operator logs:
$ kubectl logs -n keda deploy/keda-operator | grep -i "error\|500\|fillvalue"
# Shows HTTP 500 errors, no FillValue usage
- Expected: KEDA should receive HTTP 422, recognize it as "no data," and use the configured metricUnavailableValue (2) as the metric value
- Actual: KEDA receives HTTP 500, treats it as a server error, activates fallback mechanism, and metricUnavailableValue is never used
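To make the expected/actual difference concrete, here is a small sketch of how a consumer might choose between the fill value and an error based on the status code. It is not KEDA's implementation: `resolveMetricValue` and its parameters are hypothetical, and it only assumes the behavior described in this report (422 means "no data, use the fill value if configured"; anything else is an error).

```go
package main

import (
	"fmt"
	"net/http"
)

// resolveMetricValue is a hypothetical helper showing the consumer-side decision.
// fillValue is nil when no metricUnavailableValue is configured.
func resolveMetricValue(status int, value float64, fillValue *float64) (float64, error) {
	switch {
	case status == http.StatusOK:
		// Normal case: use the value returned by the metrics API.
		return value, nil
	case status == http.StatusUnprocessableEntity && fillValue != nil:
		// "No data" case: substitute the configured fill value.
		return *fillValue, nil
	default:
		// Everything else, including 500, is a server error.
		return 0, fmt.Errorf("error getting metric value (status %d)", status)
	}
}

func main() {
	fill := 2.0

	// Expected path: a 422 response resolves to the fill value.
	v, err := resolveMetricValue(http.StatusUnprocessableEntity, 0, &fill)
	fmt.Println(v, err)

	// Actual path today: a 500 response never reaches the fill value.
	_, err = resolveMetricValue(http.StatusInternalServerError, 0, &fill)
	fmt.Println(err)
}
```

With the Cluster Agent always answering 500, only the last branch is ever taken, which matches the observation that metricUnavailableValue is never used.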
Agent configuration
KEDA operator logs:
error getting metric value: error getting metric value (status 500): Internal error occurred: Processing data from API failed, reason: no serie was found for this query in API Response, check Cluster Agent logs for QueryIndex errors, query was: sum:trace.envoy.proxy.hits{***_env:***-sbx-2,resource_name:test*}.as_rate()
DatadogMetric CRD status (from kubectl get datadogmetric):
Error condition: "Unable to fetch data from Datadog"
Message: "no serie was found for this query"
Last valid value: 13.2 (cached from 2026-02-16)
Conditions: Valid=False, Active=True, Error=True
Operating System
No response
Other environment details
No response