[BUG] Cluster Agent External Metrics Provider returns HTTP 500 for "no serie found" — should return 422 to enable downstream FillValue/fallback handling #46843

@yhrunyk

Agent version

7.75.0

Bug Report

When a DatadogMetric query returns no data (e.g., no matching timeseries exist for the queried time window), the Cluster Agent's External Metrics Provider wraps the error as HTTP 500 Internal Server Error via apierr.NewInternalError(). This makes it impossible for downstream consumers (KEDA, custom HPA controllers) to distinguish between "no data available for this query" and "the server is broken."

KEDA (since v2.19.0) added support for metricUnavailableValue (FillValue) when the external metrics API returns HTTP 422 Unprocessable Entity. However, because the Cluster Agent always returns HTTP 500, this feature is completely non-functional when using the Cluster Agent scaler.

Possible root cause in code:

  1. When a query returns no series, queryDatadogExternal() in pkg/util/kubernetes/autoscalers/datadogexternal.go marks the point as invalid with error:
    no serie was found for this query in API Response, check Cluster Agent logs for QueryIndex errors
  2. This sets DatadogMetricInternal.Valid = false and stores the error above in DatadogMetricInternal.Error
  3. When KEDA (or any HPA controller) calls GetExternalMetric(), ToExternalMetricFormat returns this error because Valid == false
  4. GetExternalMetric then unconditionally wraps all errors with apierr.NewInternalError(err), which always produces HTTP 500:
    convErr := apierr.NewInternalError(err)
    if convErr != nil {
        err = convErr
    }
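
The four steps above can be modeled in a few lines of Go. This is a simplified sketch, not the Cluster Agent's code: `metricInternal`, `toExternalFormat`, and `getExternalMetric` are hypothetical stand-ins for `DatadogMetricInternal`, `ToExternalMetricFormat`, and `GetExternalMetric`, the HTTP status is modeled as a plain integer rather than an `apierr` status error, and `errAuth` is an invented example of a genuine server-side failure.

```go
package main

import (
	"errors"
	"net/http"
)

// metricInternal is a hypothetical stand-in for DatadogMetricInternal.
type metricInternal struct {
	Valid bool
	Value float64
	Error error
}

var (
	// errNoSeries models the "no data" case described in step 1.
	errNoSeries = errors.New("no serie was found for this query in API Response")
	// errAuth is an invented example of a real server-side failure.
	errAuth = errors.New("authentication failed")
)

// toExternalFormat models step 3: an invalid metric yields its stored error.
func toExternalFormat(m metricInternal) (float64, error) {
	if !m.Valid {
		return 0, m.Error
	}
	return m.Value, nil
}

// getExternalMetric models step 4: every error, whatever its cause,
// is reported as HTTP 500.
func getExternalMetric(m metricInternal) (float64, int) {
	v, err := toExternalFormat(m)
	if err != nil {
		return 0, http.StatusInternalServerError
	}
	return v, http.StatusOK
}
```

In this model, a metric invalidated by `errNoSeries` and one invalidated by `errAuth` both come back as 500, so the caller has no way to tell them apart.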

Expected behavior:

"No data available" errors should return HTTP 422 (Unprocessable Entity) — consistent with how the Datadog API itself responds to queries with no matching data. Server errors (auth failures, connectivity issues, etc.) should remain HTTP 500.

This would allow KEDA's metricUnavailableValue / FillValue to work correctly, and would be consistent with the Datadog API's own behavior.

Related upstream issues:

  • KEDA #7246 — discusses this exact Cluster Agent → 500 wrapping problem
  • KEDA #7384 — added 422 FillValue handling (works for direct API, but not Cluster Agent)

Reproduction Steps

  1. Deploy Datadog Cluster Agent with external metrics provider enabled
  2. Create a DatadogMetric with a query that returns no data:
apiVersion: datadoghq.com/v1alpha1
kind: DatadogMetric
metadata:
  name: test-metric
  namespace: default
spec:
  query: "sum:trace.some.metric{env:test,resource_name:nonexistent*}.as_rate()"
  3. Create a KEDA ScaledObject referencing this metric with metricUnavailableValue:
triggers:
  - type: datadog
    metadata:
      useClusterAgent: "true"
      datadogMetric: "default-test-metric"
      targetValue: "5"
      metricUnavailableValue: "2"
  4. Observe that the DatadogMetric enters error state:
$ kubectl get datadogmetric test-metric -o jsonpath='{.status}'
# Shows: Error condition "Unable to fetch data from Datadog"
# Message: "no serie was found for this query..."
  5. Check KEDA operator logs:
$ kubectl logs -n keda deploy/keda-operator | grep -i "error\|500\|fillvalue"
# Shows HTTP 500 errors, no FillValue usage
  6. Expected: KEDA should receive HTTP 422, recognize it as "no data," and use the configured metricUnavailableValue (2) as the metric value
  7. Actual: KEDA receives HTTP 500, treats it as a server error, activates fallback mechanism, and metricUnavailableValue is never used
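
The difference between the expected and actual outcomes can be illustrated from the consumer's side. The following is a hypothetical sketch, not KEDA's implementation: `resolveMetric` and its parameters are invented for illustration, with `fillValue` standing in for a configured metricUnavailableValue (nil when unset).

```go
package main

import (
	"fmt"
	"net/http"
)

// resolveMetric decides which value a consumer would use, given the HTTP
// status returned by the external metrics API.
func resolveMetric(status int, value float64, fillValue *float64) (float64, error) {
	switch {
	case status == http.StatusOK:
		// Normal case: use the value from the API.
		return value, nil
	case status == http.StatusUnprocessableEntity && fillValue != nil:
		// "No data" case: substitute the configured fill value.
		return *fillValue, nil
	default:
		// Anything else (including 500) is treated as a server error.
		return 0, fmt.Errorf("error getting metric value (status %d)", status)
	}
}
```

With a fill value of 2, a 422 response resolves to 2 with no error, while a 500 response returns an error and never consults the fill value, which matches the behavior reported in the steps above.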

Agent configuration

KEDA operator logs:
error getting metric value: error getting metric value (status 500): Internal error occurred: Processing data from API failed, reason: no serie was found for this query in API Response, check Cluster Agent logs for QueryIndex errors, query was: sum:trace.envoy.proxy.hits{***_env:***-sbx-2,resource_name:test*}.as_rate()

DatadogMetric CRD status (from kubectl get datadogmetric):

Error condition: "Unable to fetch data from Datadog"
Message: "no serie was found for this query"
Last valid value: 13.2 (cached from 2026-02-16)
Conditions: Valid=False, Active=True, Error=True

Operating System

No response

Other environment details

No response

Metadata

Labels: oss/0 (external contributions, priority 0), pending (waiting for a Datadog member's response), team/container-autoscaling
