
Readiness probe missing SSL flags when TLS is required (v1.22.0) #2294

@laurentpellegrino


Bug Report

Description

In operator v1.22.0, MongodReadinessCheck was changed from a simple TCP dial to a full MongoDB client connection followed by an RSStatus check (see the readiness.go diff). However, the default readiness probe command in psmdb_defaults.go was not updated to include the SSL flags, unlike the liveness probe, which already has them.

This causes the readiness probe to fail when tls.mode: requireTLS is set, because the healthcheck binary tries to connect without TLS to a server that only accepts TLS connections.

Steps to Reproduce

  1. Deploy a PSMDB cluster with tls.mode: requireTLS and crVersion: 1.22.0
  2. Wait for or trigger a pod restart / rolling update
  3. Observe that the new pod stays at 1/2 Ready indefinitely

Expected Behavior

The readiness probe should include --ssl --sslInsecure --sslCAFile --sslPEMKeyFile flags when TLS is enabled, similar to how the liveness probe is configured.

Actual Behavior

The readiness probe command is:

/opt/percona/mongodb-healthcheck k8s readiness --component mongod

It is missing the SSL flags, which the liveness probe correctly includes:

/opt/percona/mongodb-healthcheck k8s liveness --ssl --sslInsecure --sslCAFile /etc/mongodb-ssl/ca.crt --sslPEMKeyFile /tmp/tls.pem --startupDelaySeconds 7200

MongoDB logs show a flood of SSLHandshakeFailed errors from 127.0.0.1 (the readiness probe):

"msg":"Error receiving request from client. Ending connection from remote",
"attr":{"error":{"code":141,"codeName":"SSLHandshakeFailed","errmsg":"The server is configured to only allow SSL connections"}}

The readiness probe times out after 2s and the pod never becomes ready.
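To gauge how often the probe is being rejected, the structured mongod log lines can be counted by error code name. A minimal sketch, assuming newline-delimited JSON log output and decoding only the msg and attr.error fields that appear in the excerpt above; logEntry and countSSLHandshakeFailures are hypothetical names.

```go
package main

import (
	"bytes"
	"encoding/json"
)

// logEntry decodes only the fields shown in the log excerpt above; every
// other field of a mongod log line is ignored.
type logEntry struct {
	Msg  string `json:"msg"`
	Attr struct {
		Error struct {
			Code     int    `json:"code"`
			CodeName string `json:"codeName"`
		} `json:"error"`
	} `json:"attr"`
}

// countSSLHandshakeFailures counts log lines whose attr.error.codeName is
// SSLHandshakeFailed. Blank lines and lines that are not valid JSON are
// skipped.
func countSSLHandshakeFailures(logData []byte) int {
	n := 0
	for _, line := range bytes.Split(logData, []byte("\n")) {
		var e logEntry
		if err := json.Unmarshal(line, &e); err != nil {
			continue
		}
		if e.Attr.Error.CodeName == "SSLHandshakeFailed" {
			n++
		}
	}
	return n
}
```

The input would typically be the mongod container's log output saved to a file or piped in.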

Impact

This creates a deadlock during rolling updates:

  • Pod N is updated with the new StatefulSet template, whose readiness probe lacks the SSL flags
  • Pod N never becomes ready
  • The operator's SmartUpdate waits for all pods to be ready before continuing
  • The cluster stays stuck in the initializing state
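The deadlock can be illustrated with a toy model of a one-pod-at-a-time update. This is not the operator's SmartUpdate code: rollingUpdate, the pod names, and the bounded polling are all assumptions made so the sketch terminates, whereas the real update waits indefinitely.

```go
package main

// rollingUpdate is a toy model of a one-pod-at-a-time rolling update: each
// pod is moved to the new template and the update continues only once that
// pod reports ready. isReady is polled up to maxPolls times per pod; if a
// pod never becomes ready, the pods updated so far and the stuck pod are
// returned.
func rollingUpdate(pods []string, isReady func(pod string) bool, maxPolls int) (updated []string, stuck string) {
	for _, pod := range pods {
		ready := false
		for i := 0; i < maxPolls && !ready; i++ {
			ready = isReady(pod)
		}
		if !ready {
			// The real update would keep waiting here; the model stops.
			return updated, pod
		}
		updated = append(updated, pod)
	}
	return updated, ""
}
```

With a readiness probe that always fails on the new template, the model stops at the first pod and never reaches the rest, which matches the stuck initializing state described above.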

Root Cause

In v1.21.2, MongodReadinessCheck only performed a raw TCP dial, so TLS was irrelevant. In v1.22.0, it was changed to use db.Dial() + RSStatus, which requires TLS when the server mandates it. But the probe command in psmdb_defaults.go was not updated to pass SSL flags for the readiness probe.

Suggested Fix

In pkg/apis/psmdb/v1/psmdb_defaults.go, add SSL flags to the readiness probe when TLS is enabled, similar to how the liveness probe handles it:

if replset.ReadinessProbe.TCPSocket == nil && replset.ReadinessProbe.Exec == nil {
    replset.ReadinessProbe.Exec = &corev1.ExecAction{
        Command: []string{
            "/opt/percona/mongodb-healthcheck",
            "k8s", "readiness",
            "--component", "mongod",
        },
    }
    // Add SSL flags when TLS is enabled
    if cr.TLSEnabled() {
        replset.ReadinessProbe.Exec.Command = append(replset.ReadinessProbe.Exec.Command,
            "--ssl", "--sslInsecure",
            "--sslCAFile", "/etc/mongodb-ssl/ca.crt",
            "--sslPEMKeyFile", "/tmp/tls.pem")
    }
}
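A possible refinement of the fix, sketched here as a hypothetical refactor rather than existing operator code: keep the SSL flags in one shared definition so the liveness and readiness probes cannot drift apart again. sslProbeFlags and withSSLFlags are illustrative names. The helper always copies into a new slice, which avoids the append-aliasing pitfall where appending to a slice with spare capacity would write into shared backing storage.

```go
package main

// sslProbeFlags holds the TLS flags used by the liveness probe in this issue.
var sslProbeFlags = []string{
	"--ssl", "--sslInsecure",
	"--sslCAFile", "/etc/mongodb-ssl/ca.crt",
	"--sslPEMKeyFile", "/tmp/tls.pem",
}

// withSSLFlags returns a copy of cmd with sslProbeFlags appended when
// tlsEnabled is true. Neither cmd nor sslProbeFlags is modified.
func withSSLFlags(cmd []string, tlsEnabled bool) []string {
	out := make([]string, 0, len(cmd)+len(sslProbeFlags))
	out = append(out, cmd...)
	if tlsEnabled {
		out = append(out, sslProbeFlags...)
	}
	return out
}
```

Both probe defaults could then be built as withSSLFlags(baseCommand, cr.TLSEnabled()), so any future change to the flags applies to both at once.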

Environment

  • Operator version: 1.22.0
  • MongoDB version: 8.0.19-7
  • Kubernetes: Talos Linux
  • PSMDB CR: tls.mode: requireTLS, crVersion: 1.22.0
