dataplane certs not renewed after expiry #12409

Closed
sainooli opened this issue Dec 27, 2024 · 3 comments
Labels
kind/bug A bug triage/pending This issue will be looked at on the next triage meeting

Comments

@sainooli

What happened?

We have been seeing intermittent 502s from applications hosted on AKS. While troubleshooting, we found that requests routed through one particular ingress controller pod consistently fail with 502, while requests through the other pods succeed. Restarting that pod fixes it, but within a day or two either the same ingress pod or a different one starts behaving the same way. We enabled debug logging on the failing pod's kuma-sidecar and saw that it fails certificate verification with the errors below:

[2024-12-26 15:37:17.970][50][debug][connection] [source/extensions/transport_sockets/tls/cert_validator/default_validator.cc:323] verify cert failed: X509_verify_cert: certificate verification error at depth 0: certificate has expired
[2024-12-26 15:37:17.970][50][debug][connection] [source/extensions/transport_sockets/tls/ssl_socket.cc:280] [Tags: "ConnectionId":"130329"] remote address:192.168.4.164:9898,TLS_error:|268435581:SSL routines:OPENSSL_internal:CERTIFICATE_VERIFY_FAILED:TLS_error_end
[2024-12-26 15:37:17.970][50][debug][connection] [source/common/network/connection_impl.cc:278] [Tags: "ConnectionId":"130329"] closing socket: 0
[2024-12-26 15:37:17.970][50][debug][connection] [source/extensions/transport_sockets/tls/ssl_socket.cc:280] [Tags: "ConnectionId":"130329"] remote address:192.168.4.164:9898,TLS_error:|268435581:SSL routines:OPENSSL_internal:CERTIFICATE_VERIFY_FAILED:TLS_error_end:TLS_error_end
[2024-12-26 15:37:17.970][50][debug][pool] [source/common/conn_pool/conn_pool_base.cc:484] [Tags: "ConnectionId":"130329"] client disconnected, failure reason: TLS_error:|268435581:SSL routines:OPENSSL_internal:CERTIFICATE_VERIFY_FAILED:TLS_error_end:TLS_error_end
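To tell which upstream endpoints are presenting the bad certificate, one option is to pull the `remote address` out of the `ssl_socket.cc` lines that report `CERTIFICATE_VERIFY_FAILED`. The sketch below assumes the debug log format exactly as shown above; `failing_upstreams` is a hypothetical helper name, not part of Kuma or Envoy. Each resulting address can then be matched to a pod IP (for example via `kubectl get pods -A -o wide`) to see whose certificate has expired.

```python
import re
from collections import Counter

# Assumes the ssl_socket.cc debug line format shown above, e.g.:
#   [Tags: "ConnectionId":"130329"] remote address:192.168.4.164:9898,TLS_error:|...CERTIFICATE_VERIFY_FAILED...
_VERIFY_FAILED = re.compile(
    r'"ConnectionId":"(\d+)"\]\s+remote address:([\d.]+:\d+),TLS_error:.*CERTIFICATE_VERIFY_FAILED'
)


def failing_upstreams(lines):
    """Count distinct failed connections per remote address.

    Envoy can log the same TLS error more than once for one connection,
    so (connection id, address) pairs are de-duplicated before counting.
    """
    seen = set()
    counts = Counter()
    for line in lines:
        m = _VERIFY_FAILED.search(line)
        if m is None:
            continue  # not a TLS verification failure line with an address
        key = (m.group(1), m.group(2))
        if key not in seen:
            seen.add(key)
            counts[m.group(2)] += 1
    return dict(counts)
```

Run over the five lines above, this reports a single failed connection to `192.168.4.164:9898`, since both `ssl_socket.cc` lines belong to connection `130329`.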

When I looked at the DataplaneInsight for this dataplane, I saw that its certificate had expired even though the DP was still active:

k get dataplaneinsights ingress-nginx-controller-58b64bb8dc-cwm8g -n ingress-nginx -oyaml
apiVersion: kuma.io/v1alpha1
kind: DataplaneInsight
mesh: default
metadata:
  creationTimestamp: "2024-12-21T00:00:03Z"
  generation: 1122
  name: ingress-nginx-controller-58b64bb8dc-cwm8g
  namespace: ingress-nginx
  ownerReferences:
  - apiVersion: kuma.io/v1alpha1
    kind: Dataplane
    name: ingress-nginx-controller-58b64bb8dc-cwm8g
    uid: aeabcfbe-d073-459d-9a20-99656c200754
  resourceVersion: "570371706"
  uid: 5a7bd2ed-a6ef-457e-9916-1489d04dc16b
status:
  mTLS:
    certificateExpirationTime: "2024-12-23T20:34:12Z"
    certificateRegenerations: 1
    issuedBackend: ca-1
    lastCertificateRegeneration: "2024-12-22T20:34:12.552309523Z"

In some cases, the ingress DP certs are valid while the application's DP certs have expired, so there is no consistent pattern in which component is affected; across the cluster it looks random. Restarting the ingress controller pod, the application pod, or both fixes it.
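Since the affected pods look random, a cluster-wide check of `status.mTLS.certificateExpirationTime` against the current time could find them before they surface as 502s. A minimal sketch, assuming the JSON shape of `kubectl get dataplaneinsights -A -o json` (a list with `items`) or a single DataplaneInsight object like the one above; `find_expired` and `parse_rfc3339` are hypothetical helper names, not Kuma APIs.

```python
import re
from datetime import datetime, timezone

# UTC timestamps as seen above, e.g. "2024-12-23T20:34:12Z" or "2024-12-22T20:34:12.552309523Z".
_RFC3339 = re.compile(r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d+))?Z")


def parse_rfc3339(ts: str) -> datetime:
    """Parse a UTC timestamp; datetime holds only microseconds, so extra digits are truncated."""
    m = _RFC3339.fullmatch(ts)
    if m is None:
        raise ValueError(f"unsupported timestamp: {ts!r}")
    base = datetime.strptime(m.group(1), "%Y-%m-%dT%H:%M:%S")
    micros = int((m.group(2) or "0")[:6].ljust(6, "0"))
    return base.replace(microsecond=micros, tzinfo=timezone.utc)


def find_expired(doc: dict, now: datetime) -> list[tuple[str, str, datetime]]:
    """Return (namespace, name, expiration) for each insight whose cert expired at or before `now`."""
    items = doc.get("items", [doc])  # accept a List or a single DataplaneInsight
    expired = []
    for item in items:
        mtls = item.get("status", {}).get("mTLS") or {}
        ts = mtls.get("certificateExpirationTime")
        if ts is None:
            continue  # no mTLS status reported for this dataplane
        exp = parse_rfc3339(ts)
        if exp <= now:
            meta = item.get("metadata", {})
            expired.append((meta.get("namespace", ""), meta.get("name", ""), exp))
    return expired
```

With the insight above and `now` set to the debug log time (2024-12-26 15:37:17, assuming the log is in UTC), this flags `ingress-nginx/ingress-nginx-controller-58b64bb8dc-cwm8g` as expired roughly 2 days 19 hours earlier. Feeding it `json.load(sys.stdin)` from the `kubectl` command above would list every affected dataplane at once.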

Environment

Kuma version 2.5.10
AKS 1.28.15 and 1.30.6
Ingress controller version 1.10.3

How to reproduce

It took a few days and several attempts to reproduce, so the steps below might not trigger the issue immediately.

  1. Install Kuma 2.5.10 and ingress controller 1.10.3 on AKS 1.28.15 or 1.30.6
  2. Deploy a sample app with sidecar injection label
  3. Wait at least a day or two while continuously accessing the application through the ingress URL; 502s should start to appear intermittently.
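Step 3 could be automated with a probe that requests the ingress URL at a fixed interval and records when 502s begin, which makes it easier to correlate them with certificate expiration times. A sketch using only the Python standard library; `probe`, `http_status`, and their parameters are hypothetical, and the fetch and sleep functions are injectable so the loop can be exercised without a cluster.

```python
import time
import urllib.error
import urllib.request
from collections import Counter
from datetime import datetime, timezone


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status of a GET request, including error statuses such as 502."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # urllib raises 4xx/5xx responses as HTTPError
    # Connection failures and timeouts are not caught and propagate to the caller.


def probe(url, attempts, interval_s, fetch=http_status, sleep=time.sleep):
    """Request `url` `attempts` times, `interval_s` seconds apart.

    Returns (status code counts, UTC timestamps of each 502 response).
    """
    counts = Counter()
    bad_gateway_at = []
    for i in range(attempts):
        status = fetch(url)
        counts[status] += 1
        if status == 502:
            bad_gateway_at.append(datetime.now(timezone.utc))
        if i < attempts - 1:
            sleep(interval_s)  # no sleep after the last request
    return counts, bad_gateway_at
```

For example, `probe("https://app.example.com/", attempts=2880, interval_s=60)` (hypothetical URL) would sample once a minute for two days.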
@sainooli sainooli added kind/bug A bug triage/pending This issue will be looked at on the next triage meeting labels Dec 27, 2024
@lobkovilya
Contributor

Triage: Kuma 2.5.x is out of support (https://github.com/kumahq/kuma/blob/master/versions.yml#L21). Please upgrade to a supported version and reopen if you still see the issue.

@lobkovilya lobkovilya closed this as not planned Won't fix, can't repro, duplicate, stale Jan 6, 2025
@sainooli
Author

sainooli commented Jan 6, 2025

We have other bugs when we upgrade to a version >2.5.10 which we have already opened an issue about. Could you at least give some pointers or related issues for this kind of problem? I searched this repo and couldn't find any.

@lobkovilya
Contributor

We have other bugs when we upgrade to a version >2.5.10 which we have already opened an issue about

@sainooli could you please link the other issues? I tried filtering issues authored by you and see only this one, so maybe your colleagues opened them.
