Increased memory usage with latest version #321

Open
s4mur4i opened this issue Sep 4, 2024 · 4 comments
Labels: performance, question (Further information is requested)

s4mur4i commented Sep 4, 2024




Describe the bug
We upgraded our Popeye from version 0.11.1 to 0.21.3.
Previously, Popeye ran within limits of 100m CPU and 100-200 MB of memory. With the latest version it needs 500m CPU, and with under 1 GB of memory it gets OOM-killed; on some clusters it even requires 4.5 GB of memory. Is this expected, and why did Popeye suddenly start using so much memory?

To Reproduce
Steps to reproduce the behavior:

  1. Upgraded Popeye to the latest version

Expected behavior
I would be okay with some increase, but a 3-4 GB memory requirement seems too much when the previous version used around 100-200 MB.

Screenshots
Grafana output of one run (screenshot: 2024-09-04 at 11:37)

Versions (please complete the following information):

  • Popeye v0.21.3
  • K8s 1.29.7-eks
derailed (Owner) commented

@s4mur4i Thanks for reporting this!
How big is your cluster (nodes, pods, etc.)?
Also, how are you running Popeye, i.e. wide open or using filters?

derailed added the question (Further information is requested) and performance labels on Sep 14, 2024

s4mur4i commented Sep 30, 2024

Hello,
Sorry, I was on holiday and could not respond for some time.
Our clusters usually have around 10-20 nodes, and some might go up to 30 nodes.
As for pods, I would say 300-600 per cluster.
We have different products; each product has its own cluster per dev/prod environment.
We use the following arguments:

```
-A -f /spinach/spinach.yaml --out html -l info --s3-bucket xyz --push-gtwy-url http://pushgateway-service:9091 --cluster xyz --kubeconfig /etc/kubeconfig/config.yaml --force-exit-zero=true
```
We tested separating the pushgateway and bucket upload into two separate pods, as was done previously, but it didn't lower memory usage.

derailed (Owner) commented

@s4mur4i Thank you for the details! Popeye now uses an in-memory database that is loaded and kept around until the process finishes. For larger clusters this could cause the memory footprint to be larger now than in prior releases.
I'll take a peek to see if we can trim things a bit more. In the meantime, could you share your spinach file?
Also, it will help if you target specific namespaces rather than all namespaces, since the in-memory corpus will be much smaller.
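
As a rough sketch of what that could look like, assuming Popeye's `-n`/`--namespace` flag and reusing the arguments already shared above (the namespace name is just an example; adjust to your setup):

```
popeye -n cloud -f /spinach/spinach.yaml --out html -l info --kubeconfig /etc/kubeconfig/config.yaml
```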


s4mur4i commented Dec 31, 2024

Hello @derailed
Thanks for the information. Could we get some metrics or details on what is loaded into the in-memory database?
I can try running some special builds to understand more deeply how it performs, or which part is growing too big.
Generally, when we deploy even 1-2 more services to a cluster, the memory footprint can increase by 200-300 MB.
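
If it helps, here is a minimal sketch of what such a special build could do, assuming we patch a heap-profile dump into the binary using Go's standard runtime/pprof (the function name and output path are just illustrative):

```go
package main

import (
	"fmt"
	"os"
	"runtime"
	"runtime/pprof"
)

// dumpHeapProfile writes the current heap profile to path so it can be
// inspected later with `go tool pprof`.
func dumpHeapProfile(path string) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()
	runtime.GC() // flush recent garbage so the profile reflects live memory
	return pprof.WriteHeapProfile(f)
}

func main() {
	// In a patched Popeye build this would run right after a scan completes.
	if err := dumpHeapProfile("/tmp/popeye-heap.pprof"); err != nil {
		fmt.Fprintln(os.Stderr, "heap profile failed:", err)
	}
}
```
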
Our spinach config is quite simple:

```yaml
    popeye:
      excludes:
        linters:
          clusterroles:
            codes:
              - '400'
          configmaps:
            codes:
              - '400'
              - '401'
          daemonsets:
            instances:
              - fqns: ['rx:kube-system/kube-proxy']
                codes:
                  - 107
            codes:
              - '108'
              - '505'
          deployments:
            codes:
              - '108'
              - '505'
          horizontalpodautoscalers:
            codes:
              - '602'
              - '603'
              - '604'
              - '605'
          namespaces:
            instances:
              - fqns: ['default', 'kube-node-lease', 'kube-public']
                codes:
                  - 400
          nodes:
            codes:
              - '700'
              - '709'
              - '710'
          pods:
            instances:
              - fqns:
                  [
                    'rx:kube-system/node-tagger*',
                    'rx:kube-system/kube-proxy*',
                    'rx:kube-system/aws-node*',
                    'rx:kube-system/ebs-csi*',
                    'rx:cloud/github-actions*',
                  ]
                codes:
                  - 102
              - fqns: ['rx:kube-system/ebs-csi*']
                codes:
                  - 104
              - fqns: ['rx:cronjob']
                codes:
                  - 206
              - fqns:
                  [
                    'rx:xyz',
                    'rx:xyz',
                    'rx:xyz',
                    'cloud/xyz',
                  ]
                codes:
                  - 206
            codes:
              - '105'
              - '108'
              - '109'
              - '110'
              - '111'
              - '112'
              - '203'
              - '204'
              - '205'
              - '207'
              - '300'
              - '301'
              - '302'
              - '306'
              - '1204'
          secrets:
            codes:
              - '400'
              - '401'
          services:
            codes:
              - '1101'
              - '1102'
              - '1103'
              - '1104'
              - '1109'
          persistentvolumeclaims:
            instances:
              - fqns: ['cloud/xyz']
                codes:
                  - 400
          serviceaccounts:
            instances:
              - fqns: ['default/default', 'kube-node-lease/default', 'kube-public/default']
                codes:
                  - 400
              - fqns:
                  ['kube-system/ebs-csi-controller-sa', 'kube-system/ebs-csi-controller-sa', 'kube-system/ebs-csi-node-sa']
                codes:
                  - 303
          statefulsets:
            codes:
              - '108'
              - '503'
          ingresses:
            codes:
              - '1403'
          cronjobs:
            codes:
              - '1501'
              - '1502'
              - '1503'
          jobs:
            codes:
              - '1503'
          clusterrolebindings:
            instances:
              - fqns: ['system:controller:route-controller', 'system:kube-dns']
                codes:
                  - 1300
```
I removed some of our internal application names and replaced them with xyz; I don't believe they are relevant to this case.
Our namespaces are quite simple:

cloud
default
kube-node-lease
kube-public
kube-system

Other clusters have one extra namespace, and one of them has two extra. We don't utilize kube-public, default, or kube-node-lease, and kube-system only runs the default services, so all our services live in the cloud namespace or another organization-specific one.
