Increased memory usage with latest version #321

Open
s4mur4i opened this issue Sep 4, 2024 · 4 comments
Labels: performance, question (Further information is requested)

s4mur4i commented Sep 4, 2024




Describe the bug
We upgraded our Popeye from version 0.11.1 to 0.21.3.
Previously, Popeye ran within limits of 100m CPU and 100-200 MB of memory. With the latest version it needs 500m CPU, and with under 1 GB of memory it gets OOM-killed; on some clusters it even requires 4.5 GB of memory. Is this expected, and why did Popeye suddenly start using so much memory?

To Reproduce
Steps to reproduce the behavior:

  1. Upgraded Popeye to the latest version

Expected behavior
I would be okay with some increase, but a 3-4 GB memory requirement seems too much when the previous version used around 100-200 MB.

Screenshots
Grafana output of one run (screenshot: 2024-09-04 at 11:37)

Versions (please complete the following information):

  • Popeye v0.21.3
  • K8s 1.29.7-eks
derailed (Owner) commented

@s4mur4i Thanks for reporting this!
How big is your cluster (nodes, pods, etc.)?
Also, how are you running Popeye, i.e. wide open or using filters?

derailed added the question (Further information is requested) and performance labels on Sep 14, 2024

s4mur4i commented Sep 30, 2024

Hello,
Sorry, I was on holiday and could not respond for some time.
Our clusters usually have around 10-20 nodes, and some might go up to 30 nodes.
As for pods, I would say 300-600 per cluster.
We have different products; each product has its own cluster per dev/prod environment.
We use the following arguments:

```
-A -f /spinach/spinach.yaml --out html -l info --s3-bucket xyz --push-gtwy-url http://pushgateway-service:9091 --cluster xyz --kubeconfig /etc/kubeconfig/config.yaml --force-exit-zero=true
```
We tested separating the pushgateway and bucket upload into two separate pods, as was done previously, but it didn't lower memory usage.

derailed (Owner) commented

@s4mur4i Thank you for the details! Popeye now uses an in-memory database that is loaded and kept around until the process finishes. For larger clusters this could cause the memory footprint to be larger now than in prior releases.
I'll take a peek to see if we can trim things a bit more. In the meantime, could you share your spinach file?
Also, it will help if you target specific namespaces rather than all namespaces, since the in-memory corpus will be much smaller.
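
As a rough sketch of what that could look like, assuming Popeye's `-n`/`--namespace` flag and reusing the arguments already shared above (the namespace name is just an example; adjust to your setup):

```
popeye -n cloud -f /spinach/spinach.yaml --out html -l info --kubeconfig /etc/kubeconfig/config.yaml
```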


s4mur4i commented Dec 31, 2024

Hello @derailed
Thanks for the information. Could we get some metrics or details on what is loaded into the in-memory database?
I can try running some special builds to understand more deeply how it performs, or which part is growing too big.
Generally, when we deploy even 1-2 more services to a cluster, the memory footprint can increase by 200-300 MB.
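
If it helps, here is a minimal sketch of what such a special build could do, assuming we patch a heap-profile dump into the binary using Go's standard runtime/pprof (the function name and output path are just illustrative):

```go
package main

import (
	"fmt"
	"os"
	"runtime"
	"runtime/pprof"
)

// dumpHeapProfile writes the current heap profile to path so it can be
// inspected later with `go tool pprof`.
func dumpHeapProfile(path string) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()
	runtime.GC() // flush recent garbage so the profile reflects live memory
	return pprof.WriteHeapProfile(f)
}

func main() {
	// In a patched Popeye build this would run right after a scan completes.
	if err := dumpHeapProfile("/tmp/popeye-heap.pprof"); err != nil {
		fmt.Fprintln(os.Stderr, "heap profile failed:", err)
	}
}
```
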
Our spinach config is quite simple:

```yaml
    popeye:
      excludes:
        linters:
          clusterroles:
            codes:
              - '400'
          configmaps:
            codes:
              - '400'
              - '401'
          daemonsets:
            instances:
              - fqns: ['rx:kube-system/kube-proxy']
                codes:
                  - 107
            codes:
              - '108'
              - '505'
          deployments:
            codes:
              - '108'
              - '505'
          horizontalpodautoscalers:
            codes:
              - '602'
              - '603'
              - '604'
              - '605'
          namespaces:
            instances:
              - fqns: ['default', 'kube-node-lease', 'kube-public']
                codes:
                  - 400
          nodes:
            codes:
              - '700'
              - '709'
              - '710'
          pods:
            instances:
              - fqns:
                  [
                    'rx:kube-system/node-tagger*',
                    'rx:kube-system/kube-proxy*',
                    'rx:kube-system/aws-node*',
                    'rx:kube-system/ebs-csi*',
                    'rx:cloud/github-actions*',
                  ]
                codes:
                  - 102
              - fqns: ['rx:kube-system/ebs-csi*']
                codes:
                  - 104
              - fqns: ['rx:cronjob']
                codes:
                  - 206
              - fqns:
                  [
                    'rx:xyz',
                    'rx:xyz',
                    'rx:xyz',
                    'cloud/xyz',
                  ]
                codes:
                  - 206
            codes:
              - '105'
              - '108'
              - '109'
              - '110'
              - '111'
              - '112'
              - '203'
              - '204'
              - '205'
              - '207'
              - '300'
              - '301'
              - '302'
              - '306'
              - '1204'
          secrets:
            codes:
              - '400'
              - '401'
          services:
            codes:
              - '1101'
              - '1102'
              - '1103'
              - '1104'
              - '1109'
          persistentvolumeclaims:
            instances:
              - fqns: ['cloud/xyz']
                codes:
                  - 400
          serviceaccounts:
            instances:
              - fqns: ['default/default', 'kube-node-lease/default', 'kube-public/default']
                codes:
                  - 400
              - fqns:
                  ['kube-system/ebs-csi-controller-sa', 'kube-system/ebs-csi-controller-sa', 'kube-system/ebs-csi-node-sa']
                codes:
                  - 303
          statefulsets:
            codes:
              - '108'
              - '503'
          ingresses:
            codes:
              - '1403'
          cronjobs:
            codes:
              - '1501'
              - '1502'
              - '1503'
          jobs:
            codes:
              - '1503'
          clusterrolebindings:
            instances:
              - fqns: ['system:controller:route-controller', 'system:kube-dns']
                codes:
                  - 1300
```
I removed some of our internal application names and replaced them with xyz; I don't believe they are relevant to this case.
Our namespaces are quite simple:

cloud
default
kube-node-lease
kube-public
kube-system

Other clusters have one extra namespace, and one of them has two extra. We don't utilize kube-public, default, or kube-node-lease, and kube-system only runs the default services, so all our services live in the cloud namespace or another organization-specific one.
