Promdump OOMs when using large date range #14

Open
ckamaleo opened this issue Sep 28, 2022 · 7 comments

ckamaleo commented Sep 28, 2022

Is there a limit on the number of hours/days we should use? It OOMs when we use 2 days or more, for example:

~$ ./promdump -min-time `date +%s%N --date "2022-09-24 00:00:00"` -max-time `date +%s%N --date "2022-09-27 00:00:00"` -data-dir /opt/yugabyte/prometheusv2 > test_prom_dump2.tgz
Killed

We see it's stuck at this stage and then gets killed:

<Skipping>
time=2022-09-28T17:04:48Z caller=level.go:63 level=debug message="checking block" path=01GE2GN42ZPGBTVXGX4BPPFVGK minTime(utc)=2022-09-28T14:00:01.387Z maxTime(utc)=2022-09-28T16:00:00Z
time=2022-09-28T17:04:48Z caller=level.go:63 level=debug message="skipping block" path=01GE2GN42ZPGBTVXGX4BPPFVGK
time=2022-09-28T17:04:48Z caller=level.go:63 level=debug message="finish parsing persistent blocks" numBlocksFound=6
Killed

/var/log/messages shows promdump is getting killed due to OOM:

Sep 28 17:06:20 kachand-ybany kernel: [4826585.690969] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/user.slice,task=promdump,pid=16712,uid=1016
Sep 28 17:06:20 kachand-ybany kernel: [4826585.691001] Out of memory: Killed process 16712 (promdump) total-vm:34456068kB, anon-rss:12744864kB, file-rss:0kB, shmem-rss:0kB, UID:1016 pgtables:25368kB oom_score_adj:0
Sep 28 17:06:20 kachand-ybany kernel: [4826585.889349] oom_reaper: reaped process 16712 (promdump), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
Sep 28 17:08:19 kachand-ybany systemd[1]: Started Session 1560 of user centos.
Sep 28 17:09:14 kachand-ybany kernel: [4826760.104395] containerd invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=-999
Sep 28 17:09:14 kachand-ybany kernel: [4826760.104400] CPU: 0 PID: 6143 Comm: containerd Not tainted 5.4.0-1083-gcp #91~18.04.1-Ubuntu
Sep 28 17:09:14 kachand-ybany kernel: [4826760.104401] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/29/2022

Are there any workarounds we can use, other than increasing the memory?

ihcsim (Owner) commented Sep 29, 2022

Can you use a smaller date/time range? promdump doesn't impose any limits on the time range; it comes down to your node's resources. The log shows that promdump was able to finish parsing the blocks, so the OOM is probably happening while promdump is compressing the results and writing them to stdout.

It's hard to tell whether tweaking the compression code will help with your case, without knowing how much data you are dealing with.
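
One generic knob that may be worth a try: the caller=level.go lines in the log indicate a Go binary, so the standard Go runtime GOGC variable should apply (this is an assumption about promdump's build, not a documented promdump feature). Lowering GOGC makes the garbage collector run more aggressively, trading CPU for a smaller peak heap during the compress-and-write phase. For example:

# Assumption: promdump honors the standard Go runtime GOGC variable.
# GOGC=25 triggers collection once the heap grows 25% past live data (default 100).
GOGC=25 ./promdump -min-time `date +%s%N --date "2022-09-24 00:00:00"` -max-time `date +%s%N --date "2022-09-25 00:00:00"` -data-dir /opt/yugabyte/prometheusv2 > test_prom_dump.tgz

If most of the ~12 GB resident set in the oom-kill log is live data rather than collectible garbage, this won't be enough on its own.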

ckamaleo (Author) commented:

Yes, it works with a smaller time range. Is there any option to use promdump on a larger time range/dataset on systems with limited resources (for example, by splitting the output into small chunks based on a runtime option and then concatenating them)?

ihcsim (Owner) commented Sep 29, 2022

Not at the moment, and it would be no different from using a smaller time range anyway. The data blocks are partitioned by time range, and there isn't a way to slice a data block into smaller units and then merge them back together without corrupting it.

It's also not possible to run promdump on a remote node because the Prometheus TSDB is only accessible locally.
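
For what it's worth, the smaller-range approach can be scripted. A minimal sketch, assuming GNU date and the flags from the report above, that dumps one day at a time into separate archives; each run still has to hold the largest block it touches in memory, and the per-day archives would be restored individually, not concatenated back into one dump:

#!/usr/bin/env bash
# Hypothetical wrapper, not a promdump feature: one promdump run per day.
start="2022-09-24"
end="2022-09-27"
d="$start"
while [ "$(date -d "$d" +%s)" -lt "$(date -d "$end" +%s)" ]; do
  next=$(date -d "$d + 1 day" +%F)
  ./promdump -min-time "$(date -d "$d" +%s%N)" \
             -max-time "$(date -d "$next" +%s%N)" \
             -data-dir /opt/yugabyte/prometheusv2 > "prom_dump_${d}.tgz"
  d="$next"
done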

ihcsim (Owner) commented Sep 29, 2022

We can try to paginate the data blocks while writing to stdout, but if each data block is big relative to your node's resources, you can still run into the OOM issue.

ckamaleo (Author) commented:

Can we get pagination based on a specific time window, like --window=1 minute?

ihcsim (Owner) commented Sep 29, 2022

You can't. If you ask the TSDB for time series data between 2022-09-24 00:00:00 and 2022-09-24 00:01:00, it will still return the entire data block that contains that minute of data. If that single block is huge, you will still hit the OOM issue.
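
To see whether a single oversized block is the problem, on-disk block sizes are a reasonable first check; the block directories under the data dir carry ULID names like the 01GE2GN42Z... path in the debug log above. A minimal sketch, assuming GNU coreutils:

# List block directories by size, largest first.
du -sh /opt/yugabyte/prometheusv2/01* | sort -rh

The in-memory working set can exceed the on-disk size, so treat these numbers as lower bounds.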

ajaydevtron commented:

@ihcsim Could you please share any workaround for the same issue? It OOMs even when I pass a short time range.
