-
Notifications
You must be signed in to change notification settings - Fork 9.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Compute WAL size and use it during retention size checks #5886
Conversation
unless someone has a better idea, this LGTM after the conflict resolut. |
@krasi-georgiev I've corrected the merge conflicts! |
Sorry I have not been following PRs lately. I had only 1 comment in the old PR
|
@codesome In your scenario, are you referring to when WAL chunks are added/removed? If so, I tested for that particular scenario, and it does not crash. |
Not chunks, but when new files are being added/removed (mostly the case of removing files while we are already iterating the items in the directory). |
From what I can tell, the size computation logic is a blocking call, so no files are added/removed while it iterates over the contents of the |
@codesome did you double check about this race? |
I don't think I will be able to investigate that race right now. |
Had a look now, the possibility of a race is very slim here and not critical. There wont be any issue with deletion of segments as it doesn't happen while the size is being computed (as size of WAL is checked inside
Having gone through the souce of |
Signed-off-by: Dipack P Panjabi <[email protected]>
Sorry, I haven't been following the conversation on this PR for a while now; is there a change required in the WAL truncation logic? I can see that @codesome has tested for the race condition during size computation, and found that that shouldn't be an issue. |
@dipack95 just need to add a test that involves checkpointing and it is ready for merging. https://github.com/prometheus/prometheus/pull/5886/files#r327057583 |
Signed-off-by: Dipack P Panjabi <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
now that I see it can't we remove a lot of the code by just using (h *Head) Truncate
?
I suppose so, considering this was in the WAL test file, I decided to just use the WAL functions directly instead of using |
can you check if my suggestion would remove all that boilerplate than? |
ping @dipack95 |
Hey sorry, I haven't had the time for the last week. I should have the time to update this, and #5887 in a few days. |
Thanks. Appreciated |
ping @dipack95 |
@krasi-georgiev When I create a |
good point, thanks for trying. Had another look and I would say that we want to add all this to the |
Do you mean duplicate the entire test to |
don't think it needs to be duplicated just add some more logic to the existing |
Signed-off-by: Dipack P Panjabi <[email protected]>
@krasi-georgiev That makes sense, I've added the WAL size to the expected size as well. However, I've noticed that over the course of that test, it directly writes blocks to the disk instead of using the Head (and consequently the WAL), and thus the WAL size is always zero. Thus the test never failed with the introduction of this PR. |
yep you will need to add some records to the wal as well. Look at some other tests for ideas. |
Signed-off-by: Dipack P Panjabi <[email protected]>
@krasi-georgiev Please take a look, I've updated the test. |
Signed-off-by: Dipack P Panjabi <[email protected]>
Signed-off-by: Dipack P Panjabi <[email protected]>
@krasi-georgiev Please take a look now. |
LGTM, Thanks unless @codesome , @brian-brazil or anyone else has any other comments will merge this in few days. |
It's good to keep things up to date generally, but there's a specific thing I want here: this chart upgrade also upgrades prometheus from 2.13.2 to 2.15.2. Prometheus 2.15 has a feature which considers the WAL size when calculating how much disk it's using (see prometheus/prometheus#5886 ).
Not sure if it's only me finding that the WAL size is still not used during retention size checks: /prometheus $ prometheus --version
prometheus, version 2.17.1 (branch: HEAD, revision: ae041f97cfc6f43494bed65ec4ea4e3a0cf2ac69)
build user: root@806b02dfe114
build date: 20200326-16:18:19
go version: go1.13.9
/prometheus $
/prometheus $ ps
PID USER TIME COMMAND
1 1000 8d02 /bin/prometheus --web.console.templates=/etc/prometheus/consoles --web.console.libraries=/etc/prometheus/console_libraries --storage.tsdb.retention.size=15GB --config.file=/etc/prometheus/config_out/prometheus.env.yaml --
79 1000 0:00 sh
87 1000 0:00 sh
99 1000 0:00 ps
/prometheus $ du -h -d 1
79.5M ./01EB7D47PPV49QCNY2MZVJCYTE.tmp
524.5M ./01EBBR0J3VHNPNBYM6CZXM56T9.tmp
62.1M ./01EJFJF03NFF5J29VE9JKEE2MS
512.5M ./01EBBPV9FS1Z0QY4EK9QC7RMQA.tmp
1.2G ./01EJ1DCVBP2C3XS5WVJE6H85AE
1.0G ./01EHG166DB6JYRB79VEMWME7Z6
158.7M ./01EJFJG3Y1CQ3PRRPDQH3NACNQ
512.5M ./01EBAM8DHP0GAKCWWS6P3F7D0E.tmp
1.1G ./01EHVM027KCG4CAT3VGZW6VMJ6
516.5M ./01EBBQ7Q7CG0J99RHM57FE0TF0.tmp
57.5M ./01EJFSAQGFA3PT9BW8JMDKN824
1.0G ./01EGYN0SAP6C43V8RF335SNFFW
1.1G ./01EHA7SGSXJ4CR88GQ4B936SNZ
79.5M ./01EB7CYVEXJENABPGR037E14P8.tmp
453.4M ./01EJEXYPAFCM86Y7T5REJXNJHA
8.0K ./01EBCB2SVA5XVV84EC54GR2S0T.tmp
8.0K ./01EBBS9M0417RNJVVTCYEK33RH.tmp
1.3G ./01EJD06W4FTYPFAKYVM7FZNE01
1.1G ./01EH4EE1ZHCXEWP80NJJJJFSW5
516.5M ./01EBAM3540NXPQM49TSAPHSDBH.tmp
1.1G ./01EHNTJTJ7CGDRPSB4H6TA7YE0
516.5M ./01EBATZ99KJ4PX2FG44TW0SZAE.tmp
1.0G ./01EGRVM1JBVNQWD1Y7N5FYN4GN
1.9G ./wal
1.2G ./01EJ76T4CNYNDX6EJ5KMWXPKMZ
16.8G . Please note that we've set --storage.tsdb.retention.size=15GB, but the actual storage usage is 16.8G |
I dont think the |
Signed-off-by: Dipack P Panjabi [email protected]
Closes: #5771
Compute the size of the WAL and use it while calculating if exceeding the max retention size limit.
(Port of prometheus-junkyard/tsdb#651)