btrfs-balance freezes desktop for up to 30 minutes #53
Comments
I encountered this problem just a few minutes ago. After more than an hour I gave up and forcibly turned off the computer. The whole system was basically frozen because of this f*** process; I was unable to do anything! I have an SSD too, with only ~20% of the disk space used. I think it was not the first time btrfs-balance ran (it was not the first time the system froze because of some btrfs process right at midnight, but it usually took a few minutes at most), but as I see in the … Now I've disabled this script.
Maybe it's feasible to add support for the … Background: yesterday I (manually) started a balance on a larger file system that had several devices added (and was thus rather "out of balance") and had never had a balance run. I think it's OK for a tool like btrfsmaintenance to work incrementally each time it runs; it doesn't always need to produce a "perfect" result. Or there should at least be a way to instruct it to behave that way.
Same issue here with openSUSE Tumbleweed. I had to uninstall btrfsmaintenance to make my system usable.
I have added ionice to btrfs-balance in #66. It helped a lot in my setup: the balance runs in the background without affecting the workload. Can anybody else try this?
Indeed, per @sur5r's suggestion, I also found solid reasoning at netdata/netdata#3203 (comment) for tackling it in smaller pieces, as I understand this project has identified …
I'm not sure about the … Do you have test images of the file system on which you can time how long a balance takes with different arguments? That said, I feel …
Have you thought about changing the IO scheduler to something like bfq? |
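To check which I/O scheduler a device currently uses, you can read `/sys/block/<dev>/queue/scheduler`; the kernel lists the available schedulers and puts the active one in square brackets. The sketch below is a minimal POSIX shell helper, `active_scheduler` (a hypothetical name), that extracts the bracketed entry from such a line. The device name `sda` in the comments is a placeholder, and whether BFQ is offered at all depends on your kernel build.

```shell
#!/bin/sh
# Hypothetical helper: print the active I/O scheduler from a sysfs line
# such as "mq-deadline kyber [bfq] none". The kernel marks the scheduler
# currently in use with square brackets; prints nothing if none is marked.
active_scheduler() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Usage on a real system (sda is a placeholder device name):
#   active_scheduler "$(cat /sys/block/sda/queue/scheduler)"
# Switching to BFQ, if the kernel provides it (requires root):
#   echo bfq | sudo tee /sys/block/sda/queue/scheduler
```

A change made through sysfs this way does not persist across reboots.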
I just ran the command above (…). Without the limit parameters and with 80 as the usage value, I quickly got 100% CPU usage again, but it didn't freeze my computer like it did before.
Just ran into this as well. A fix would be great; it's very frustrating when your system freezes randomly.
@digulla I have experienced the same issues as others on openSUSE Tumbleweed. Doing a normal manual balance also freezes the system. I've been doing some tests, and running manual balances with
sudo nice -n 19 ionice -c3 btrfs balance start -v /folder/path
keeps the system responsive. Adding ionice -c3 did improve the situation, but the system was still freezing for about 10 seconds every minute or so, so it was not a 100% solution.
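One way to apply that low-priority prefix consistently is a small wrapper. This is a minimal sketch, not part of btrfsmaintenance: `lowprio_balance` and the `DRY_RUN` variable are hypothetical, and by default it only prints the command it would run, so nothing touches the file system until you set `DRY_RUN=0`. `nice -n 19` lowers CPU priority and `ionice -c3` selects the idle I/O class, exactly as in the command above.

```shell
#!/bin/sh
# Hypothetical wrapper: run (or just print) a btrfs balance with the lowest
# CPU priority (nice 19) in the idle I/O scheduling class (ionice -c3).
# Arguments: mount point first, then any balance filters (e.g. -dusage=50).
# Default is a dry run that only prints the command; set DRY_RUN=0 to execute.
lowprio_balance() {
    mountpoint=$1
    shift
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '%s' 'nice -n 19 ionice -c3 btrfs balance start -v'
        for arg in "$@"; do
            printf ' %s' "$arg"
        done
        printf ' %s\n' "$mountpoint"
    else
        nice -n 19 ionice -c3 btrfs balance start -v "$@" "$mountpoint"
    fi
}

# Example (prints the command; run as root with DRY_RUN=0 to execute):
#   lowprio_balance / -dusage=50
```

As reported above, ionice alone did not fully remove the stalls, so the wrapper is a mitigation rather than a fix.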
My understanding (I'm still new to btrfs) is that the setting looks for chunks (metadata specifically, due to the … I'm not sure what you mean by the top 30%; the usage value is one parameter for filtering which chunks to balance. The higher the value, the fuller the selected chunks can be, so the less free space there is to reclaim per chunk, AFAIK? https://btrfs.wiki.kernel.org/index.php/Balance_Filters
Chunks are generally allocated with 1 GiB of storage, according to this. So …
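A simplified model of the `usage` filter may help here: each chunk (block group) has a fill level, and `-dusage=X` (or `-musage=X` for metadata) only selects chunks filled below X%. The sketch assumes integer percentages and a made-up list of chunk usages; `select_chunks` is hypothetical and does not query a real file system.

```shell
#!/bin/sh
# Simplified model of the balance "usage" filter: print the usage values of
# the chunks filled below the threshold (both in whole percent).
# Arguments: threshold, then one usage percentage per chunk.
# (The kernel treats usage=0 specially to pick empty chunks; not modelled.)
select_chunks() {
    threshold=$1
    shift
    selected=""
    for u in "$@"; do
        if [ "$u" -lt "$threshold" ]; then
            selected="$selected $u"
        fi
    done
    # Trim the leading space before printing.
    printf '%s\n' "${selected# }"
}

# Example with chunks at 5%, 20%, 45%, 70% and 95%:
#   select_chunks 30 5 20 45 70 95   prints "5 20"
#   select_chunks 80 5 20 45 70 95   prints "5 20 45 70"
```

Raising the threshold from 30 to 80 doubles the number of chunks to rewrite in this example, while the extra chunks (45% and 70%) each contribute less free space than the first two.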
Rots in what way? You would increase the
The limit helps by reducing the number of chunks to balance/process. Just because other chunks are not processed doesn't mean they're rotting in any way; balance isn't for that purpose, AFAIK. When the balance runs again, it won't target the same chunks as before, because the filter looks for chunks that meet the requirements (less than X% used, i.e. more than (100-X)% free). Thus it will go for the next chunks that make the most difference. The more you process, the more the returns diminish, and the bigger the impact on you as a user, which is the concern this issue raises. https://superuser.com/questions/1295890/btrfs-huge-metadata-allocation
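The `limit` filter is applied after the other filters, so a command such as `-dusage=50,limit=2` processes at most two of the matching chunks per run. The sketch below models that, plus how many runs it takes to work through a backlog, under the assumption stated in the comment above that chunks already processed no longer match. Both helpers are hypothetical, and the input order stands in for whatever order the kernel actually uses.

```shell
#!/bin/sh
# Simplified model of combining the usage and limit filters: keep only chunks
# filled below <threshold>%, then stop after <limit> of them.
# Arguments: threshold, limit, then one usage percentage per chunk.
select_limited() {
    threshold=$1
    limit=$2
    shift 2
    picked=""
    count=0
    for u in "$@"; do
        [ "$count" -ge "$limit" ] && break
        if [ "$u" -lt "$threshold" ]; then
            picked="$picked $u"
            count=$((count + 1))
        fi
    done
    printf '%s\n' "${picked# }"
}

# Runs needed to process <matching> chunks at <limit> per run (rounded up),
# assuming processed chunks no longer match the filter afterwards.
runs_needed() {
    matching=$1
    limit=$2
    echo $(( (matching + limit - 1) / limit ))
}

# Examples:
#   select_limited 50 2 10 80 30 40 5   prints "10 30"
#   runs_needed 9 2                     prints "5"
```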
Due to btrfs's CoW, I think merging chunks still involves writing a whole new chunk. So if you had two chunks at 70% and 20% usage (~900 MiB), they could be merged into one chunk, but that requires writing 900 MiB of data to a new 1 GiB chunk. If you had 4 chunks at about 70% each and these were balanced, you would write 2.8 GiB into 3 chunks, freeing just 1 chunk. I'm probably wrong, and less data is actually written to shuffle/merge chunks (maybe it only moves data from the chunk being emptied into the existing chunks' free space). If I'm not, you can see how higher … So a value of 50% should work all right? I'm not sure how much 1 GiB of metadata covers in data chunks, but I think it's quite a bit (the link shows 50 MiB covering about 1.6 GiB).
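The chunk arithmetic can be checked with a small model. It assumes 1 GiB (1024 MiB) chunks, integer usage percentages, and that the live data of every balanced chunk is rewritten and packed tightly into fresh chunks, which is the pessimistic case described above. `merge_cost` is a hypothetical helper; for example, four chunks at 70% hold about 2.8 GiB of live data, which needs three chunks.

```shell
#!/bin/sh
# Hypothetical model: balancing a set of chunks rewrites all of their live
# data (copy-on-write) into new 1 GiB chunks, packed as tightly as possible.
# Arguments: usage percentage of each chunk being balanced.
# Output: "<MiB written> <chunks needed> <chunks freed>"
merge_cost() {
    chunk_mib=1024
    total_pct=0
    n=0
    for u in "$@"; do
        total_pct=$((total_pct + u))
        n=$((n + 1))
    done
    written_mib=$((total_pct * chunk_mib / 100))
    needed=$(( (total_pct + 99) / 100 ))   # round up to whole chunks
    echo "$written_mib $needed $((n - needed))"
}

# Examples:
#   merge_cost 70 20         prints "921 1 1"   (~900 MiB into one chunk)
#   merge_cost 70 70 70 70   prints "2867 3 1"  (~2.8 GiB into 3 chunks)
#   merge_cost 30 30 30      prints "921 1 2"
```

Under this model, lower usage thresholds free more chunks per MiB written, which is the trade-off the comment describes.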
The initial balance with a usage value of
Noted in the BTRFS changelog as "auto blockgroup reclaim":
The balance command shared earlier in this issue (taken from the linked issue) was meant to be run daily, by the way, whereas the timer this project uses runs monthly or less often. No idea which is better, but running daily with a limit of 2 chunks likely wouldn't take long or be very noticeable, compared with a monthly run over all chunks that match the filter?
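For a small, frequent run of that kind, the filters for one chunk type are combined with commas after `-d` (data) or `-m` (metadata) in `btrfs balance start`. This sketch only prints such a command; `daily_balance_cmd` is hypothetical, the values 50 and 2 are example values from this thread rather than recommendations, and how the command is scheduled (cron, a systemd timer) is left open.

```shell
#!/bin/sh
# Hypothetical helper: print a balance command that processes at most
# <limit> data chunks and <limit> metadata chunks filled below <usage>%.
# Arguments: usage percentage, limit, mount point.
daily_balance_cmd() {
    usage=$1
    limit=$2
    mnt=$3
    printf 'btrfs balance start -dusage=%s,limit=%s -musage=%s,limit=%s %s\n' \
        "$usage" "$limit" "$usage" "$limit" "$mnt"
}

# Example:
#   daily_balance_cmd 50 2 /
#   prints: btrfs balance start -dusage=50,limit=2 -musage=50,limit=2 /
```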
I recently upgraded to Leap 15.0, and after two days of use my desktop suddenly froze for about 30 minutes. Login took 2-3 minutes. Starting a root shell took a minute. Starting atop took a minute. btrfs was at 100% CPU.
It turned out that btrfs-balance.sh was the culprit. I figured this was because the script had never run before, so I rolled my eyes and sat it out.
Yesterday, the script ran again, hogging the computer for a couple of minutes.
I have two major issues with this:
My main disk is an SSD. Hogging the computer for several minutes when I want to do work is too much.
For the time being, I've disabled the script with
systemctl disable btrfs-balance.timer
Please either make it run faster (less than 5 seconds) with a clearly visible indication that this job has started (so I know why my computer isn't responding anymore) or turn it into a background job which doesn't freeze the whole computer.