Description
System information
| Type | Version/Name |
|---|---|
| Distribution Name | macOS |
| Distribution Version | 10.15.7 |
| Linux Kernel | Darwin Kernel Version 19.6.0 |
| Architecture | x86_64 |
| ZFS Version | zfs-macOS-2.2.3-rc4 |
Describe the problem you're observing
Updating to any macOS ZFS version above v2.1.0 results in extremely poor system performance while datasets are mounted and in use.
The issue appears to be with ARC: setting primarycache=none and secondarycache=none significantly improves system performance and responsiveness, at the cost of loading more data from disk.
Describe how to reproduce the problem
- Install v2.1.0 on a known affected system (2018 Mac Mini, 2008 Mac Pro, 2010 Mac Pro running Catalina are confirmed)
- Create/mount datasets and verify performance is as expected. Encryption, compression etc. should all perform within reasonable margins.
- Upgrade to v2.1.6 or later (including v2.2.3rc4).
- Observe performance is significantly worse.
- Run the commands `zfs set primarycache=none <pool>` and `zfs set secondarycache=none <pool>` (may need to set for additional datasets if values are not inherited).
- After a short delay, performance should significantly improve (a restart may be required).
- Revert to v2.1.0 and set `primarycache` and `secondarycache` back to previous values (default is `=all`).
- Performance should be much better with ARC functioning as normal.
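
The workaround and revert steps above can be wrapped in a few small shell helpers. This is only a sketch: the function names are hypothetical, the pool name is whatever you pass in, and the `ZFS` variable is an assumption added so the commands can be previewed (`ZFS="echo zfs"`) before touching a real pool.

```shell
#!/bin/sh
# Command used to invoke zfs. Override with ZFS="echo zfs" for a dry run
# that only prints the commands. (This variable is an illustrative addition.)
ZFS="${ZFS:-zfs}"

# Workaround: stop caching data and metadata in ARC and L2ARC for a pool.
# $ZFS is left unquoted on purpose so "echo zfs" splits into two words.
disable_arc_caching() {
    $ZFS set primarycache=none "$1"
    $ZFS set secondarycache=none "$1"
}

# Revert: restore the default value (all) for both properties.
restore_arc_caching() {
    $ZFS set primarycache=all "$1"
    $ZFS set secondarycache=all "$1"
}

# Check which datasets under the pool actually picked up the setting,
# since child datasets with locally set values do not inherit it.
show_cache_settings() {
    $ZFS get -r primarycache,secondarycache "$1"
}
```

For example, `ZFS="echo zfs"; disable_arc_caching tank` prints the two `zfs set` commands for a pool named `tank` without running them. If the previous values were inherited rather than set explicitly, `zfs inherit primarycache <pool>` (and the same for `secondarycache`) may be closer to a true revert than setting `all`.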
Attachments
The following spindumps were all generated under v2.2.3rc4 with ARC configured as normal (in use), causing many programs to run extremely slowly as they spend large amounts of time waiting for data.
Unfortunately, because the system was so unresponsive, it was difficult to generate spindumps at the moments of worst performance, though I tried. Of note, spindump.6.txt was taken while attempting to decrypt a dataset into a new (unencrypted) dataset for testing, so it may give useful stack traces.
Additional Notes
I'm not aware of any specific changes to ARC that are likely to have caused this drastic change in performance. However, setting primarycache=none and secondarycache=none improves system responsiveness so much that the issue is most likely related either to the ARC or to something it relies upon.
I would assume that if this issue also affected Linux there would have been a lot more issues reported about it, so either Linux is unaffected, or macOS is affected differently (more severely), resulting in a much more noticeable drop in performance.
Many, many thanks to armdn for discovering the workaround for this issue on the forum topic originally created for it. You can view the topic here for many more spindumps and sysctl output.
As pointed out by cgiard, the issue has been present since at least the v2.1.6 macOS release, which makes the persistent L2ARC fixes a possible area to look at, though removing L2ARC does not appear to make a difference.
Another thread by ranvel, which you can see here, proposes that the issue is write operations causing user space to freeze. The systems where I've seen the problem are not write-intensive, though, so I think the interaction may be more complex.