ARC causing slow system performance on any version higher than v2.1.0 (v2.1.6+) #127

@Haravikk

System information

Type                  Version/Name
Distribution Name     macOS
Distribution Version  10.15.7
Linux Kernel          Darwin Kernel Version 19.6.0
Architecture          x86_64
ZFS Version           zfs-macOS-2.2.3-rc4

Describe the problem you're observing

Updating to any macOS ZFS version above v2.1.0 results in extremely poor system performance while datasets are mounted and in use.

The issue appears to be with the ARC: setting primarycache=none and secondarycache=none significantly improves system performance and responsiveness, at the cost of loading more data from disk.

Describe how to reproduce the problem

  1. Install v2.1.0 on a known affected system (2018 Mac Mini, 2008 Mac Pro, 2010 Mac Pro running Catalina are confirmed)
  2. Create/mount datasets and verify performance is as expected. Encryption, compression etc. should all perform within reasonable margins.
  3. Upgrade to v2.1.6 or later (including v2.2.3rc4).
  4. Observe performance is significantly worse.
  5. Run the commands zfs set primarycache=none <pool> and zfs set secondarycache=none <pool> (you may need to set these on additional datasets if the values are not inherited).
  6. After a short delay, performance should significantly improve (a restart may be required).
  7. Revert to v2.1.0 and set primarycache and secondarycache back to previous values (default is =all).
  8. Performance should be much better with ARC functioning as normal.
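
The workaround in step 5 (and the revert in step 7) can be sketched as a small script. This is only an illustration: the pool name `tank` and the helpers `run` and `set_cache` are hypothetical, not part of the original report. It defaults to a dry run that prints the commands instead of executing them.

```shell
#!/bin/sh
# Sketch of steps 5 and 7: toggle ARC/L2ARC caching for a pool.
# POOL is a placeholder ("tank" is a hypothetical pool name).
# DRY_RUN=1 (the default here) only prints the commands.
POOL="${POOL:-tank}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# $1 = value for both properties: none, metadata, or all (the default).
set_cache() {
    run zfs set "primarycache=$1" "$POOL"
    run zfs set "secondarycache=$1" "$POOL"
}

# Step 5: disable caching for the pool.
set_cache none

# Show where each dataset gets its value from (local vs. inherited),
# since child datasets with a local setting are not changed by the above.
run zfs get -r -o name,property,value,source primarycache,secondarycache "$POOL"
```

To apply the changes for real, run it with `DRY_RUN=0` and `POOL` set to the actual pool name. Calling `set_cache all` afterwards restores the default, as in step 7.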

Attachments

The following spindumps were all generated under v2.2.3rc4 with ARC configured as normal (in use), causing many programs to run extremely slowly as they spend large amounts of time waiting for data.

spindumps-v2.2.3rc4.zip

Unfortunately, because the system was so unresponsive, it was difficult to generate spindumps at the moments of worst performance, though I tried. Of note, spindump.6.txt was taken while attempting to decrypt a dataset into a new (unencrypted) dataset for testing, so it may give useful stack traces.

Additional Notes

I'm not aware of any specific changes to ARC that are likely to have caused this drastic change in performance, but the fact that setting primarycache=none and secondarycache=none results in such an improvement in system responsiveness strongly suggests that the issue is related either to the ARC or to something it relies upon.
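
To compare ARC behaviour between v2.1.0 and a later version, a few ARC counters could be captured on each. The following is a minimal sketch, assuming the ARC statistics are exposed through sysctl under a name ending in `arcstats` (the `kstat.zfs.misc.arcstats` prefix used here is an assumption and may differ between builds). `arc_summary` is a hypothetical helper that only filters "name: value" lines.

```shell
#!/bin/sh
# Hypothetical helper: read sysctl-style "name: value" lines on stdin and
# print only the ARC current size, target size (c), and maximum (c_max).
arc_summary() {
    awk -F': ' '$1 ~ /arcstats\.(size|c|c_max)$/ {
        n = $1
        sub(/.*\./, "", n)   # strip everything up to the last dot
        print n "=" $2
    }'
}

# Assumed sysctl prefix; prints nothing if the name does not exist.
sysctl kstat.zfs.misc.arcstats 2>/dev/null | arc_summary
```

Saving this output under each version, with caching enabled, would show whether the ARC size or its target differs noticeably when the slowdown occurs.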

I would assume that if this issue also affected Linux there would have been a lot more issues reported about it, so either Linux is unaffected, or macOS is affected differently (more severely), resulting in a much more noticeable drop in performance.

Many, many thanks to armdn for discovering the workaround for this issue on the forum topic originally created for it. You can view the topic here for many more spindumps and sysctl output.

As cgiard pointed out, the issue has been present since at least the v2.1.6 macOS release, which makes the persistent L2ARC fixes a possible area to look at, though removing L2ARC does not appear to make a difference.

Another thread by ranvel, which you can see here, proposes that the issue is write operations causing user space to freeze. I have seen the problem on systems that are not write-intensive, though, so I think the interaction may be more complex.
