Skip to content

More aggressively assert that db_mtx protects db.db_data #17209

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

asomers
Copy link
Contributor

@asomers asomers commented Apr 2, 2025

db.db_mtx must be held any time that db.db_data is accessed. All of these functions do have the lock held by a parent; add assertions to ensure that it stays that way.

Also, refactor dbuf_read_bonus to make it obvious why db_rwlock isn't required.

See #17118

Sponsored by: ConnectWise
Signed-off-by: Alan Somers [email protected]

Motivation and Context

Bug #16626 probably results from improper locking around db.db_data. Any time that variable is accessed, db_mtx must be held.

Description

As a first step to fixing that bug, add some more assertions in places that already do have the lock, but don't currently assert that they do. This will make the problem more clear, and prevent regressions in those functions.

How Has This Been Tested?

I ran the ZFS test suite in FreeBSD 15.0-CURRENT.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

db.db_mtx must be held any time that db.db_data is accessed.  All of
these functions do have the lock held by a parent; add assertions to
ensure that it stays that way.

Also, refactor dbuf_read_bonus to make it obvious why db_rwlock isn't
required.

See openzfs#17118

Sponsored by:	ConnectWise
Signed-off-by:	Alan Somers <[email protected]>
@asomers
Copy link
Contributor Author

asomers commented Apr 2, 2025

@pcd1193182 could you review, please?

@snajpa
Copy link
Contributor

snajpa commented Apr 6, 2025

From one of the test runs:

   [ 3615.299691] VERIFY(MUTEX_HELD(&db->db_mtx)) failed
  [ 3615.300875] PANIC at dmu.c:1730:dmu_object_cached_size()

it would seem that dmu_object_cached_size is called without db_mtx

@asomers
Copy link
Contributor Author

asomers commented Apr 7, 2025

it would seem that dmu_object_cached_size is called without db_mtx

It looks like that's a Linux-only panic, which is why I didn't see it . I'll investigate.

It's an error that nothing locks the mutex here.  I'll fix it in my next
PR.

Signed-off-by:	Alan Somers <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants