Fix(zpool): Ensure consistent device path normalization for idempotency #11020
Conversation
This fix resolves an idempotency failure that occurs when defining zpool vdevs using canonical device IDs (e.g., /dev/disk/by-id/...).

The underlying issue was two-fold:

1. The base_device parsing logic was insufficient to strip partition suffixes (-partN) from these canonical paths.
2. The concurrent use of the 'real_paths=True' flag in the zpool status output complicated path comparison.

This patch implements the solution by:

a) Enhancing the base_device function to correctly normalize /dev/disk/by-id/ paths.
b) Removing the redundant 'real_paths=True' flag to stabilize path reporting.

This solution was developed and implemented entirely by the author of this commit.

Signed-off-by: Ivan Skachko <[email protected]>
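For illustration only (this is not the actual patch), the by-id normalization described above could look roughly like the sketch below; the helper name `normalize_by_id`, the regex, and the example device ID are all illustrative.

```python
import re

# Canonical udev IDs mark partitions with a '-partN' suffix, e.g.
# /dev/disk/by-id/wwn-0x5000c500a1b2c3d4-part1 for partition 1 of that disk.
_BY_ID_PART = re.compile(r'^(/dev/disk/by-id/.+?)-part\d+$')


def normalize_by_id(device):
    """Strip a trailing '-partN' suffix from a /dev/disk/by-id/ path (sketch)."""
    match = _BY_ID_PART.match(device)
    return match.group(1) if match else device


print(normalize_by_id('/dev/disk/by-id/wwn-0x5000c500a1b2c3d4-part1'))
# -> /dev/disk/by-id/wwn-0x5000c500a1b2c3d4
```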
Thanks for your contribution! Can you please add a changelog fragment? Thanks.

done!
```python
def get_current_layout(self):
    with self.zpool_runner('subcommand full_paths real_paths name', check_rc=True) as ctx:
        rc, stdout, stderr = ctx.run(subcommand='status', full_paths=True, real_paths=True)
```
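For orientation, a hedged sketch of the change described in the summary, dropping the `real_paths` flag so `zpool status` reports the device paths as they were supplied; the actual diff in the PR may differ.

```python
# Sketch only: keep full_paths so whole device paths are printed, but stop
# forcing them to be resolved to kernel names (e.g. /dev/sdX) via real_paths.
def get_current_layout(self):
    with self.zpool_runner('subcommand full_paths name', check_rc=True) as ctx:
        rc, stdout, stderr = ctx.run(subcommand='status', full_paths=True)
```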
I'm wondering whether this change (and the one further below) could break some uses of the module unrelated to /dev/disk/by-id/, and would require another adjustment to avoid this breakage.
I've been hoping to take some time to look at this a bit more, but I just haven't been able to find any lately, so my comments are based solely on browsing the changes and not testing, but my instincts are that:

- There's the potential for `/dev/disk/by-*/*` to be used for identifying disks, so ideally we'd cover more than just the `by-id` path.
- Everything under `/dev/disk/by-*/*` is really just a symlink, so I would have initially thought that just doing an `os.readlink` on any devices passed as symlinks might solve the problem. I'm not sure if this truly vibes with how zfs stores device names though.
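For reference, a minimal illustration of the symlink-resolution idea from the comment above, using `os.path.realpath` so that relative symlink targets such as `../../sda1` come back as absolute paths; the device names are hypothetical.

```python
import os


def resolve_device(device):
    """Follow a /dev/disk/by-*/ symlink back to its kernel device node (sketch)."""
    if os.path.islink(device):
        # realpath resolves the (usually relative) symlink target, e.g. '../../sda1'
        return os.path.realpath(device)
    return device


# Hypothetical example:
# resolve_device('/dev/disk/by-id/wwn-0x5000c500a1b2c3d4-part1')  ->  '/dev/sda1'
```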
Second that. A fix for the underlying problem should aim to support all kinds of /dev/disk/by-* symlinks.
@felixfontein , I thought about that, but I do not think it will cause problems. Not here where we only read the current layout, and not further down.
Rather the opposite.
In both places we call zpool status and extract the devices that make up the pool.
As an admin, when creating a zfs pool on a server with lots of disks, I expect disks to fail over time, to disappear or be replaced, and so on. So when I create the pool, I make sure to use device links that will survive those changes.
Enforcing real_paths in get_current_layout doesn't make sense. I want the same layout reported back that I put in.
If I used the stable by-id symlinks during creation of the pool, I want them to be reported back.
If I used /dev/sdx, /dev/sdy, /dev/sdz during creation of the pool, I want those to be reported back.
I checked the history of the module to see if the translation to "real_paths" was added in a commit that would explain the rationale, but it seems real_paths=True was in there from the start.
> I've been hoping to take some time to look at this a bit more, but I just haven't been able to find any lately, so my comments are based solely on browsing the changes and not testing, but my instincts are that:
>
> - There's the potential for `/dev/disk/by-*/*` to be used for identifying disks, so ideally we'd cover more than just the `by-id` path.
@dthomson-triumf, @n3ph, I briefly considered extending the regex match to include all /dev/disk/by-*/ directories, but I intentionally limited the blast radius to avoid unintentional changes in behavior, just in case somebody creates a symlink named something-part123 that is not a link to a partition.

I know more or less what to expect in by-id, but what about directories like by-diskseq, by-dname, by-loop-inode, by-loop-ref, by-partuuid, by-path, or by-uuid? Some are only there for loopback devices and some only get populated by the creation of filesystems or partitions. That is exactly what base_device() tries to avoid.
> - Everything under `/dev/disk/by-*/*` is really just a symlink, so I would have initially thought that just doing an `os.readlink` on any devices passed as symlinks might solve the problem. I'm not sure if this truly vibes with how zfs stores device names though.
I don't have much experience with zfs, but the module does not resolve symlinks when creating a pool, and zpools seems to work nicely with those symlinked devices.
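To illustrate the blast-radius point above, a hypothetical comparison of a by-id-only rule against a broader by-*/ rule; both patterns and device names are illustrative and not taken from the module.

```python
import re

# Narrow rule: only strip '-partN' under /dev/disk/by-id/.
NARROW = re.compile(r'^(/dev/disk/by-id/.+?)-part\d+$')
# Broader rule: strip '-partN' under any /dev/disk/by-*/ directory.
BROAD = re.compile(r'^(/dev/disk/by-[^/]+/.+?)-part\d+$')

samples = [
    '/dev/disk/by-id/wwn-0x5000c500a1b2c3d4-part1',  # a real partition suffix
    '/dev/disk/by-dname/archive-part123',            # a custom name that only looks like one
]
for dev in samples:
    print(dev, '->', NARROW.sub(r'\1', dev), '|', BROAD.sub(r'\1', dev))
# The narrow rule leaves the custom by-dname symlink untouched; the broad rule
# would strip its '-part123' and change which device the module compares.
```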
@gumbo2k IMO the risk is fairly low considering that these device paths are values that would have been entered by a person (or a lookup module or something, but I think the point remains).
However, I think the best solution is what you mentioned in your previous reply to @felixfontein. I was kind of fuzzy on why the device names in the zpool were different from the values that were entered. I didn't realize that it was the zpool module that was trying to change the device name via the real_paths option. Generally, I don't think the Ansible module should be responsible for what my device names are according to ZFS. That should be ZFS's responsibility.
I think most of these devices are managed by udev and device mapper anyway.
In general, I would leave it up to the user to make sense of what is going to be configured. Underlying LUN specifics are nothing for an Ansible module to assume, but for the user to consider.
Edit: Thank you for looking into this issue, though. 🙇🏼
Co-authored-by: Felix Fontein <[email protected]>
Please note that this PR now has a conflict since we reformatted all code in the collection (see #10999).
@felixfontein So the zpool.py before reformatting was: ... and now: ... Forcing the playbook to use Python 3.6 like: ... results in another error output: ... In short: it seems like a minimum of Python 3.7 is required for this module to work properly.
The collection only supports Python 3.7+ since version 12.0.0, see the changelog. community.general 11.x.y was the last version to support Python 3.6 and 2.7.
SUMMARY
This fix resolves an idempotency failure that occurs when defining zpool vdevs using canonical device IDs (e.g., /dev/disk/by-id/...).
The underlying issue was two-fold:

1. The base_device parsing logic was insufficient to strip partition suffixes (-partN) from these canonical paths.
2. The concurrent use of the 'real_paths=True' flag in the zpool status output complicated path comparison.
This patch implements the solution by:

a) Enhancing the base_device function to correctly normalize /dev/disk/by-id/ paths.
b) Removing the redundant 'real_paths=True' flag to stabilize path reporting.
This solution was developed and implemented entirely by the Author of this commit.
Fixes #10771
Fixes #10744
Ref #10146 (comment)
ISSUE TYPE
Bugfix Pull Request
COMPONENT NAME
zpool