
Conversation


@puffymob puffymob commented Oct 30, 2025

SUMMARY

This fix resolves an idempotency failure that occurs when defining zpool vdevs using canonical device IDs (e.g., /dev/disk/by-id/...).

The underlying issue was two-fold:

  1. The base_device parsing logic was insufficient to strip partition suffixes (-partN) from these specific canonical paths.
  2. The concurrent use of the 'real_paths=True' flag in zpool status output complicated path comparison.

This patch implements the solution by:
a) Enhancing the base_device function to correctly normalize /dev/disk/by-id/ paths (see the sketch below).
b) Removing the redundant 'real_paths=True' flag to stabilize path reporting.
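
To make item a) concrete, here is a minimal, hypothetical sketch of that kind of suffix normalization; the function name base_device matches the module, but the regex and structure below are my assumptions, not the actual patch:

import re

# Hypothetical sketch only; the real change in plugins/modules/zpool.py may differ.
_BY_ID_PART_RE = re.compile(r"^(/dev/disk/by-id/.+?)-part\d+$")


def base_device(device):
    """Strip a trailing -partN suffix from a /dev/disk/by-id/ path,
    leaving every other device path untouched."""
    match = _BY_ID_PART_RE.match(device)
    return match.group(1) if match else device


# Example:
# base_device("/dev/disk/by-id/ata-ST2000NX0423_W4622LPM-part1")
# -> "/dev/disk/by-id/ata-ST2000NX0423_W4622LPM"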

This solution was developed and implemented entirely by the Author of this commit.

Fixes #10771
Fixes #10744
Ref #10146 (comment)

ISSUE TYPE
  • Bugfix Pull Request
COMPONENT NAME

zpool

Signed-off-by: Ivan Skachko <[email protected]>

@ansibullbot ansibullbot added bug This issue/PR relates to a bug module module new_contributor Help guide this first time contributor plugins plugin (any type) labels Oct 30, 2025
@felixfontein felixfontein added check-before-release PR will be looked at again shortly before release and merged if possible. backport-10 Automatically create a backport for the stable-10 branch backport-11 Automatically create a backport for the stable-11 branch labels Oct 30, 2025

@ansibullbot ansibullbot added ci_verified Push fixes to PR branch to re-run CI needs_revision This PR fails CI tests or a maintainer has requested a review/revision of the PR labels Oct 30, 2025
@felixfontein

Thanks for your contribution! Can you please add a changelog fragment? Thanks.

@ansibullbot ansibullbot added ci_verified Push fixes to PR branch to re-run CI and removed ci_verified Push fixes to PR branch to re-run CI labels Oct 31, 2025
@puffymob

Thanks for your contribution! Can you please add a changelog fragment? Thanks.

done!

@ansibullbot ansibullbot removed ci_verified Push fixes to PR branch to re-run CI needs_revision This PR fails CI tests or a maintainer has requested a review/revision of the PR labels Oct 31, 2025

def get_current_layout(self):
    with self.zpool_runner('subcommand full_paths real_paths name', check_rc=True) as ctx:
        rc, stdout, stderr = ctx.run(subcommand='status', full_paths=True, real_paths=True)

I'm wondering whether this change (and the one further below) could break some uses of the module unrelated to /dev/disk/by-id/, and whether it would require another adjustment to avoid that breakage.


I've been hoping to take some time to look at this a bit more, but I just haven't been able to find any lately, so my comments are based solely on browsing the changes and not testing, but my instincts are that:

  1. There's the potential for /dev/disk/by-*/* to be used for identifying disks, so ideally we'd cover more than just the by-id path.
  2. Everything under /dev/disk/by-*/* is really just a symlink, so I would have initially thought that just doing an os.readlink on any devices passed as symlinks might solve the problem (a rough sketch follows below). I'm not sure if this truly vibes with how zfs stores device names though.
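
For illustration of the os.readlink idea (my assumption, not code from this PR), resolving a by-id symlink back to the kernel device node could look roughly like this:

import os

# Hypothetical example; the device path is taken from the report further below.
link = "/dev/disk/by-id/ata-ST2000NX0423_W4622LPM"

if os.path.islink(link):
    # os.path.realpath follows the whole symlink chain (e.g. to /dev/sda),
    # while os.readlink resolves only one level and may return ../../sda.
    print(os.path.realpath(link))
    print(os.readlink(link))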


Second that. A fix for the underlying problem should aim to support all kinds of /dev/disk/by-* symlinks.

@gumbo2k gumbo2k Nov 4, 2025


@felixfontein, I thought about that, but I do not think it will cause problems, neither here, where we only read the current layout, nor further down.
Rather the opposite.
In both places we call zpool status and extract the devices that make up the pool.

As an admin, when creating a zfs pool on a server with lots of disks, I expect disks to fail over time, to disappear or be replaced, and so on. So when I create the pool, I make sure to use device links that will survive those changes.

Enforcing real_paths in get_current_layout doesn't make sense. I want the same layout reported back that I put in.
If I used the stable by-id symlinks during creation of the pool, I want them to be reported back.
If I used /dev/sdx, /dev/sdy, /dev/sdz during creation of the pool, I want those to be reported back.

I checked the history of the module to see if the translation to "real_paths" was added in a commit that would explain the rationale, but it seems real_paths=True was in there from the start.

@gumbo2k gumbo2k Nov 4, 2025


I've been hoping to take some time to look at this a bit more, but I just haven't been able to find any lately, so my comments are based solely on browsing the changes and not testing, but my instincts are that:

There's the potential for /dev/disk/by-*/* to be used for identifying disks, so ideally we'd cover more than just the by-id path.

@dthomson-triumf, @n3ph, I briefly considered extending the regex match to include all /dev/disk/by-*/ directories, but I intentionally limited the blast radius to avoid unintentional changes in behavior, just in case somebody creates a symlink named something-part123 that is not a link to a partition.

I know more or less what to expect in by-id, but what about directories like by-diskseq, by-dname, by-loop-inode, by-loop-ref, by-partuuid, by-path, or by-uuid? Some are only there for loopback devices, and some only get populated by the creation of filesystems or partitions, which is exactly what base_device() tries to avoid.

Everything under /dev/disk/by-*/* is really just a symlink, so I would have initially thought that just doing an os.readlink on any devices passed as symlinks might solve the problem. I'm not sure if this truly vibes with how zfs stores device names though.

I don't have much experience with zfs, but the module does not resolve symlinks when creating a pool, and zpool seems to work nicely with those symlinked devices.


@gumbo2k IMO the risk is fairly low considering that these device paths are values that would have been entered by a person (or a lookup module or something, but I think the point remains).

However, I think the best solution is what you mentioned in your previous reply to @felixfontein. I was kind of fuzzy on why the device names were different in the zpool from the values that were entered. I didn't realize that it was the zpool module that was trying to change the device name via the real_paths option. Generally, I don't think the Ansible module should be responsible for what my device names are according to ZFS. That should be ZFS's responsibility.

@n3ph n3ph Nov 4, 2025


I think most of these devices are managed by udev and device-mapper anyway.

In general, I would leave it up to the user to make sense of what is going to be configured. Underlying LUN specifics are nothing for an Ansible module to assume, but for the user to consider.

Edit: Thank you for looking into this issue, though. 🙇🏼

@felixfontein

Please note that this PR now has a conflict since we reformatted all code in the collection (see #10999).

@ansibullbot ansibullbot added needs_rebase https://docs.ansible.com/ansible/devel/dev_guide/developing_rebasing.html needs_revision This PR fails CI tests or a maintainer has requested a review/revision of the PR labels Nov 1, 2025
@felixfontein felixfontein removed the backport-11 Automatically create a backport for the stable-11 branch label Nov 2, 2025

puffymob commented Nov 5, 2025

Please note that this PR now has a conflict since we reformatted all code in the collection (see #10999).

@felixfontein
hmm, per my testing, the newly reformatted zpool.py module (current original state, without any disk-by modifications) throws an error when deployed on XCP-ng 8.3, which ships Python 2.7 and Python 3.6 by default:

TASK [Create ZFS pools] ****************************************************************************************************************************************************************

failed: [server.domain.com] (item=zfspoolhdd) => {"ansible_loop_var": "item", "changed": false, "item": {"ashift": 9, "disks": ["/dev/disk/by-id/ata-ST2000NX0423_W4622LPM", "/dev/disk/by-id/ata-ST2000NX0423_W4622Q4L", "/dev/disk/by-id/ata-ST2000NX0423_W4622LTN", "/dev/disk/by-id/ata-ST2000NX0423_W4622Q2Q"], "name": "zfspoolhdd", "type": "raidz1"}, "module_stderr": "Shared connection to server.domain.com closed.\r\n", "module_stdout": "/etc/profile.d/lang.sh: line 19: warning: setlocale: LC_CTYPE: cannot change locale (C.UTF-8)\r\nTraceback (most recent call last):\r\n  File \"/root/.ansible/tmp/ansible-tmp-1762354683.2000864-305-196936839075766/AnsiballZ_zpool.py\", line 107, in <module>\r\n    _ansiballz_main()\r\n  File \"/root/.ansible/tmp/ansible-tmp-1762354683.2000864-305-196936839075766/AnsiballZ_zpool.py\", line 99, in _ansiballz_main\r\n    invoke_module(zipped_mod, temp_path, ANSIBALLZ_PARAMS)\r\n  File \"/root/.ansible/tmp/ansible-tmp-1762354683.2000864-305-196936839075766/AnsiballZ_zpool.py\", line 48, in invoke_module\r\n    run_name='__main__', alter_sys=True)\r\n  File \"/usr/lib64/python2.7/runpy.py\", line 170, in run_module\r\n    mod_name, loader, code, fname = _get_module_details(mod_name)\r\n  File \"/usr/lib64/python2.7/runpy.py\", line 113, in _get_module_details\r\n    code = loader.get_code(mod_name)\r\n  File \"/tmp/ansible_community.general.zpool_payload_HiKwPP/ansible_community.general.zpool_payload.zip/ansible_collections/community/general/plugins/modules/zpool.py\", line 180\r\n    lambda props: sum([[\"-o\", f\"{prop}={value}\"] for prop, value in (props or {}).items()], [])\r\n                                              ^\r\nSyntaxError: invalid syntax\r\n", "msg": "MODULE FAILURE\nSee stdout/stderr for the exact error", "rc": 1}
failed: [server.domain.com] (item=zfspoolssd) => {"ansible_loop_var": "item", "changed": false, "item": {"ashift": 9, "disks": ["/dev/disk/by-id/scsi-358ce38ee2056d5b5", "/dev/disk/by-id/scsi-358ce38ee2056d5c5", "/dev/disk/by-id/scsi-358ce38ee2056d299"], "name": "zfspoolssd", "type": "raidz1"}, "module_stderr": "Shared connection to server.domain.com closed.\r\n", "module_stdout": "/etc/profile.d/lang.sh: line 19: warning: setlocale: LC_CTYPE: cannot change locale (C.UTF-8)\r\nTraceback (most recent call last):\r\n  File \"/root/.ansible/tmp/ansible-tmp-1762354684.2914195-305-154578504488355/AnsiballZ_zpool.py\", line 107, in <module>\r\n    _ansiballz_main()\r\n  File \"/root/.ansible/tmp/ansible-tmp-1762354684.2914195-305-154578504488355/AnsiballZ_zpool.py\", line 99, in _ansiballz_main\r\n    invoke_module(zipped_mod, temp_path, ANSIBALLZ_PARAMS)\r\n  File \"/root/.ansible/tmp/ansible-tmp-1762354684.2914195-305-154578504488355/AnsiballZ_zpool.py\", line 48, in invoke_module\r\n    run_name='__main__', alter_sys=True)\r\n  File \"/usr/lib64/python2.7/runpy.py\", line 170, in run_module\r\n    mod_name, loader, code, fname = _get_module_details(mod_name)\r\n  File \"/usr/lib64/python2.7/runpy.py\", line 113, in _get_module_details\r\n    code = loader.get_code(mod_name)\r\n  File \"/tmp/ansible_community.general.zpool_payload_bJhEV8/ansible_community.general.zpool_payload.zip/ansible_collections/community/general/plugins/modules/zpool.py\", line 180\r\n    lambda props: sum([[\"-o\", f\"{prop}={value}\"] for prop, value in (props or {}).items()], [])\r\n                                              ^\r\nSyntaxError: invalid syntax\r\n", "msg": "MODULE FAILURE\nSee stdout/stderr for the exact error", "rc": 1}

So the zpool.py before reformatting was:

                pool_properties=cmd_runner_fmt.as_func(
                    lambda props: sum([['-o', '{}={}'.format(prop, value)] for prop, value in (props or {}).items()], [])
                ),
                filesystem_properties=cmd_runner_fmt.as_func(
                    lambda props: sum([['-O', '{}={}'.format(prop, value)] for prop, value in (props or {}).items()], [])
                ),

and now:

                pool_properties=cmd_runner_fmt.as_func(
                    lambda props: sum([["-o", f"{prop}={value}"] for prop, value in (props or {}).items()], [])
                ),
                filesystem_properties=cmd_runner_fmt.as_func(
                    lambda props: sum([["-O", f"{prop}={value}"] for prop, value in (props or {}).items()], [])
                ),

Forcing the playbook to use Python 3.6 like this:

  vars:
    ansible_python_interpreter: /usr/bin/python3.6

results in another error output:

failed: [server.domain.com] (item=zfspoolhdd) => {"ansible_loop_var": "item", "changed": false, "item": {"ashift": 9, "disks": ["/dev/disk/by-id/ata-ST2000NX0423_W4622LPM", "/dev/disk/by-id/ata-ST2000NX0423_W4622Q4L", "/dev/disk/by-id/ata-ST2000NX0423_W4622LTN", "/dev/disk/by-id/ata-ST2000NX0423_W4622Q2Q"], "name": "zfspoolhdd", "type": "raidz1"}, "module_stderr": "Shared connection to server.domain.com closed.\r\n", "module_stdout": "/etc/profile.d/lang.sh: line 19: warning: setlocale: LC_CTYPE: cannot change locale (C.UTF-8)\r\nTraceback (most recent call last):\r\n  File \"/root/.ansible/tmp/ansible-tmp-1762355177.0763855-356-248512208845395/AnsiballZ_zpool.py\", line 107, in <module>\r\n    _ansiballz_main()\r\n  File \"/root/.ansible/tmp/ansible-tmp-1762355177.0763855-356-248512208845395/AnsiballZ_zpool.py\", line 99, in _ansiballz_main\r\n    invoke_module(zipped_mod, temp_path, ANSIBALLZ_PARAMS)\r\n  File \"/root/.ansible/tmp/ansible-tmp-1762355177.0763855-356-248512208845395/AnsiballZ_zpool.py\", line 48, in invoke_module\r\n    run_name='__main__', alter_sys=True)\r\n  File \"/usr/lib64/python3.6/runpy.py\", line 201, in run_module\r\n    mod_name, mod_spec, code = _get_module_details(mod_name)\r\n  File \"/usr/lib64/python3.6/runpy.py\", line 128, in _get_module_details\r\n    spec = importlib.util.find_spec(mod_name)\r\n  File \"/usr/lib64/python3.6/importlib/util.py\", line 89, in find_spec\r\n    return _find_spec(fullname, parent.__path__)\r\n  File \"<frozen importlib._bootstrap>\", line 894, in _find_spec\r\n  File \"<frozen importlib._bootstrap_external>\", line 1157, in find_spec\r\n  File \"<frozen importlib._bootstrap_external>\", line 1131, in _get_spec\r\n  File \"<frozen importlib._bootstrap_external>\", line 1112, in _legacy_get_spec\r\n  File \"<frozen importlib._bootstrap>\", line 441, in spec_from_loader\r\n  File \"<frozen importlib._bootstrap_external>\", line 544, in spec_from_file_location\r\n  File \"/tmp/ansible_community.general.zpool_payload__gwuzb1q/ansible_community.general.zpool_payload.zip/ansible_collections/community/general/plugins/modules/zpool.py\", line 7\r\nSyntaxError: future feature annotations is not defined\r\n", "msg": "MODULE FAILURE\nSee stdout/stderr for the exact error", "rc": 1}
failed: [server.domain.com] (item=zfspoolssd) => {"ansible_loop_var": "item", "changed": false, "item": {"ashift": 9, "disks": ["/dev/disk/by-id/scsi-358ce38ee2056d5b5", "/dev/disk/by-id/scsi-358ce38ee2056d5c5", "/dev/disk/by-id/scsi-358ce38ee2056d299"], "name": "zfspoolssd", "type": "raidz1"}, "module_stderr": "Shared connection to server.domain.com closed.\r\n", "module_stdout": "/etc/profile.d/lang.sh: line 19: warning: setlocale: LC_CTYPE: cannot change locale (C.UTF-8)\r\nTraceback (most recent call last):\r\n  File \"/root/.ansible/tmp/ansible-tmp-1762355178.1753685-356-141685137266919/AnsiballZ_zpool.py\", line 107, in <module>\r\n    _ansiballz_main()\r\n  File \"/root/.ansible/tmp/ansible-tmp-1762355178.1753685-356-141685137266919/AnsiballZ_zpool.py\", line 99, in _ansiballz_main\r\n    invoke_module(zipped_mod, temp_path, ANSIBALLZ_PARAMS)\r\n  File \"/root/.ansible/tmp/ansible-tmp-1762355178.1753685-356-141685137266919/AnsiballZ_zpool.py\", line 48, in invoke_module\r\n    run_name='__main__', alter_sys=True)\r\n  File \"/usr/lib64/python3.6/runpy.py\", line 201, in run_module\r\n    mod_name, mod_spec, code = _get_module_details(mod_name)\r\n  File \"/usr/lib64/python3.6/runpy.py\", line 128, in _get_module_details\r\n    spec = importlib.util.find_spec(mod_name)\r\n  File \"/usr/lib64/python3.6/importlib/util.py\", line 89, in find_spec\r\n    return _find_spec(fullname, parent.__path__)\r\n  File \"<frozen importlib._bootstrap>\", line 894, in _find_spec\r\n  File \"<frozen importlib._bootstrap_external>\", line 1157, in find_spec\r\n  File \"<frozen importlib._bootstrap_external>\", line 1131, in _get_spec\r\n  File \"<frozen importlib._bootstrap_external>\", line 1112, in _legacy_get_spec\r\n  File \"<frozen importlib._bootstrap>\", line 441, in spec_from_loader\r\n  File \"<frozen importlib._bootstrap_external>\", line 544, in spec_from_file_location\r\n  File \"/tmp/ansible_community.general.zpool_payload_oweiv8jn/ansible_community.general.zpool_payload.zip/ansible_collections/community/general/plugins/modules/zpool.py\", line 7\r\nSyntaxError: future feature annotations is not defined\r\n", "msg": "MODULE FAILURE\nSee stdout/stderr for the exact error", "rc": 1}

In short:

SyntaxError: future feature annotations is not defined

It seems a minimum of Python 3.7 is required for this module to work properly.
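
As a side note (my own reading, not something stated in this PR): the two tracebacks differ because f-strings require Python 3.6+, while from __future__ import annotations (PEP 563) requires Python 3.7+. A tiny guard illustrating the effective floor:

import sys

# Illustration only: community.general >= 12.0.0 modules target Python 3.7+.
if sys.version_info < (3, 7):
    raise SystemExit("This module requires Python 3.7 or newer")
print("Python {}.{} is new enough".format(*sys.version_info[:2]))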

@felixfontein

hmm, per my testing, the newly reformatted zpool.py module (current original state, without any disk-by modifications) throws an error when deployed on XCP-ng 8.3, which ships Python 2.7 and Python 3.6 by default:

Since version 12.0.0, the collection only supports Python 3.7+; see the changelog. community.general 11.x.y was the last version to support Python 3.6 and 2.7.



Successfully merging this pull request may close these issues.

  • zpool: state=present is not idempotent
  • zpool: vdev disks do not support symlinks
