Copilot AI commented Jan 12, 2026

Addresses three core ZFS bugs: RAID-Z expansion progress exceeding 100%, hot spares failing to activate for UNAVAIL devices, and scrub method reverting after pool export/import.

Changes

RAID-Z Expansion Progress Calculation

Progress could exceed 100% when new data was written during expansion, because pres_to_reflow was calculated from vd->vdev_stat.vs_alloc, which keeps growing as data is written.

  • Added vre_bytes_to_copy field to vdev_raidz_expand_t to capture initial allocation at expansion start
  • Persisted value via new VDEV_TOP_ZAP_RAIDZ_EXPAND_BYTES_TO_COPY key
  • Modified spa_raidz_expand_get_stats() to use fixed value, falling back to vs_alloc for backward compatibility

// Before: dynamic value can increase during expansion
pres->pres_to_reflow = vd->vdev_stat.vs_alloc;

// After: use captured value
pres->pres_to_reflow = vre->vre_bytes_to_copy != 0 ?
    vre->vre_bytes_to_copy : vd->vdev_stat.vs_alloc;

Hot Spare Auto-Activation

ZED retire agent only triggered replacement for VDEV_STATE_REMOVED and VDEV_STATE_FAULTED, missing VDEV_STATE_CANT_OPEN (UNAVAIL state).

  • Extended state check in zfs_retire_recv() to include VDEV_STATE_CANT_OPEN
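
The extended check can be pictured as a small predicate. This is a sketch under stated assumptions: the `SK_`-prefixed enum is a local stand-in for the vdev states named above (its numeric values are illustrative), and `state_wants_spare()` is a hypothetical helper, not a function in ZED or in this PR.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-in for the vdev states named above; values are illustrative. */
typedef enum sk_vdev_state {
	SK_VDEV_STATE_OFFLINE,
	SK_VDEV_STATE_REMOVED,
	SK_VDEV_STATE_CANT_OPEN,	/* reported to users as UNAVAIL */
	SK_VDEV_STATE_FAULTED,
	SK_VDEV_STATE_DEGRADED,
	SK_VDEV_STATE_HEALTHY
} sk_vdev_state_t;

/*
 * Hypothetical predicate: should the retire logic try to bring in a spare?
 * REMOVED and FAULTED were already handled; CANT_OPEN is the new case.
 */
static bool
state_wants_spare(sk_vdev_state_t state)
{
	assert(state >= SK_VDEV_STATE_OFFLINE && state <= SK_VDEV_STATE_HEALTHY);

	switch (state) {
	case SK_VDEV_STATE_REMOVED:
	case SK_VDEV_STATE_FAULTED:
	case SK_VDEV_STATE_CANT_OPEN:
		return (true);
	default:
		return (false);
	}
}
```

Keeping the three states in a single `switch` makes it easy to see which states trigger replacement and which (such as OFFLINE or DEGRADED in this sketch) do not.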

Scrub Method Persistence

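scn_is_sorted flag was not persisted in dsl_scan_phys_t, so a scan resumed after pool export/import took its method from the current global setting instead of the one it started with (for example, a legacy scrub resuming as sequential).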

  • Added DSF_SORTED_SCAN flag to dsl_scan_flags_t
  • Set flag in scn_phys.scn_flags when enabling sorted scan
  • Restore scn_is_sorted from flag during dsl_scan_init()
  • Clear flag when scan completes or reverts to unsorted

Not Addressed

Project quota initialization after pool upgrade requires traversing all open objsets, which is too invasive. Existing code already handles the upgrade on mount, so users must remount datasets after enabling the feature.

Original prompt

This section details the original issue you should resolve

<issue_title>AI Rollup - Bugs in Core ZFS Logic and Features</issue_title>
<issue_description>Auto-Generated Meta Issue

Severity: High
Type: Bug
Auto-fix PR-Candidate: true
Votes: 2
Related OpenZFS Issues:
openzfs#18103,
openzfs#17999,
openzfs#17955,
openzfs#18086,
openzfs#17953,
openzfs#17887

Analysis:
Several bugs have been identified in the core logic of ZFS features. These include RAID-Z expansion getting stuck with progress reported over 100%; project quotas not being usable after a pool upgrade from an older version until datasets are remounted; hot spares failing to activate automatically when a device is marked UNAVAIL; and the scrub method changing from legacy to sequential after a pool export/import cycle.

Reproduction Steps:
Reproduction steps are clear for most issues. The hot spare issue can be reproduced by creating a pool with a spare, powering down, unplugging a drive, and booting back up. The scrub issue can be reproduced by starting a legacy scrub, exporting the pool, changing the global setting to sequential, and re-importing.

Suggested Action:
Fix the progress calculation for RAID-Z expansion. For project quotas, automatically re-initialize objsets when the feature is enabled via zpool upgrade. For hot spares, investigate why the zed daemon or kernel event handler is not triggering the replacement. For the scrub issue, store the scan method in the checkpoint data to ensure it resumes consistently.

Debug Info:
</issue_description>

Comments on the Issue (you are @copilot in this section)



Copilot AI and others added 2 commits January 12, 2026 05:23
… persistence

- Fix RAID-Z expansion progress calculation exceeding 100% by capturing and
  persisting initial bytes_to_copy value at expansion start
- Fix hot spare not activating for UNAVAIL (VDEV_STATE_CANT_OPEN) devices
- Fix scrub method changing from legacy to sequential after pool export/import
  by persisting scn_is_sorted flag in DSF_SORTED_SCAN

Co-authored-by: alek-p <[email protected]>
- Add backward compatibility note for vre_bytes_to_copy field
- Fix DSL_SCAN_FLAGS_MASK to include all flags including DSF_SCRUB_PAUSED

Co-authored-by: alek-p <[email protected]>
Copilot AI changed the title [WIP] Fix bugs in core ZFS logic and features Fix RAID-Z expansion progress overflow, hot spare UNAVAIL handling, and scrub method persistence Jan 12, 2026
Copilot AI requested a review from alek-p January 12, 2026 05:25

Development

Successfully merging this pull request may close these issues.

AI Rollup - Bugs in Core ZFS Logic and Features
