Skip to content

btrfs-progs: check that device byte values in superblock match those in chunk root #991

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: devel
Choose a base branch
from

Conversation

maharmstone
Copy link
Contributor

The superblock of each device contains a copy of the corresponding struct btrfs_dev_item that lives in the chunk root.

Add a check that the total_bytes and bytes_used values of these two copies match.

@adam900710
Copy link
Collaborator

adam900710 commented May 29, 2025

The CI failure is caused by an old image which is created by some older kernel.

I have updated the image with this PR: #992

Which should solve the test failure for fsck/020.

But there are still some more problems need to be solved:

  • fsck/047
    This requires proper repair support, your enhanced check didn't re-read the updated superblock after repair, thus failling that test case.
    This need some updates on your patch.

  • fsck/057
    The check is not handling seed device correctly.
    The sprout fs can modify the old chunks on the seed device, e.g. remove the old empty SYSTEM chunk.
    But since the seed device is completely read-only, the device item can not be updated, thus causing check error.

    The patch should not check a seed device against the sprouted fs.

@maharmstone
Copy link
Contributor Author

Cheers Qu, I'll have a look

intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request May 29, 2025
Each superblock contains a copy of the device item for that device. In a
transaction which drops a chunk but doesn't create any new ones, we were
correctly updating the device item in the chunk tree but not copying
over the new bytes_used value to the superblock.

This can be seen by doing the following:

 # cd
 # dd if=/dev/zero of=test bs=4096 count=2621440
 # mkfs.btrfs test
 # mount test /root/temp

 # cd /root/temp
 # for i in {00..10}; do dd if=/dev/zero of=$i bs=4096 count=32768; done
 # sync
 # rm *
 # sync
 # btrfs balance start -dusage=0 .
 # sync

 # cd
 # umount /root/temp
 # btrfs check test

(For btrfs-check to detect this, you will also need my patch at
kdave/btrfs-progs#991.)

Change btrfs_remove_dev_extents() so that it adds the devices to the
post_commit_list if they're not there already. This causes
btrfs_commit_device_sizes() to be called, which updates the bytes_used
value in the superblock.

Signed-off-by: Mark Harmstone <[email protected]>
@kdave kdave force-pushed the devel branch 6 times, most recently from ef43ce6 to 3eff852 Compare May 30, 2025 14:14
kdave pushed a commit to btrfs/linux that referenced this pull request Jun 4, 2025
Each superblock contains a copy of the device item for that device. In a
transaction which drops a chunk but doesn't create any new ones, we were
correctly updating the device item in the chunk tree but not copying
over the new bytes_used value to the superblock.

This can be seen by doing the following:

  # dd if=/dev/zero of=test bs=4096 count=2621440
  # mkfs.btrfs test
  # mount test /root/temp

  # cd /root/temp
  # for i in {00..10}; do dd if=/dev/zero of=$i bs=4096 count=32768; done
  # sync
  # rm *
  # sync
  # btrfs balance start -dusage=0 .
  # sync

  # cd
  # umount /root/temp
  # btrfs check test

For btrfs-check to detect this, you will also need my patch at
kdave/btrfs-progs#991.

Change btrfs_remove_dev_extents() so that it adds the devices to the
fs_info->post_commit_list if they're not there already. This causes
btrfs_commit_device_sizes() to be called, which updates the bytes_used
value in the superblock.

Fixes: bbbf724 ("btrfs: combine device update operations during transaction commit")
CC: [email protected] # 5.10+
Reviewed-by: Qu Wenruo <[email protected]>
Signed-off-by: Mark Harmstone <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
@maharmstone maharmstone force-pushed the check-superblock-devitem branch from d93a7ed to 0bdf0aa Compare June 6, 2025 14:16
…in chunk root

The superblock of each device contains a copy of the corresponding
struct btrfs_dev_item that lives in the chunk root.

Add a check that the total_bytes and bytes_used values of these two
copies match.

Signed-off-by: Mark Harmstone <[email protected]>
@maharmstone maharmstone force-pushed the check-superblock-devitem branch from 0bdf0aa to 1164c01 Compare June 6, 2025 14:19
@maharmstone
Copy link
Contributor Author

maharmstone commented Jun 6, 2025

But there are still some more problems need to be solved:

  • fsck/047
    This requires proper repair support, your enhanced check didn't re-read the updated superblock after repair, thus failling that test case.
    This need some updates on your patch.

This wasn't quite right. The problem was that the dev_rec->byte_used value wasn't getting updated in check_device_used(). I've force-pushed the patch with this change squashed in.

  • fsck/057
    The check is not handling seed device correctly.
    The sprout fs can modify the old chunks on the seed device, e.g. remove the old empty SYSTEM chunk.
    But since the seed device is completely read-only, the device item can not be updated, thus causing check error.
    The patch should not check a seed device against the sprouted fs.

Unfortunately this wasn't right either. btrfs-check only works on one device at a time, so if the superblock is readonly on one device the metadata on it will be too.
The problem was that this test does a mount, and it was the kernel causing the corruption. Cherry-picking 9516bae0d79045004f0b64b1f852d177cacee094 causes the problem to go away.

kdave pushed a commit to btrfs/linux that referenced this pull request Jun 6, 2025
Each superblock contains a copy of the device item for that device. In a
transaction which drops a chunk but doesn't create any new ones, we were
correctly updating the device item in the chunk tree but not copying
over the new bytes_used value to the superblock.

This can be seen by doing the following:

  # dd if=/dev/zero of=test bs=4096 count=2621440
  # mkfs.btrfs test
  # mount test /root/temp

  # cd /root/temp
  # for i in {00..10}; do dd if=/dev/zero of=$i bs=4096 count=32768; done
  # sync
  # rm *
  # sync
  # btrfs balance start -dusage=0 .
  # sync

  # cd
  # umount /root/temp
  # btrfs check test

For btrfs-check to detect this, you will also need my patch at
kdave/btrfs-progs#991.

Change btrfs_remove_dev_extents() so that it adds the devices to the
fs_info->post_commit_list if they're not there already. This causes
btrfs_commit_device_sizes() to be called, which updates the bytes_used
value in the superblock.

Fixes: bbbf724 ("btrfs: combine device update operations during transaction commit")
CC: [email protected] # 5.10+
Reviewed-by: Qu Wenruo <[email protected]>
Signed-off-by: Mark Harmstone <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
@adam900710
Copy link
Collaborator

Thanks a lot for updating the analyze, my quick guess is as bad as usual.

We can merge the series when the upstream kernel got the fix.
Although the CI may be problematic for a while until the CI kernel got the fix backported.

@maharmstone
Copy link
Contributor Author

No worries, thanks Qu

kdave pushed a commit to btrfs/linux that referenced this pull request Jun 9, 2025
Each superblock contains a copy of the device item for that device. In a
transaction which drops a chunk but doesn't create any new ones, we were
correctly updating the device item in the chunk tree but not copying
over the new bytes_used value to the superblock.

This can be seen by doing the following:

  # dd if=/dev/zero of=test bs=4096 count=2621440
  # mkfs.btrfs test
  # mount test /root/temp

  # cd /root/temp
  # for i in {00..10}; do dd if=/dev/zero of=$i bs=4096 count=32768; done
  # sync
  # rm *
  # sync
  # btrfs balance start -dusage=0 .
  # sync

  # cd
  # umount /root/temp
  # btrfs check test

For btrfs-check to detect this, you will also need my patch at
kdave/btrfs-progs#991.

Change btrfs_remove_dev_extents() so that it adds the devices to the
fs_info->post_commit_list if they're not there already. This causes
btrfs_commit_device_sizes() to be called, which updates the bytes_used
value in the superblock.

Fixes: bbbf724 ("btrfs: combine device update operations during transaction commit")
CC: [email protected] # 5.10+
Reviewed-by: Qu Wenruo <[email protected]>
Signed-off-by: Mark Harmstone <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants