File incorrectly zeroed when receiving incremental snapshot with -L flag from a 512k recordsize dataset #17866

@oa2215

System information

| Type                 | Version/Name                 |
|----------------------|------------------------------|
| Distribution Name    | FreeBSD                      |
| Distribution Version |                              |
| Kernel Version       | 14.3-RELEASE-p3              |
| Architecture         | x86_64                       |
| OpenZFS Version      | zfs-2.2.7-FreeBSD_ge269af1b3 |

Describe the problem you're observing

The datasets A and B were created with the default 128k recordsize and have been in use for about 8 years. They have been used as primary and DR replication pairs and the replication direction has changed over time.

A few months ago, recordsize on A was changed to 512k while A kept replicating to B without the -L flag. The recordsize on B was left unchanged because B was the read-only replica. Recently, the -L and -c flags were added to the incremental sends from A. After B received one or two incremental streams sent with -L and -c, we noticed that at least one file on B had been zeroed: stat reported an st_blksize of 512k, an st_blocks of 1, and the correct file size. This file had been created with recordsize 512k and replicated to B months earlier.

We were able to reproduce the issue on A and B by rolling back to a good snapshot and receiving incremental snapshots sent with the -L flag. After confirming this, and after reproducing it with fresh datasets created on B's pool as detailed below, we rolled dataset B back to a good snapshot and removed the -L flag from subsequent incremental sends.
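For anyone who wants to check a received dataset for the symptom described above (correct size, contents all zero), here is a minimal shell sketch. The function name `is_all_zero` and the example path are hypothetical and were not part of our tooling. Note that it also matches files that are legitimately all zeros, so a hit is only a candidate for comparison against the source.

```sh
# Hypothetical helper: succeeds (exit 0) only when the file is non-empty
# and every byte in it is zero.
is_all_zero() {
    # Arithmetic expansion strips the leading blanks BSD wc may print.
    size=$(( $(wc -c < "$1") )) || return 2
    [ "$size" -gt 0 ] || return 1
    # Compare the file against the same number of zero bytes.
    head -c "$size" /dev/zero | cmp -s - "$1"
}
```

A scan could then look like `find /pool/B -type f | while IFS= read -r f; do is_all_zero "$f" && echo "$f"; done`, where `/pool/B` stands in for the replica's mountpoint.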

I am aware of #6224. To the best of my knowledge, the bug triggered by toggling from no -L to -L has been addressed, and toggling from -L to no -L is prohibited once a large-block receive has happened. However, given what we observed, I am not certain that the no -L to -L case has been fully addressed.

Describe how to reproduce the problem

At the time we experienced the issue, we were able to reproduce it with two newly created datasets, using the following steps:

  1. Create a new dataset (foo) with recordsize=128k.
  2. Create file1 on foo using dd if=/dev/urandom bs=128k count=100.
  3. Take a snapshot and replicate to another new dataset (bar).
  4. Set recordsize to 512k on foo.
  5. Create file2 on foo using the same dd command as in step 2.
  6. Take a snapshot and send an incremental snapshot to bar without the -L flag.
  7. Create file3 on foo using the same dd command as in step 2.
  8. Take a snapshot and send an incremental snapshot to bar with the -L flag.
    At this point, we see that file2 on bar has been zeroed.
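The steps above can be sketched as the command sequence below. This is an approximation rather than a transcript of what we ran: the pool name `tank`, the snapshot names `s1` to `s3`, the default mountpoints under `/`, and the wrapper function `repro_commands` are all assumptions made for illustration. The function only prints the commands and executes nothing itself.

```sh
# Print the reproduction commands for a given pool (default: tank).
# Dataset names foo/bar follow the steps above; everything else is a placeholder.
repro_commands() {
    pool=${1:-tank}
    src="$pool/foo"
    dst="$pool/bar"
    cat <<EOF
# steps 1-3: 128k dataset, first file, full send
zfs create -o recordsize=128k $src
dd if=/dev/urandom of=/$src/file1 bs=128k count=100
zfs snapshot $src@s1
zfs send $src@s1 | zfs receive $dst
# steps 4-6: switch to 512k, second file, incremental send without -L
zfs set recordsize=512k $src
dd if=/dev/urandom of=/$src/file2 bs=128k count=100
zfs snapshot $src@s2
zfs send -i @s1 $src@s2 | zfs receive $dst
# steps 7-8: third file, incremental send with -L
dd if=/dev/urandom of=/$src/file3 bs=128k count=100
zfs snapshot $src@s3
zfs send -L -i @s2 $src@s3 | zfs receive $dst
# check: file2 should be identical on both sides
cmp /$src/file2 /$dst/file2
EOF
}
```

Running `repro_commands tank` prints the sequence for review. On a scratch pool, piping it into `sh -ex` as root would execute it, with a non-zero exit from the final `cmp` indicating that file2 differs between foo and bar.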

However, we are no longer able to reproduce the issue with the steps above.

Include any warning/errors/backtraces from the system logs

Labels: Type: Defect (Incorrect behavior, e.g. crash, hang)