Skip to content

Remove workaround for old Linux kernel CIFS bug #50

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: bitcoin-fork
Choose a base branch
from

Conversation

laanwj
Copy link
Member

@laanwj laanwj commented Apr 29, 2025

A long time ago (bitcoin/bitcoin#10000) we added a workaround for a Linux kernel issue where calling fsync on a directory would instantly and reliably fail for CIFS mounts. In 2018 i ported this forward to the new leveldb tree (d42e63d) without checking if it would still be necessary. Nearly a decade later it's time to reevaluate this.

This is an ugly workaround (always intended to be temporary) as the code isn't written specifically: If directory sync fails on a correctly working filesystem it ignores this as well, potentially ignoring a real problem. Although checking for errno==EINVAL partially mitigates this, it would be preferable to get rid of it. See also google#1262 .

So just today i did some experiment to check:

// Test case for directory synchronization on CIFS share.
// SPDX-License-Identifier: MIT
//
// Usage: fsynctest <cifs_mount_path>/testdir
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "missing arg\n");
        exit(1);
    }
    int fd = open(argv[1], O_DIRECTORY | O_RDONLY); // O_DIRECTORY just ensures that this can't be run on files.
    if (fd < 0) {
        perror("open");
        exit(1);
    }

    int res;

    fprintf(stderr, "performing fsync on directory... ");
    res = fsync(fd);
    if (res == 0) {
        fprintf(stderr, "ok\n");
    } else {
        perror("");
    }

    fprintf(stderr, "performing fdatasync on directory... ");
    res = fdatasync(fd);
    if (res == 0) {
        fprintf(stderr, "ok\n");
    } else {
        perror("");
    }

    close(fd);
}

and ran this on a CIFS share:

$ findmnt /storage/synctest
TARGET         SOURCE          FSTYPE OPTIONS
/storage/synctest //storage/... cifs   rw,relatime,vers=3.1.1,...
$ ./fsynctest /storage/synctest/synctest/
performing fsync on directory... ok
performing fdatasync on directory... ok

At least at first glance (some more testing might or might not be warranted), the original problem doesn't appear to exist anymore.

Versions:

  • client: kernel 6.8.0, smbd 4.19.5
  • server: kernel 6.6.31, smbd 4.17.12

This reverts commit d42e63d.

@l0rinc
Copy link

l0rinc commented Apr 29, 2025

Concept ACK, thanks for taking care of this - I'm all for minimizing the differences with upstream!

@laanwj
Copy link
Member Author

laanwj commented Apr 29, 2025

Ping @NicolasDorier; is this still relevant to you? have you run into this issue recently? Can you test this PR on your current setup?

FWIW i've synced the entire testnet4 chain to a CIFS share with the patch reverted, no problems.

@l0rinc
Copy link

l0rinc commented Apr 29, 2025

Is this related to torvalds/linux@6e70c26?

@laanwj
Copy link
Member Author

laanwj commented Apr 29, 2025

Is this related to torvalds/linux@6e70c26?

That does look like it, thanks. The first release that made it into seems to be 4.17.0, released on June, 3rd 2018.

$ git tag --contains 6e70c267e68d77679534dcf4aaf84e66f2cf1425|head -1
v4.17

So it's been fixed for a loong time.

Copy link

@felipecrv felipecrv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

I recommend taking a look at the Env implementation in Chromium: they have retries with exponential back-off on syscalls. Implementing this helped me when I deployed LevelDB on a mobile app that runs on devices with faulty storage media.

@davidgumberg
Copy link

ACK 53df05c

I think this was a bug in the kernel, and filesystems which don't support fsync() should never return EINVAL, we could also probably remove the EINVAL exceptions that were carved into the fs helper functions in bitcoin/bitcoin@cf02779

So it's been fixed for a loong time.

There is still a "supported" version of the kernel that lacks this patch, the 4.4 CIP SLTS kernel which will be maintained by the CIP project until 2027, although I'm not sure whether supporting CIP kernels is in-scope for Bitcoin Core, if it is, it would probably be best to submit a backport patch to the CIP mailing list.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants