Fix FS storage races #34452

Open
stanislav-shchetinin wants to merge 1 commit into ydb-platform:main from stanislav-shchetinin:bugs-in-nfs-export

Conversation

@stanislav-shchetinin
Collaborator

Changelog entry

...

Changelog category

  • Not for changelog (changelog entry is not required)

Description for reviewers

  • removed a race in PutObject (a second thread could overwrite the file while the first was still writing to it);
  • added fsync for directories.

Copilot AI review requested due to automatic review settings February 18, 2026 18:16
@stanislav-shchetinin stanislav-shchetinin self-assigned this Feb 18, 2026
@stanislav-shchetinin stanislav-shchetinin linked an issue Feb 18, 2026 that may be closed by this pull request
@github-actions

github-actions bot commented Feb 18, 2026

2026-02-18 18:18:14 UTC Pre-commit check linux-x86_64-release-asan for 7d68433 has started.
2026-02-18 18:18:33 UTC Artifacts will be uploaded here
2026-02-18 18:20:44 UTC ya make is running...
🟡 2026-02-18 19:44:57 UTC Some tests failed, follow the links below. This failure is not covered by the blocking policy yet

Ya make output | Test bloat

TESTS | PASSED | ERRORS | FAILED | SKIPPED | MUTED?
17120 | 17082 | 0 | 21 | 7 | 10

🟢 2026-02-18 19:45:09 UTC Build successful.
🟢 2026-02-18 19:45:40 UTC ydbd size 3.9 GiB changed* by +7.7 KiB, which is < 100.0 KiB vs main: OK

ydbd size dash | main: dfec5e2 | merge: 7d68433 | diff | diff %
ydbd size | 4 188 796 920 Bytes | 4 188 804 768 Bytes | +7.7 KiB | +0.000%
ydbd stripped size | 1 567 485 568 Bytes | 1 567 488 000 Bytes | +2.4 KiB | +0.000%

*please be aware that the difference is based on comparing your commit and the last completed build from the post-commit, check comparison

@github-actions

github-actions bot commented Feb 18, 2026

2026-02-18 18:18:22 UTC Pre-commit check linux-x86_64-relwithdebinfo for 7d68433 has started.
2026-02-18 18:18:40 UTC Artifacts will be uploaded here
2026-02-18 18:21:01 UTC ya make is running...
🟡 2026-02-18 20:22:56 UTC Some tests failed, follow the links below. Going to retry failed tests...

Details

Ya make output | Test bloat

TESTS | PASSED | ERRORS | FAILED | SKIPPED | MUTED?
46543 | 43031 | 0 | 4 | 3494 | 14

2026-02-18 20:23:19 UTC ya make is running... (failed tests rerun, try 2)
🟢 2026-02-18 20:25:47 UTC Tests successful.

Ya make output | Test bloat | Test bloat

TESTS | PASSED | ERRORS | FAILED | SKIPPED | MUTED?
55 (only retried tests) | 55 | 0 | 0 | 0 | 0

🟢 2026-02-18 20:25:54 UTC Build successful.
🟢 2026-02-18 20:26:18 UTC ydbd size 2.4 GiB changed* by +4.6 KiB, which is < 100.0 KiB vs main: OK

ydbd size dash | main: dfec5e2 | merge: 7d68433 | diff | diff %
ydbd size | 2 572 231 544 Bytes | 2 572 236 208 Bytes | +4.6 KiB | +0.000%
ydbd stripped size | 542 282 952 Bytes | 542 283 784 Bytes | +832 Bytes | +0.000%

*please be aware that the difference is based on comparing your commit and the last completed build from the post-commit, check comparison

@ydbot
Collaborator

ydbot commented Feb 18, 2026

Run Extra Tests

Run additional tests for this PR. You can customize:

  • Test Size: small, medium, large (default: all)
  • Test Targets: any directory path (default: ydb/)
  • Sanitizers: ASAN, MSAN, TSAN
  • Coredumps: enable for debugging (default: off)
  • Additional args: custom ya make arguments

@github-actions

🟢 2026-02-18 18:21:34 UTC The validation of the Pull Request description is successful.

Contributor

Copilot AI left a comment

Pull request overview

This PR aims to eliminate file-system storage race conditions in the FS external storage wrapper, improving durability guarantees for object writes.

Changes:

  • Reworked PutObject to avoid a truncate/write race by reusing the multipart session locking/truncation pattern.
  • Added directory fsync (via Flush() on the parent directory fd) after object creation and after multipart completion rename.
  • Improved logging for failed cleanup during AbortMultipartUpload.

Comment on lines 561 to +562
    ActiveUploads.erase(it);
    session.File.Close();

Copilot AI Feb 18, 2026

session is taken by reference (auto& session = it->second) and then ActiveUploads.erase(it) is called before session.File.Close(). Erasing the map element destroys the session, so the subsequent session.File.Close() is a use-after-free/UB. Close the file (and finish any other session access) before erasing, or move the TFile out before erase.

Suggested change
-   ActiveUploads.erase(it);
-   session.File.Close();
+   session.File.Close();
+   ActiveUploads.erase(it);
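
For illustration, the other option mentioned in the comment (moving the session out before erasing) might look roughly like the standalone sketch below. It uses `std::map` and stand-in types rather than the real `TFile` and session classes; `FakeFile`, `UploadSession`, and `FinishUpload` are hypothetical names, and the actual container type in the PR is an assumption.

```cpp
#include <map>
#include <string>
#include <utility>

// Stand-in for a file handle; the real code uses TFile.
struct FakeFile {
    bool IsOpen = true;
    void Close() { IsOpen = false; }
};

// Stand-in for the multipart upload session stored in the map.
struct UploadSession {
    FakeFile File;
};

// Removes the session from the map and closes its file.
// Returns true if the session existed and its file was closed.
bool FinishUpload(std::map<std::string, UploadSession>& activeUploads,
                  const std::string& uploadId) {
    auto it = activeUploads.find(uploadId);
    if (it == activeUploads.end()) {
        return false;
    }
    // Move the session into a local object first. A reference such as
    // `auto& session = it->second` would dangle once erase(it) runs.
    UploadSession session = std::move(it->second);
    activeUploads.erase(it);
    // Safe: `session` is owned by this function, not by the map.
    session.File.Close();
    return !session.File.IsOpen;
}
```

Moving out is useful when the map entry must disappear early (for example, before a slow close); otherwise simply closing before erasing, as in the suggested change, is the smaller fix.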

Comment on lines +62 to +65
    static void FsyncParentDir(const TString& filePath) {
        TFsPath parent = TFsPath(filePath).Parent();
        TFile dirFd(parent.GetPath(), RdOnly | Seq);
        dirFd.Flush();

Copilot AI Feb 18, 2026

FsyncParentDir opens a directory with TFile(parent.GetPath(), RdOnly | Seq) and calls Flush() (fsync). On Windows, TFileHandle uses CreateFileW without FILE_FLAG_BACKUP_SEMANTICS, so opening a directory path will fail and make Put/Complete operations error out. Consider guarding this code for non-Windows platforms or implementing directory-open with the required Windows flags (or using a dedicated helper in util).

Suggested change
-   static void FsyncParentDir(const TString& filePath) {
-       TFsPath parent = TFsPath(filePath).Parent();
-       TFile dirFd(parent.GetPath(), RdOnly | Seq);
-       dirFd.Flush();
+   static void FsyncParentDir(const TString& filePath) {
+   #if defined(_win_)
+       (void)filePath;
+   #else
+       TFsPath parent = TFsPath(filePath).Parent();
+       TFile dirFd(parent.GetPath(), RdOnly | Seq);
+       dirFd.Flush();
+   #endif
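
As a point of comparison, the same guard can be sketched with plain POSIX calls and the standard library instead of `TFile`. This is an assumption-laden illustration, not the util implementation: `FsyncParentDirPortable` is a hypothetical name, and it uses the standard `_WIN32` macro where the suggestion above uses `_win_`.

```cpp
#include <filesystem>
#include <string>

#if !defined(_WIN32)
#include <fcntl.h>
#include <unistd.h>
#endif

// Hypothetical helper: fsync the directory containing `filePath` so that a
// freshly created or renamed directory entry survives a crash.
// Returns true on success (and on Windows, where it is a no-op).
bool FsyncParentDirPortable(const std::string& filePath) {
#if defined(_WIN32)
    // Opening a directory as a regular file handle fails on Windows,
    // so skip the directory sync there.
    (void)filePath;
    return true;
#else
    std::filesystem::path parent = std::filesystem::path(filePath).parent_path();
    if (parent.empty()) {
        parent = ".";  // a bare file name lives in the current directory
    }
    // O_DIRECTORY makes open() fail if the path is not a directory.
    int fd = ::open(parent.c_str(), O_RDONLY | O_DIRECTORY);
    if (fd < 0) {
        return false;
    }
    bool ok = ::fsync(fd) == 0;
    ::close(fd);
    return ok;
#endif
}
```

Returning a status instead of throwing lets the caller decide whether a failed directory sync should fail the whole Put/Complete operation or only be logged.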

Development

Successfully merging this pull request may close these issues.

Integration testing export to NFS
