Skip to content

test libuv with posix delete on windows #53394

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 13 commits into from

Conversation

IanButterworth
Copy link
Member

Testing libuv/libuv#4318
i.e. to fix #34700

@IanButterworth
Copy link
Member Author

I believe most/all of the rm error logs on Windows CI remain here, but those all appear to be related to deleting temp depot compiled dirs which AFAIU fails on windowswhen a pkgimage stored in the dir has been loaded? @vchuravy

@vchuravy
Copy link
Member

dirs which AFAIU fails on windowswhen a pkgimage stored in the dir has been loaded?

Hm, I was hopeing that would be fixed by it. Does this also apply posix rm to directories?

@IanButterworth
Copy link
Member Author

IanButterworth commented Feb 20, 2024

No I believe that would require modifying fs__rmdir which this doesn't touch?

Disregard, I was unaware that we only call uv_fs_rmdir which calls _wrmdir (delete an empty dir) after we recursively rm all the contents.

@vtjnash perhaps this is falling back to the old delete method? We could check by making the fallback error or print?

@vtjnash
Copy link
Member

vtjnash commented Feb 20, 2024

_wrmdir is a thin wrapper for https://learn.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-removedirectoryw (note the comment about posix semantics there)

You could also probably very easily fix rmdir's implementation to call fs__unlink now instead, while adding a boolean parameter to that internal function for whether this is unlink or rmdir, and update this condition here, which selects whether unlink is permitted to remove a directory or a file:
https://github.com/libuv/libuv/blob/76f7fb26b9ac8ede4406fa786acb7480d9b48ff5/src/win/fs.c#L1093

@IanButterworth
Copy link
Member Author

IanButterworth commented Feb 21, 2024

This seems to be working, the rm failures are gone on windows, but it has introduced inconsistent behavior on windows where these now do not error, so both fail

julia/test/file.jl

Lines 498 to 499 in 9ac2009

@test_throws Base.IOError rm(c_tmpdir)
@test_throws Base.IOError rm(c_tmpdir, force=true)

@vtjnash Should I push these changes (not the failed attempt at a test) to libuv/libuv#4318 or should we separate the changes?

@IanButterworth
Copy link
Member Author

IanButterworth commented Feb 22, 2024

This is failing again. On a windows VM with this build

julia> function posix_test()
           dir = mktempdir()
           testfile = joinpath(dir, "testfile")
           touch(testfile)
           isfile(testfile) || error("touch failed")
           readdir(dir) == ["testfile"] || error("readdir failed")
           io = open(testfile, "r")
           rm(testfile)
           rm(dir)
       end
posix_test (generic function with 1 method)

julia> posix_test()
ERROR: IOError: unlink("C:\\Users\\ian\\AppData\\Local\\Temp\\jl_NFxgoH\\testfile"): resource busy or locked (EBUSY)
Stacktrace:
 [1] uv_error
   @ .\libuv.jl:106 [inlined]
 [2] unlink(p::String)
   @ Base.Filesystem .\file.jl:1032
 [3] rm(path::String; force::Bool, recursive::Bool)
   @ Base.Filesystem .\file.jl:283
 [4] rm
   @ .\file.jl:273 [inlined]
 [5] posix_test()
   @ Main .\REPL[3]:9
 [6] top-level scope
   @ REPL[4]:1

on MacOS posix_test() has no issues

@IanButterworth
Copy link
Member Author

Note that I've moved the try-catch inside the for loop as recursive delete was failing eagerly and leaving deletable files behind.

An example of an ENOTEMPTY case

�_bk;t=1708636028447�      From worker 4:	‚îå Warning: Cannot delete due to UV_EACCES: C:\Users\julia\AppData\Local\Temp\jl_q2mtHt\compiled\v1.12\Bar.dll
�_bk;t=1708636029681�      From worker 4:	‚îÇ   stat(fp) =
�_bk;t=1708636029681�      From worker 4:	‚îÇ    StatStruct for "C:\\Users\\julia\\AppData\\Local\\Temp\\jl_q2mtHt\\compiled\\v1.12\\Bar.dll"
�_bk;t=1708636029681�      From worker 4:	‚îÇ       size: 11264 bytes
�_bk;t=1708636029681�      From worker 4:	‚îÇ     device: 344850202
�_bk;t=1708636029681�      From worker 4:	‚îÇ      inode: 342790
�_bk;t=1708636029681�      From worker 4:	‚îÇ       mode: 0o100666 (-rw-rw-rw-)
�_bk;t=1708636029681�      From worker 4:	‚îÇ      nlink: 1
�_bk;t=1708636029681�      From worker 4:	‚îÇ        uid: 0
�_bk;t=1708636029681�      From worker 4:	‚îÇ        gid: 0
�_bk;t=1708636029681�      From worker 4:	‚îÇ       rdev: 0
�_bk;t=1708636029681�      From worker 4:	‚îÇ      blksz: 4096
�_bk;t=1708636029681�      From worker 4:	‚îÇ     blocks: 24
�_bk;t=1708636029681�      From worker 4:	‚îÇ      mtime:  (2 seconds ago)
�_bk;t=1708636029681�      From worker 4:	‚îÇ      ctime:  (2 seconds ago)
�_bk;t=1708636029681�      From worker 4:	‚îî @ Base.Filesystem file.jl:300
�_bk;t=1708636029681�      From worker 4:	‚îå Warning: Cannot delete due to UV_EACCES: C:\Users\julia\AppData\Local\Temp\jl_q2mtHt\compiled\v1.12\Baz.dll
�_bk;t=1708636029720�      From worker 4:	‚îÇ   stat(fp) =
�_bk;t=1708636029720�      From worker 4:	‚îÇ    StatStruct for "C:\\Users\\julia\\AppData\\Local\\Temp\\jl_q2mtHt\\compiled\\v1.12\\Baz.dll"
�_bk;t=1708636029720�      From worker 4:	‚îÇ       size: 12288 bytes
�_bk;t=1708636029720�      From worker 4:	‚îÇ     device: 344850202
�_bk;t=1708636029720�      From worker 4:	‚îÇ      inode: 342888
�_bk;t=1708636029720�      From worker 4:	‚îÇ       mode: 0o100666 (-rw-rw-rw-)
�_bk;t=1708636029720�      From worker 4:	‚îÇ      nlink: 1
�_bk;t=1708636029720�      From worker 4:	‚îÇ        uid: 0
�_bk;t=1708636029720�      From worker 4:	‚îÇ        gid: 0
�_bk;t=1708636029720�      From worker 4:	‚îÇ       rdev: 0
�_bk;t=1708636029720�      From worker 4:	‚îÇ      blksz: 4096
�_bk;t=1708636029720�      From worker 4:	‚îÇ     blocks: 24
�_bk;t=1708636029720�      From worker 4:	‚îÇ      mtime:  (1 second ago)
�_bk;t=1708636029720�      From worker 4:	‚îÇ      ctime:  (1 second ago)
�_bk;t=1708636029720�      From worker 4:	‚îî @ Base.Filesystem file.jl:300
�_bk;t=1708636029720�      From worker 4:	‚îå Warning: Cannot delete due to UV_EACCES: C:\Users\julia\AppData\Local\Temp\jl_q2mtHt\compiled\v1.12\Foo.dll
�_bk;t=1708636029720�      From worker 4:	‚îÇ   stat(fp) =
�_bk;t=1708636029720�      From worker 4:	‚îÇ    StatStruct for "C:\\Users\\julia\\AppData\\Local\\Temp\\jl_q2mtHt\\compiled\\v1.12\\Foo.dll"
�_bk;t=1708636029720�      From worker 4:	‚îÇ       size: 12288 bytes
�_bk;t=1708636029720�      From worker 4:	‚îÇ     device: 344850202
�_bk;t=1708636029720�      From worker 4:	‚îÇ      inode: 342784
�_bk;t=1708636029720�      From worker 4:	‚îÇ       mode: 0o100666 (-rw-rw-rw-)
�_bk;t=1708636029720�      From worker 4:	‚îÇ      nlink: 1
�_bk;t=1708636029720�      From worker 4:	‚îÇ        uid: 0
�_bk;t=1708636029720�      From worker 4:	‚îÇ        gid: 0
�_bk;t=1708636029720�      From worker 4:	‚îÇ       rdev: 0
�_bk;t=1708636029720�      From worker 4:	‚îÇ      blksz: 4096
�_bk;t=1708636029720�      From worker 4:	‚îÇ     blocks: 24
�_bk;t=1708636029720�      From worker 4:	‚îÇ      mtime:  (3 seconds ago)
�_bk;t=1708636029720�      From worker 4:	‚îÇ      ctime:  (3 seconds ago)
�_bk;t=1708636029720�      From worker 4:	‚îî @ Base.Filesystem file.jl:300
�_bk;t=1708636029720�      From worker 4:	‚îå Warning: contextual tests failed to tidy up
�_bk;t=1708636029768�      From worker 4:	‚îÇ   err = IOError: rm("C:\\Users\\julia\\AppData\\Local\\Temp\\jl_q2mtHt\\compiled\\v1.12"): directory not empty (ENOTEMPTY)

The UV_EACCES is being thrown before reaching this debug print https://github.com/JuliaLang/libuv/blob/2c35bf30acf455e25aecdc7b9e44585bcf243f0b/src/win/fs.c#L1127 so before the posix delete attempt.

Why would -rw-rw-rw- files throw UV_EACESS?

@IanButterworth
Copy link
Member Author

IanButterworth commented Feb 23, 2024

I wonder if the issue is just specific to loaded dlls. In that when julia dlopens the dll it's not opened with FILE_SHARE_DELETE because windows is mmapping the dll, so when unlink tries to load the file handle with DELETE it is disallowed

FILE_SHARE_DELETE 0x00000004 |

Enables subsequent open operations on a file or device to request delete access. Otherwise, other processes cannot open the file or device if they request delete access. If this flag is not specified, but the file or device has been opened for delete access, the function fails.

and this interesting discussion https://stackoverflow.com/a/63678753


Also I double checked that the posix delete mode still requires file handles to be opened with DELETE and it appears to be the case IanButterworth/libuv#2

@IanButterworth
Copy link
Member Author

IanButterworth commented Feb 23, 2024

So I'm approaching a conclusion that:

  1. Adding posix delete for files and dirs seems a good thing so implement posix delete on windows, if supported libuv/libuv#4318 should probably go ahead, and close rm(..., recursive=true) "directory not empty" errors #34700

  2. Deleting (pkgimage) .dlls while loaded seems to still not be possible even with posix delete, because the handle to the file cannot be opened with DELETE while it's dlopened, so it seems to me that precompile_test_harness and things like it maybe should run in a subprocess so the dll is freed before the temp dir is rm-ed during clean up?
    Another option is to move the dll while loaded, which apparently is ok and tidy it up at a later time, allowing the temp dir to be rm-ed.. but this seems to not be better.

@IanButterworth
Copy link
Member Author

IanButterworth commented Feb 24, 2024

Just to add notes. It does seem UV_EACCES is being thrown here https://github.com/JuliaLang/libuv/blob/7d5a80d35512b12973d7fed4f36dc9044a27e14a/src/win/fs.c#L1131

[ Info: rm C:\Users\ian\.julia\compiled\v1.12\DataAPI\3a8mN_s6dyB.dll
[ Info: unlink C:\Users\ian\.julia\compiled\v1.12\DataAPI\3a8mN_s6dyB.dll
Posix delete on C:\Users\ian\.julia\compiled\v1.12\DataAPI\3a8mN_s6dyB.dll
┌ Warning: Cannot delete due to UV_EACCES: C:\Users\ian\.julia\compiled\v1.12\DataAPI\3a8mN_s6dyB.dll
│   stat(fp) =
│    StatStruct for "C:\\Users\\ian\\.julia\\compiled\\v1.12\\DataAPI\\3a8mN_s6dyB.dll"
│       size: 140288 bytes
│     device: 113935866
│      inode: 97876
│       mode: 0o100666 (-rw-rw-rw-)
│      nlink: 1
│        uid: 0
│        gid: 0
│       rdev: 0
│      blksz: 4096
│     blocks: 280
│      mtime:  (1 minute ago)
│      ctime:  (1 minute ago)
└ @ Base.Filesystem file.jl:294

I got that in a VM.

I was getting confused because on the buildkite logs the Posix delete on line is missing in this sequence, though it appears elsewhere.. (Could that be a logging vs. std capture sync issue?)

@IanButterworth
Copy link
Member Author

IanButterworth commented Feb 24, 2024

So that confirms that posix delete is failing with STATUS_CANNOT_DELETE, which given libuv here is setting FILE_DISPOSITION_IGNORE_READONLY_ATTRIBUTE, STATUS_CANNOT_DELETE can only mean that there is an existing mapped view to the file, preventing delete. See remarks here

A return value of STATUS_CANNOT_DELETE indicates that either the file is read-only, or there is an existing mapped view to the file.

I think that's a conclusion then. We can't rm DLLs that are currently loaded on windows, even with posix delete.

But I do believe you can move them. So there could be an approach of moving them out of the directory being deleted.. (if we did that then we should move the try catch in the rm recursion to inside the loop, as it currently fails fast if it knows it can't rm the dir)

It does seem like doing posix delete in libuv is still a good thing, and allows removing windows things like 5f236f3 (#53394)

fyi, @vchuravy

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

rm(..., recursive=true) "directory not empty" errors
3 participants