Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

When adding +x, get and set mode through the file descriptor #1803

Merged
merged 8 commits into from
Jan 25, 2025

Conversation

EliahKagan
Copy link
Member

@EliahKagan EliahKagan commented Jan 24, 2025

Fixes #1786

This changes finalize_entry so it uses the open file descriptor, rather than the path, to read the old mode and write the new mode. (This is the PR mentioned in #1786 (comment).)

How it worked before

Previously:

  • The old mode was read using lstat, called indirectly through the higher-level function std::fs::symlink_metadata.

    let old_perm = std::fs::symlink_metadata(path)?.permissions();

  • The new mode was set using chmod, called indirectly through the higher-level function std::fs::set_permissions.

    std::fs::set_permissions(path, new_perm)?;

    It is documented to use chmod, which may not be ideal, since if we are to continue to operate on a path here, lchmod is probably what we want.

How it would work now

With the changes in this PR, gix-worktree-state takes a direct dependency on rustix. I think this may be okay since gix-worktree-state already depends on gix-index, which depends on rustix. (I used the same version range and features as were used there.)

It uses the fstat and fchmod functions that rustix lightly wraps--as well as the RawMode and Mode types to avoid having to handle conversions and casts on systems like macOS where mode_t is 16-bit--to implement the new behavior of operating on the open file descriptor instead of the path:

Caveat 1 (but maybe benefit?): We must not dereference symlinks

Because fchmod operates on a file descriptor, it changes the permissions of whatever file is open. If we opened this file through a symlink, then this may be the wrong thing to do. I believe that this will not happen, at least on a Unix-like system. We have, typically, opened this file to write to it. So we try hard not to open a symlink, and I think avoid this in a way that completely robust, at least on systems where we are setting executable permissions. But this is an area I recommend be considered and examined when reviewing this PR.

See #1786 (comment) for more information on this and related issues, including an area where I think this PR may actually improve resilience to wrongly following symlinks.

Caveat 2 (minor): Was the path intended to be used for more?

There is one other area I suggest giving special attention when reviewing this PR, though I very much doubt it is a problem: I was surprised that the rela_path_as_path variable introduced near the top of set_executable_after_creation was actually only used for arranging for executable permissions to be added. Given the way the code was written, I wonder if it was intended to use it for more than that.


There is some more information in the commit messages. In particular, 588e6b0 presents the curious rela_path_as_path situation.

When executable permissions were set on a file being checked out,
and when it was done after the file was already created, this was
done on its path, using `chmod` (via a higher level abstraction).
But we have an open `File` object for it already, at that point.

This uses `fchmod` on the existing open file descriptor (obtained
from the `File` object), instead of `chmod` on the path.

(Since GitoxideLabs#1764, the file mode has also been read first, to figure out
what new mode to set it to for +x. That uses `lstat` (via a higher
level abstraction). The change here is therefore really only half
of what should be done. To complete it, and also to avoid new
problems due to a new inconsistency, that should be done with
`fstat` on the file descriptor instead of `lstat` on the path. That
will be done in a directly fortcoming commit -- after portability
issues in the type of the `st_mode` field, which is not actually
`u32` on all systems, are examined.)
The previous conversion worked on Linux but failed on macOS where
modes, as represented in system data structures, are 16-bit rather
than 32-bit.

The previous commit broke builds on macOS (and possibly other
OSes not currently tested on CI), which this should fix.

This commit also does some other refactorings:

- Simplify the way conversions are represented.

- Express with `expect` that, by this point, there are no unknown
  bits, rather than doing something that would silently preserve or
  silently discard them. There would have to be a bug (and probably
  in `gix-worktree-state` itself) for that not to be the case.

- Clarify the TODO comment, and also weaken it since there might be
  some reason for `set_mode_executable` to keep using `Permissions`
  in some way.
While replacing `into()` with `rustix::fs::Mode::from_bits(...)`
helped in expressing (and enforcing) the desired semantics for bits
that fit in `mode_t` but whose meaning is unrecognized, it did not
actually make the code build on macOS, and probably was not enough
to improve portability to any other systems where `mode_t` is not
`u32`.

This commit adds an explicit checked conversion of the value we get
from calling `mode()` on a `Permissions` object, which is aways
`u32`, into whatever type `mode_t` is defined as, which is `u32` on
most systems but `u16` on macOS and possibly others.
- Make the conditional `unused_variable` suppression specific to
  the `set_executable_after_creation` parameter, which is all that
  needs it. This also makes it more immediately clear that nothing
  along the lines of `chmod +x` is done (or relevant) on Windows.

- Allow `useless_conversion` for the conversion from `u32` to
  `rustix::fs::RawMode`. The `RawMode` type is what `rustix::fs`
  calls `mode_t` as used for the `st_mode` field of a `stat`
  structure. On most operating systems, it is `u32`, but this is
  not guaranteed by POSIX (which does not even guarantee that it is
  unsigned). It is `u16` at least on macOS and possibly other
  systems, and I am unsure if there are systems this code can run
  on where it is some other type besides `u16` or `u32`.

  For now, this does not attempt to make that suppression
  conditional, even though it is only needed when `RawMode` is
  already `u32`. I don't if there is a good way to express the
  condition under which it should apply. If there is, then it could
  be made conditional or, if it is truly reliable, then the
  conversion itself could be made conditional (though I suspect it
  is omitted when not needed in release builds, by optimization).
This uses `fstat` rather than a method that delegates to `lstat`,
to get the current mode bits of the still-open file using the open
file descriptor (obtained from the `File` object) rather than
opening the path, which could be a different file in the case of
concurrent modification by code outside gitoxide.

This is half of the change to using the file descriptor rather than
the path for the operations that adjust the mode by adding
executable bitsi. (As before, they are added for whoever already
had read bits, and we remove setuid, setgid, and sticky bits, so
that +x does not cause a security problem or other potentially
unexpected effects.)

The other half of this change was already done, appearing in a
recent `change:` commit. It consisted of setting the mode using
`fchmod` instead of a method that delegated to `chmod`. With only
that change, the overall effect would be worse, since possible
race conditions could introduce even greater inconsistency. With
both changes, the effect is consistent. Race conditions can still
occur, but their effect is probably less severe than before either
of the changes were made.

One of the ways this may be less subject to race conditions is
that, because neither obtaining the old mode nor setting the new
one uses a path, but only the open file descriptor, an unexpected
typechange will not occur: having opened a regular file, the open
file will not turn into a directory, symlink, or anything else,
*as seen by* the open file descriptor. The file may be moved or
replaced, but the open file descriptor "follows" the move and does
not alias the separate file that now has the original path.

Because we only intend to add executable bits to regular files, it
is no longer necessary to handle this as a race condition and keep
going. Attempting to add +x to any other type of thing should now
only be possible if there is a bug in the caller. So the check for
that, and the associated Option<...> logic used to keep things
testable, is not needed.

This still does the check, but panics if the file descriptor does
not represent a regular file. The tests of this are mostly removed,
but two tests that it does panic for the kinds of non-regular files
we are most likely to encounter are added. (It may eventually make
sense check this and panic only in debug builds, or not at all.)

There are now two `fstat` calls: the one that is added to check the
`st_mode` to figure out what to pass to `fchmod`, and the
preexisting higher level invocation (that delegates to `fstat`)
done just before closing the file, to update `entry.stat`. Using
the same call for both and accounting for any effect of `fchmod`
might be more complicated, but more importantly, it does not seem
like it would be correct. If attempting to set the permissions
reports success but has an unexpected effect, perhaps due to
restrictions of the underlying filesystem or how it is mounted
(which we may not be guarnateed to have detected in a probe to
decide if +x is supported), we would want the real stat.

Because this no longer uses `Permissions` and `PermissionsExt`
facilities to check the mode before adding executable bits or to
add the executable bits, it also includes some refactoring that is
possible now that they are not being used.
This extracts the logic that calls `fstat`, transforms the mode
with `let_readers_execute`, and writes it back through the same
file descriptor with `fchmod`, introducing a new helper function,
`set_executable`, for it.

The reason to do this is that it makes the issue of what kind of
`Error` appears in the returned `Result` clearer. This may be
slightly clearer to humans. It's significnatly clearer to the Rust
compiler, allowing both `map_err(std::io::Error::from)` calls to
be eliminated without having to write anything confusing or
inelegant.

(This is separate from the reason the `let_readers_execute`
function, which does not access the filesystem, is its own
function. That is so it's easy to test the mode transformation
logic.)
In `finalize_entry` and its callers, `set_executable_after_creation`
was previously an `Option<&Path>`, because `chmod` was called (via
a higher level abstraction) on the path to make the file executable.
From GitoxideLabs#1764, `lstat` (via a higher level abstraction) was likewise
called on the path to find out what its old mode was, to figure out
what mode to set to achieve +x.

Now that the `lstat` is replaced with `fstat`, and the `chmod` is
replaced with `fchmod`, figuring out how to set +x and setting it
are done entirely on the file descriptor, with the path unused. It
turns out that this was the only use of that path in that context.

Accordingly, this makes `set_executable_after_creation` a `bool`
in both `finalize_entry` and its callers, and removes code that was
only used for handling that path.

This includes eliminating the `rela_path_as_path` variable in
`checkout::chunk::process_delayed_filter_results`, which was
apparently only used for this purpose. (All writes and reads on it
were for preparing and passing a path to `finalize_entry`. The
`finalize_entry` function had only used it to set the file
executable, and had documented that this was its purpose.)
Copy link
Member

@Byron Byron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks so much for the fix, this is great work!

There is one other area I suggest giving special attention when reviewing this PR, though I very much doubt it is a problem: I was surprised that the rela_path_as_path variable introduced near the top of set_executable_after_creation was actually only used for arranging for executable permissions to be added. Given the way the code was written, I wonder if it was intended to use it for more than that.

I think this was just me struggling to do what I wanted as I was missing the rustix functions to work on an open file handle.

///
/// This returns only mode bits, not file type. The return value can be passed to chmod or fchmod.
#[cfg(unix)]
fn let_readers_execute(mut raw_mode: rustix::fs::RawMode) -> rustix::fs::Mode {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a great idea to factor this into its own, documented function!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! This let_readers_execute function is actually not really new in this PR, though its documentation comment is. This function was renamed (in de939de) from set_mode_executable, and also changed to operate on different types. I had introduced set_mode_executable in bf33d2b (#1764) to be able to test this logic without actually doing any operations on the filesystem.

In contrast, the related set_executable function (which calls let_readers_execute between its fstat and fchmod calls) is new in this PR. I introduced it in fc82e78, originally to make ? seamlessly work, rather than having to write .map_err(std::io::Error::from), but my feeling is that having it separate from finalize_entry may improve readability slightly even beyond that.

I wasn't sure what the best name would be for set_executable. I also considered set_file_executable, but it seemed to me that would still not be clearer with respect to the important distinction between a file descriptor and its path, and I went with set_executable.

@Byron Byron merged commit 810b5cf into GitoxideLabs:main Jan 25, 2025
21 checks passed
@EliahKagan EliahKagan deleted the run-ci/fchmod branch January 25, 2025 20:11
EliahKagan added a commit to EliahKagan/gitoxide that referenced this pull request Jan 26, 2025
This replaces explicit `fstat` and `fchmod` calls (via `rustix`)
with `File::metadata` and `File::set_permissions`, respectively.

In practice, on Unix-like systems:

- `File::metadata` either:

   * calls `fstat`, or

   * calls `statx` in a way that causes it to operate similarly to
     `fstat` (this is used on Linux, in versions with `statx`).

  This is not explicitly documented, though calling `stat` or
  `lstat`, or calling `statx` in a way that would cause it to
  behave like `stat` or `lstat` rather than `fstat`, would not
  preserve the documented behavior of `File::metadata`, which
  operates on a `File` and does not take a path.

- `File::set_permissions` calls `fchmod`.

  This is explicitly documented and `fchmod` is listed as an alias
  for documentation purposes. But it is noted as an implementation
  detail that may change without warning. However, calling `chmod`
  or `lchmod` would not preserve the documented behavior of
  `File::set_permissions`, which (like `File::metadata`) operates
  on a file and does not take a path.

While these details can, in principle, change without warning, per
https://doc.rust-lang.org/stable/std/io/index.html#platform-specific-behavior,
to preserve the documented semantics of operating on the file
referenced by the file descriptor, it seems like the behavior would
still have to be correct for our use. In our use here, what matters
is that we use the file descriptor and not the path.

The change in this commit intends to maintain the effect of GitoxideLabs#1803
without requiring `gix-worktree-state` to depend directly on
`rustix`, and to benefit from the standard library's slightly
higher level interface where modes are treated as `u32` even on
operating systems where they are different (e.g. `u16` on macOS).

Importantly, this continues to differ from the behavior
before GitoxideLabs#1803 of using functions that operated on paths.

The current general correspondence between Rust `std`, POSIX
functions, and `statx`, where the rightmost three columns describe
`statx` calls, is:

| std::fs::*       | POSIX | fd       | path | flags               |
|------------------|-------|----------|------|---------------------|
| metadata         | stat  | AT_FDCWD | path |                     |
| symlink_metadata | lstat | AT_FDCWD | path | AT_SYMLINK_NOFOLLOW |
| File::metadata   | fstat | file     | ""   | AT_EMPTY_PATH       |

It may be that some uses of `rustix::fs` can be similarly replaced
in `gix_fs`, but this commit does not include any such changes.
EliahKagan added a commit to EliahKagan/gitoxide that referenced this pull request Jan 26, 2025
This replaces explicit `fstat` and `fchmod` calls (via `rustix`)
with `File::metadata` and `File::set_permissions`, respectively.
The change here is confined to `gix-worktree-state` and, more
specifically, to the operation of checking the mode of an open file
and setting a new mode based on it with some executable bits added.

In practice, currently, on Unix-like systems:

- `File::metadata` either:

   * calls `fstat`, or

   * calls `statx` in a way that causes it to operate similarly to
     `fstat` (this is used on Linux, in versions with `statx`).

  This is not explicitly documented, though calling `stat` or
  `lstat`, or calling `statx` in a way that would cause it to
  behave like `stat` or `lstat` rather than `fstat`, would not
  preserve the documented behavior of `File::metadata`, which
  operates on a `File` and does not take a path.

- `File::set_permissions` calls `fchmod`.

  This is explicitly documented and `fchmod` is listed as an alias
  for documentation purposes. But it is noted as an implementation
  detail that may change without warning. However, calling `chmod`
  or `lchmod` would not preserve the documented behavior of
  `File::set_permissions`, which (like `File::metadata`) operates
  on a `File` and does not take a path.

While these details can, in principle, change without warning, per
https://doc.rust-lang.org/stable/std/io/index.html#platform-specific-behavior,
it seems that, in preserving the documented semantics of operating
on the file referenced by the file descriptor (rather than a path),
the behavior would still have to be correct for our use here.

The change in this commit intends to maintain the effect of GitoxideLabs#1803
with two minor improvements for maintainability and clarity:

- No longer require `gix-worktree-state` to depend directly on
  `rustix`. (This is minor because it still depends transitively on
  it through `gix-fs`, though some uses of `rustix::fs` in `gix-fs`
  might likewise be possible to replace in the future.)

- Gain the standard library's slightly higher level interface where
  modes are treated as `u32` even on operating systems where they
  are different (e.g. `u16` on macOS). This removes operations that
  do not correspond to what the code is conceptually doing. It also
  lets the function that computes the new mode from the old mode no
  longer depend on a type that differs across Unix-like targets.

Importantly, this continues to differ from the pre-GitoxideLabs#1803 behavior
of using functions that operated on paths.

For reading metadata on a file on Unix-like systems, the current
general correspondence between Rust `std`, POSIX functions, and
`statx`, where the rightmost three columns pertain to how `statx`
is called, is:

| std::fs::*       | POSIX | fd       | path | *distinctive* flags |
|------------------|-------|----------|------|---------------------|
| metadata         | stat  | AT_FDCWD | path | (none)              |
| symlink_metadata | lstat | AT_FDCWD | path | AT_SYMLINK_NOFOLLOW |
| File::metadata   | fstat | file     | ""   | AT_EMPTY_PATH       |

For writing metadata on a file on Unix-like systems, the current
correspondence between Rust `std` and POSIX functions is:

| std::fs::*            | POSIX  |
|-----------------------| -------|
| set_permissions       | chmod  |
| (none)                | lchmod |
| File::set_permissions | fchmod |

It may be that some uses of `rustix::fs` facilities can be
similarly replaced in `gix_fs`, but this commit does not include
any such changes.
EliahKagan added a commit to EliahKagan/gitoxide that referenced this pull request Jan 26, 2025
This replaces explicit `fstat` and `fchmod` calls (via `rustix`)
with `File::metadata` and `File::set_permissions`, respectively.
The change here is confined to `gix-worktree-state` and, more
specifically, to the operation of checking the mode of an open file
and setting a new mode based on it with some executable bits added.

In practice, currently, on Unix-like systems:

- `File::metadata` either:

   * calls `fstat`, or

   * calls `statx` in a way that causes it to operate similarly to
     `fstat` (this is used on Linux, in versions with `statx`).

  This is not explicitly documented, though calling `stat` or
  `lstat`, or calling `statx` in a way that would cause it to
  behave like `stat` or `lstat` rather than `fstat`, would not
  preserve the documented behavior of `File::metadata`, which
  operates on a `File` and does not take a path.

- `File::set_permissions` calls `fchmod`.

  This is explicitly documented and `fchmod` is listed as an alias
  for documentation purposes. But it is noted as an implementation
  detail that may change without warning. However, calling `chmod`
  or `lchmod` would not preserve the documented behavior of
  `File::set_permissions`, which (like `File::metadata`) operates
  on a `File` and does not take a path.

While these details can, in principle, change without warning, per
https://doc.rust-lang.org/stable/std/io/index.html#platform-specific-behavior,
it seems that, in preserving the documented semantics of operating
on the file referenced by the file descriptor (rather than a path),
the behavior would still have to be correct for our use here.

The change in this commit intends to maintain the effect of GitoxideLabs#1803
with two minor improvements for maintainability and clarity:

- No longer require `gix-worktree-state` to depend directly on
  `rustix`. (This is minor because it still depends transitively on
  it through `gix-fs`, though some uses of `rustix::fs` in `gix-fs`
  might likewise be possible to replace in the future.)

- Use the standard library's slightly higher level interface where
  modes are treated as `u32` even on operating systems where they
  are different (e.g. `u16` on macOS). This removes operations that
  do not correspond to what the code is conceptually doing. It also
  lets the function that computes the new mode from the old mode no
  longer depend on a type that differs across Unix-like targets.

Importantly, this continues to differ from the pre-GitoxideLabs#1803 behavior
of using functions that operated on paths.

For reading metadata on a file on Unix-like systems, the current
general correspondence between Rust `std`, POSIX functions, and
`statx`, where the rightmost three columns pertain to how `statx`
is called, is:

| std::fs::*       | POSIX | fd       | path | *distinctive* flags |
|------------------|-------|----------|------|---------------------|
| metadata         | stat  | AT_FDCWD | path | (none)              |
| symlink_metadata | lstat | AT_FDCWD | path | AT_SYMLINK_NOFOLLOW |
| File::metadata   | fstat | file     | ""   | AT_EMPTY_PATH       |

For writing metadata on a file on Unix-like systems, the current
correspondence between Rust `std` and POSIX functions is:

| std::fs::*            | POSIX  |
|-----------------------| -------|
| set_permissions       | chmod  |
| (none)                | lchmod |
| File::set_permissions | fchmod |

It may be that some uses of `rustix::fs` facilities can be
similarly replaced in `gix_fs`, but this commit does not include
any such changes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

finalize_entry uses path and FD that may diverge, could use just FD
2 participants