Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make gix_path::env:shell() more robust and use it in gix-command #1862

Draft
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

EliahKagan
Copy link
Member

This makes git_path::env::shell() more likely to work correctly on Windows, then attempts to use it in gix_command::Prepare when running a command with a shell due to use_shell being set to true. Full details are in the commit messages, though the changes are also summarized here.

I think this is not ready to merge yet, due to failures that happen when running the tests locally from PowerShell, but that I cannot produce otherwise. I do not know what causes these failures, and I expect I may have difficulty getting this to a point where it would be ready to merge without input. Since the failures happen in a test that was fairly recently added, I figured you might just know what is going on.

I think shell() can and should be used for gix-command. But if not then the reason will be relevant to other potential uses of shell(). In that case, assuming they leave enough good uses for shell() that it is kept, I hope to discover and articulate those reasons in the documentation for shell(). But I think most likely there is a bug or wrong test expectation somewhere that accounts for the local test failure, and that either the current implementation of shell() with the modifications included here, or something much like it, is suitable for broad use.

Making gix_path::env::shell() more robust

This makes gix_path::env::shell() more likely to work correctly on Windows by:

  • Checking that the path to a sh.exe associated with Git for Windows actually exists (and is either a regular file or a symbolic link that, when fully dereferenced, gives a file), because if that is not the case, then the fallback of just looking up sh.exe using a PATH search is more likely to succeed.

    This change is needed to use shell() more widely without breaking things on systems where sh.exe is available and usable but cannot be found in the usual location where Git for Windows supplies it.

  • Using / rather than \ directory separators when appending usr, bin, and sh.exe components to the Git for Windows installation directory path obtained via git --exec-path and going up three components. This causes the path through sh.exe is run to use all / separators except in the unusual situation that GIT_EXEC_PATH has been set to a path with \ separator. (This is unusual because people don't usually set GIT_EXEC_PATH. It would probably be best, when setting it on Windows, to use / separators, but I do not actually know how often people do that.)

    I believe this is only a minor improvement, in view of this path not being automatically passed through to the shell, as described in #1840 (reply in thread). But this change also makes tests in gix-command and gix-credentials pass that would otherwise require more extensive modification related to \ escaping, when modifying gix-command to use shell().

Using gix_path::env::shell() in gix-command

Before the changes in this PR, gix-command uses the relative path sh.exe on Windows. This works a significant fraction of the time already. In particular, the problems with the relative path bash.exe that make it frequently unusable to find a non-WSL bash on Windows, described in #1359 (comment), do not apply to sh.exe, since there is no Microsoft-provided sh.exe related to WSL. However:

  • Such an sh.exe is not always the best choice when present.
  • There may be no sh.exe that can be found in PATH. For example, if the only sh.exe is supplied by Git for Windows, and neither of the Git for Windows directories that contain an sh.exe binary are in PATH, then before the changes in this PR, a shell will not be found unless shell_program() is called (every time a command with use_shell is run).

Therefore, this uses shell(), with the improvements described above, in gix-command.


As alluded to above, I think this is not ready yet. All tests pass on CI, as well a when run locally from Git Bash. But when running them locally in PowerShell, gix-merge::merge blob::platform::merge::with_external fails. The failure does not require GIX_TEST_IGNORE_ARCHIVES to be set, and this PR does not currently modify any test fixture scripts.

Windows failures that occur when tests are run from PowerShell but not from Git Bash have usually been due to #1359 and are thus not expected since #1712, where gix-testtools finds bash.exe associated with Git for Windows and uses it to run fixtures. But that fix didn't affect the way gix-command uses a shell, so it does not provide for the kind of shell invocations occurring here.

Still, it doesn't make sense to me why the failure happens, and why it happens only when I run the tests locally from PowerShell. On this system, in the PATH, including when accessed from PowerShell, the command sh finds is a shim for the same shell that, as of this PR, gix-path finds and gix-command uses. Furthermore, the change here makes gix-path find sh.exe more like the way gix-testtools finds bash.exe, both of which are Bash when provided by Git for Windows.

Furthermore, while most of my local testing has been with Git for Windows 2.48.1, I have verified that the test fails the same way on the same system with Git for Windows 2.47.1(2), and that the experiment described below to examine the details of the failure also gives the same results with that version. This is to say that it seems like this failure is not related, even conceptually, to #1849.

C:\Users\ek\source\repos\gitoxide [run-ci/consistent-sh ≡]> cargo nextest run -p gix-merge --no-fail-fast
    Finished `test` profile [unoptimized + debuginfo] target(s) in 0.48s
────────────
 Nextest run ID 5771b786-7fff-43d7-9721-b20970cbb558 with nextest profile: default
    Starting 20 tests across 2 binaries
     Running [ 00:00:00] 0/20: 0 running, 0 passed, 0 skipped
        PASS [   0.018s] gix-merge::merge blob::builtin_driver::text::fuzzed
        PASS [   0.022s] gix-merge::merge blob::builtin_driver::text::both_sides_same_changes_are_conflict_free
        PASS [   0.025s] gix-merge::merge blob::pipeline::without_transformation
        PASS [   0.031s] gix-merge::merge blob::pipeline::non_existing
        PASS [   0.035s] gix-merge::merge blob::pipeline::binary_below_large_file_threshold
        PASS [   0.038s] gix-merge::merge blob::builtin_driver::text::both_differ_partially_resolution_is_conflicting
        PASS [   0.040s] gix-merge::merge blob::builtin_driver::binary
        PASS [   0.047s] gix-merge::merge blob::pipeline::worktree_filter
        PASS [   0.054s] gix-merge::merge blob::pipeline::above_large_file_threshold
        PASS [   0.052s] gix-merge::merge blob::platform::merge::builtin_with_conflict
        PASS [   0.054s] gix-merge::merge blob::platform::merge::builtin_text_uses_binary_if_needed
        PASS [   0.058s] gix-merge::merge blob::platform::merge::one_buffer_too_large
        PASS [   0.070s] gix-merge::merge blob::platform::merge::same_binaries_do_not_count_as_conflicted
        PASS [   0.067s] gix-merge::merge blob::platform::prepare_merge::ancestor_and_current_and_other_do_not_exist
        PASS [   0.067s] gix-merge::merge blob::platform::prepare_merge::driver_selection
        PASS [   0.059s] gix-merge::merge blob::platform::set_resource::invalid_resource_types
        PASS [   0.200s] gix-merge::merge blob::platform::merge::missing_buffers_are_empty_buffers
        FAIL [   0.245s] gix-merge::merge blob::platform::merge::with_external
──── STDOUT:             gix-merge::merge blob::platform::merge::with_external

running 1 test
test blob::platform::merge::with_external ... FAILED

failures:

failures:
    blob::platform::merge::with_external

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 19 filtered out; finished in 0.18s

──── STDERR:             gix-merge::merge blob::platform::merge::with_external
thread 'blob::platform::merge::with_external' panicked at gix-merge\tests\merge\blob\platform.rs:297:13:
b
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

        PASS [   0.330s] gix-merge::merge blob::builtin_driver::text::run_baseline
        PASS [   0.998s] gix-merge::merge tree::run_baseline
────────────
     Summary [   1.036s] 20 tests run: 19 passed, 1 failed, 0 skipped
        FAIL [   0.245s] gix-merge::merge blob::platform::merge::with_external
error: test run failed

The failing assertion is the first of the assertions in this fragment of the failing test:

let platform_ref = platform.prepare_merge(&db, Default::default())?;
let mut buf = Vec::new();
let res = platform_ref.merge(&mut buf, default_labels(), &Default::default())?;
assert_eq!(res, (Pick::Buffer, Resolution::Complete), "merge drivers always merge ");
let mut lines = cleaned_driver_lines(&buf)?;
for tmp_file in lines.by_ref().take(3) {
assert!(tmp_file.contains_str(&b".tmp"[..]), "{tmp_file}");
}
let lines: Vec<_> = lines.collect();
assert_eq!(
lines,
[
"7",
"b",
"ancestor label",
"current label",
"other label",
"%F",
"b",
"theirs"
],
"we handle word-splitting and definitely pick-up what's written into the %A buffer"
);

The first three of the cleaned driver lines are supposed to contain .tmp, but the first one is just b, which is expected to apper later. This somehow varies based on how the tests are run, and there are at least three environments for figuring out what's going on: CI, local in Git Bash, and local in PowerShell. It seems like they shouldn't differ, but they do.

Locally, it is possible to inspect things in a debugger, but this is harder on CI because while action-tmate can be used to get a reverse shell, on Windows runners the reverse shell's environment differs from the noninteractive environment in which the tests run. Fortunately, I can break the tests in a way that makes them always fail by panicking and printing all the lines:

diff --git a/gix-merge/tests/merge/blob/platform.rs b/gix-merge/tests/merge/blob/platform.rs
index 6230377b8..62882cc69 100644
--- a/gix-merge/tests/merge/blob/platform.rs
+++ b/gix-merge/tests/merge/blob/platform.rs
@@ -293,6 +293,11 @@ theirs
         let res = platform_ref.merge(&mut buf, default_labels(), &Default::default())?;
         assert_eq!(res, (Pick::Buffer, Resolution::Complete), "merge drivers always merge ");
         let mut lines = cleaned_driver_lines(&buf)?;
+        panic!(
+            "original: {:#?}\ncleaned: {:#?}",
+            buf.lines().map(|s| s.as_bstr()).collect::<Vec<_>>(),
+            lines.collect::<Vec<_>>()
+        );
         for tmp_file in lines.by_ref().take(3) {
             assert!(tmp_file.contains_str(&b".tmp"[..]), "{tmp_file}");
         }

To be clear, while all of the following are from messages printed in failing tests, the tests fail because of that change. All of them would otherwise pass, except the local run, in Windows, from PowerShell, shown last. Also, I do not know why this is happening. I am hoping you may have some insight into it.

Based off the main branch, on CI, in Ubuntu (would pass)

In the test-fast (ubuntu-latest) run, for comparison:

original: [
    "ours/home/runner/work/gitoxide/gitoxide/gix-merge/.tmpmK4t1a",
    "/home/runner/work/gitoxide/gitoxide/gix-merge/.tmpNFsS5l",
    "/home/runner/work/gitoxide/gitoxide/gix-merge/.tmp8rvAjw",
    "7",
    "b",
    "ancestor label",
    "current label",
    "other label",
    "%F",
    "b",
    "theirs",
]
cleaned: [
    "ours/home/runner/work/gitoxide/gitoxide/gix-merge/.tmpmK4t1a",
    "/.tmpNFsS5l",
    "/.tmp8rvAjw",
    "7",
    "b",
    "ancestor label",
    "current label",
    "other label",
    "%F",
    "b",
    "theirs",
]

Based off the main branch, on CI, in Windows (would pass)

In the test-fast (windows-latest) run:

original: [
    "oursD:agitoxidegitoxidegix-merge.tmpDYZgPn",
    "D:agitoxidegitoxidegix-merge.tmpOKiCuJ",
    "D:agitoxidegitoxidegix-merge.tmpOahQgz",
    "7",
    "b",
    "ancestor label",
    "current label",
    "other label",
    "%F",
    "b",
    "theirs",
]
cleaned: [
    "oursD:agitoxidegitoxidegix-merge.tmpDYZgPn",
    "D:agitoxidegitoxidegix-merge.tmpOKiCuJ",
    "D:agitoxidegitoxidegix-merge.tmpOahQgz",
    "7",
    "b",
    "ancestor label",
    "current label",
    "other label",
    "%F",
    "b",
    "theirs",
]

Note how backslashes are being treated as escape characters and removed, which seems like it is not intended, but this does not in and of itself cause a problem with this test: it happens even on the main branch, and even without the changes from this PR.

Based off the main branch, locally, in Windows, from Git Bash (would pass)

Via cargo nextest run -p gix-merge --no-fail-fast:

original: [
    "oursC:Userseksourcereposgitoxidegix-merge.tmpORUoAj",
    "C:Userseksourcereposgitoxidegix-merge.tmpuUGgiZ",
    "C:Userseksourcereposgitoxidegix-merge.tmpsYzC12",
    "7",
    "b",
    "ancestor label",
    "current label",
    "other label",
    "%F",
    "b",
    "theirs",
]
cleaned: [
    "oursC:Userseksourcereposgitoxidegix-merge.tmpORUoAj",
    "C:Userseksourcereposgitoxidegix-merge.tmpuUGgiZ",
    "C:Userseksourcereposgitoxidegix-merge.tmpsYzC12",
    "7",
    "b",
    "ancestor label",
    "current label",
    "other label",
    "%F",
    "b",
    "theirs",
]

Based off the main branch, locally, in Windows, from PowerShell (would pass)

Via cargo nextest run -p gix-merge --no-fail-fast:

original: [
    "oursC:Userseksourcereposgitoxidegix-merge.tmpPNy4Pa",
    "C:Userseksourcereposgitoxidegix-merge.tmp5Fkmzy",
    "C:Userseksourcereposgitoxidegix-merge.tmp9aFhwO",
    "7",
    "b",
    "ancestor label",
    "current label",
    "other label",
    "%F",
    "b",
    "theirs",
]
cleaned: [
    "oursC:Userseksourcereposgitoxidegix-merge.tmpPNy4Pa",
    "C:Userseksourcereposgitoxidegix-merge.tmp5Fkmzy",
    "C:Userseksourcereposgitoxidegix-merge.tmp9aFhwO",
    "7",
    "b",
    "ancestor label",
    "current label",
    "other label",
    "%F",
    "b",
    "theirs",
]

Based off this feature branch, on CI, in Ubuntu (would pass)

In the test-fast (ubuntu-latest) run, for comparison:

original: [
    "ours/home/runner/work/gitoxide/gitoxide/gix-merge/.tmpvKkoxy",
    "/home/runner/work/gitoxide/gitoxide/gix-merge/.tmpr402lE",
    "/home/runner/work/gitoxide/gitoxide/gix-merge/.tmp8bJjIe",
    "7",
    "b",
    "ancestor label",
    "current label",
    "other label",
    "%F",
    "b",
    "theirs",
]
cleaned: [
    "ours/home/runner/work/gitoxide/gitoxide/gix-merge/.tmpvKkoxy",
    "/.tmpr402lE",
    "/.tmp8bJjIe",
    "7",
    "b",
    "ancestor label",
    "current label",
    "other label",
    "%F",
    "b",
    "theirs",
]

That is the same as based off the main branch, which is very much as expected, because the changes on this feature branch do not produce a different value for gix_path::env::shell() on systems other than Windows.

Based off this feature branch, on CI, on Windows (would pass)

In the test-fast (windows-latest) run:

original: [
    "oursD:agitoxidegitoxidegix-merge.tmpyuHJtT",
    "D:agitoxidegitoxidegix-merge.tmpC2s55H",
    "D:agitoxidegitoxidegix-merge.tmp06rraO",
    "7",
    "b",
    "ancestor label",
    "current label",
    "other label",
    "%F",
    "b",
    "theirs",
]
cleaned: [
    "oursD:agitoxidegitoxidegix-merge.tmpyuHJtT",
    "D:agitoxidegitoxidegix-merge.tmpC2s55H",
    "D:agitoxidegitoxidegix-merge.tmp06rraO",
    "7",
    "b",
    "ancestor label",
    "current label",
    "other label",
    "%F",
    "b",
    "theirs",
]

This is similar to the main branch CI and local Git Bash runs shown above.

Based off this feature branch, locally, in Windows, from Git Bash (would pass)

Via cargo nextest run -p gix-merge --no-fail-fast:

original: [
    "oursC:Userseksourcereposgitoxidegix-merge.tmpUfxnE8",
    "C:Userseksourcereposgitoxidegix-merge.tmpiUdSfX",
    "C:Userseksourcereposgitoxidegix-merge.tmpmkVmRq",
    "7",
    "b",
    "ancestor label",
    "current label",
    "other label",
    "%F",
    "b",
    "theirs",
]
cleaned: [
    "oursC:Userseksourcereposgitoxidegix-merge.tmpUfxnE8",
    "C:Userseksourcereposgitoxidegix-merge.tmpiUdSfX",
    "C:Userseksourcereposgitoxidegix-merge.tmpmkVmRq",
    "7",
    "b",
    "ancestor label",
    "current label",
    "other label",
    "%F",
    "b",
    "theirs",
]

Based off this feature branch, locally, in Windows, from PowerShell (would fail)

Via cargo nextest run -p gix-merge --no-fail-fast:

original: [
    "b",
    "theirserseksourcereposgitoxidegix-merge.tmpTDKDff",
    "C:Userseksourcereposgitoxidegix-merge.tmp5WJaSb",
    "C:Userseksourcereposgitoxidegix-merge.tmpyQDnZt",
    "7",
    "b",
    "ancestor label",
    "current label",
    "other label",
    "%F",
]
cleaned: [
    "b",
    "theirserseksourcereposgitoxidegix-merge.tmpTDKDff",
    "C:Userseksourcereposgitoxidegix-merge.tmp5WJaSb",
    "C:Userseksourcereposgitoxidegix-merge.tmpyQDnZt",
    "7",
    "b",
    "ancestor label",
    "current label",
    "other label",
    "%F",
]

Note how b comes first, and the three lines with paths to files named containing .tmp follow it. This causes this assertion, which is run for the first three lines, to fail in the first iteration (i.e. for the first line):

assert!(tmp_file.contains_str(&b".tmp"[..]), "{tmp_file}");

But I do not understand why. I'm not very familiar with the details of gix-merge. I'm also not entirely clear on which kinds of shell transformations (from parsing % items into fields, from shell expansions on unquoted parameter expansion, and from the implicit effect of joining on spaces when passing multiple arguments arising from an unquoted expansion to echo) are intended to occur in the test command, and with what effects.

Is this likely to be caused by different environments across shells run in different ways? Or is it somehow due to the name the shell is called with, even though that does not become its $0 here and, furthermore, its $0 does not seem to be expanded? What could be going on?

`gix_command::Prepare` previously used `sh` on Windows rather than
first checking for a usable `sh` implementation associated with a
Git for Windows installation. This changes it to use the `gix-path`
facility for finding what is likely the best 'sh' implementation
for POSIX shell scripts that will operate on Git repositories. This
increases consistency across how different crates find 'sh', and
brings the benefit of preferring the Git for Windows `sh.exe` on
Windows when it can be found reliably.
Because `gix-command` uses `gix_path::env::shell()` to find sh,
and `gix-credentials` uses `gix-command`.
This makes the path returned by `gix_path::env::shell()` on Windows
more usable by:

1. Adding components with `/` separators. While in principle a `\`
   should work, the path of the shell itself is used in shell
   scripts (script files and `sh -c` operands) that may not account
   for the presence of backslashes, and it is also harder to read
   paths with `\` in contexts where it appears escaped, which may
   include various messages from Rust code and shell scripts.

   The path before what we add will already use `/` and never `\`,
   unless `GIT_EXEC_PATH` has been set to a strange value, because
   it is based on `git --exec-path`, which by default gives a path
   with `/` separators. Thus, ensuring that the part we add uses `/`
   should be sufficient to get a path without `\` in all cases when
   it is clearly reasonable to do so. This therefore also usually
   increases stylistic consistency of the path, which is another
   factor that makes it more user-friendly in messages.

   This is needed to get tests to pass since changing `gix-command`
   to use `gix_path::env::shell()` on Windows, where a path is
   formatted in away that sometimes quotes `\` characters. Their
   expectations could be adjusted, but it seems likely that various
   other software, much of which may otherwise be working, has
   similar expectations. Using `/` instead of `\` works whether `\`
   is expected to be displayed quoted or not.

2. Check that the path to the shell plausibly has a shell there,
   only using it if it a file or a non-broken file symlink. When
   this is not the case, the fallback short name is used instead.

3. The fallback short name is changed from `sh` to `sh.exe`, since
   the `.exe` suffix is appended in other short names on Windows,
   such as `git.exe`, as well as being part of the filename
   component of the path we build for the shell when using the
   implementation provided as part of Git for Windows.

Those changes only affect Windows.

This also adds tests for (1) and (2) above, as well as for the
expectation that we get an absolute path, to make sure we don't
build a path that would be absolute on a Unix-like system but is
relative on Windows (a path that starts with just one `/` or `\`).

These tests are not Windows-specific, since all these expectations
should already hold on Unix-like systems, where currently we are
using the hard-coded path `/bin/sh`, which is an absolute path on
those systems. (Some Unix-like systems may technically not have
`/bin/sh` or it may not be the best path to use for a shell that
should be POSIX-compatible, but we are already relying on this,
and handling that better is outside the scope of the changes here.)
@Byron
Copy link
Member

Byron commented Feb 24, 2025

Thanks you very much for tackling this!

I didn't look at the code of this PR but want to share some thoughts on the failure.

Here is the program that is run:

for arg in  %O %A %B %L %P %S %X %Y %F; do echo $arg >> \"%A\"; done; cat \"%O\" \"%B\" >> \"%A\""

The failing output has 10 lines, whereas a valid output has 11 lines. Probably that is related. My feeling is that one of the arguments in the for loop substitutions is empty. It would disappear then, and everything is offset.
If these were quoted that shouldn't happen and is probably a good idea just to make the test-script more robust.

From there it should become more obvious which parameter comes out empty, and from there it should become clear what the problem really is.

I hope that helps.

@EliahKagan
Copy link
Member Author

The failing output has 10 lines, whereas a valid output has 11 lines. Probably that is related.

Yes, in the failing run, the theirs line is absent, and one of the b lines is moved to the top.

@EliahKagan
Copy link
Member Author

EliahKagan commented Feb 24, 2025

A couple things didn't mention in #1862 (comment):

My feeling is that one of the arguments in the for loop substitutions is empty. It would disappear then, and everything is offset.

It doesn't look like that specific mechanism accounts for the failure, because in the failing case a b line appears as the first line of the output, while otherwise b does not appear until the fifth line.

As an early step (before even doing most of the testing described above), I tried adding ' ' quotes around the format specifiers and " " quotes around the parameter expansion in the merge driver command. The effect was more splitting. For example, ancestor label became two lines, ancestor and label. I think part of what using unquoted parameter expansion is currently achieving for the echo arguments is split on newlines, when then effectively get replaced with spaces because when echo is run with multiple arguments it joins them with spaces. I am very unclear on what the exact intended behavior of this merge driver test command is.

The broader issue for me examining this specific test is that I am familiar neither with the details of gix-merge and its tests nor (even though this is general Git knowledge) with the format expected for custom merge drivers.

Thanks you very much for tackling this!

No problem. My goal that this is related to is to be able to fix some other bugs where we do not run scripts correctly, such as bugs in the special casing of scripts with shebangs when use_shell is false. Fixing such bugs runs the risk of producing worse problems if there are other bugs whose effects are worse than the effect of existing bugs. (For example, running a wrong command, instead of failing to run anything, would often be worse. Likewise, running a wrong command in circumstances that arise more readily could be worse than running a wrong command in circumstances that arise less readily.) It also runs the risk that I introduce new bugs that weren't present at all before.

But that also means I risk become separated from the original bug-fixing goals, which this is peripheral to, if I shift my focus too much onto this. If possible, I'd like to avoid become too embroiled in the details of the gix-merge tests. It might not be possible to avoid this, but there are two other approaches I may attempt first to try to figure out what is going on:

  • Write something to examine how the environment is changed before and after the changes here. Manual testing with similar but non-identical ways of running shell processes suggests that the environment changes are not very great on my local machine (which is where they would have to be significant in order to be the cause of the problems I am observing on that machine but not CI). But this has the advantage that (a) it might be readily adaptable into a regression test, and (b) if I can accurately state how the environments are different, then your much greater familiarity with the failing gix-merge test and the intended gix-merge behavior might make the problem obvious even if it is not obvious to me.
  • Investigate failures on a conceptually related feature branch where I am trying to improve how gix-testtools finds bash. I'm not sure yet because this could be the result of a mistake (and unfortunately, the full test suite takes quite some time to run with GIX_TEST_IGNORE_ARCHIVES=1 on my local Windows 10 development machine), but it looks like there are comparable failures--and many of them, though most may be from a single failing fixture script--when I try to make changes there that harmonize how gix-testtools runs bash with how this PR tries to make gix-path find sh. The changes I'm working on in gix-testtools seem much smaller than those here, yet I have far worse local test failures, so that's interesting. I'll try to give more information on that soon, possibly in another draft PR about that.

Edit: Reworded for clarity and reordered the sections.

@Byron
Copy link
Member

Byron commented Feb 25, 2025

theirs is empty, even though it should be a be something that the caller knows.

I am very unclear on what the exact intended behavior of this merge driver test command is.

It attempts to make the arguments that the merge-driver was called with observable. I think there is no quoting because each of the arguments is known not to have spaces in it. However, the lack of quoting also makes empty arguments unobservable, a definitive shortcoming of the current implementation.

The broader issue for me examining this specific test is that I am familiar neither with the details of gix-merge and its tests nor (even though this is general Git knowledge) with the format expected for custom merge drivers.

Here is the docs, but the short version is that it's a script into which various parameters can be substituted into - this substitution is performed by the the caller of the merge-driver. That caller does a bare string substitution, without any care for quotes or whitespace in general.

I'd like to avoid become too embroiled in the details of the gix-merge tests.

I definitely wouldn't want you to suffer unnecessarily. The key for me to solve this certainly is to reproduce the error. For now, it doesn't make much sense to me either, unless… it's a substitution engine, what if one of the arguments that it substitutes also contains substitutable characters? Probably that's not what's happening here though, as % should be uncommon enough.

This is a patch that should tell exactly what's executed, and from there it would become more obvious what's really happening.

diff --git a/gix-merge/src/blob/platform/merge.rs b/gix-merge/src/blob/platform/merge.rs
index 058a30128..324742a00 100644
--- a/gix-merge/src/blob/platform/merge.rs
+++ b/gix-merge/src/blob/platform/merge.rs
@@ -211,7 +211,7 @@ pub(super) mod inner {
                     cmd.extend_from_slice(token);
                 }
 
-                Ok(merge::Command {
+                Ok(dbg!(merge::Command {
                     cmd: gix_command::prepare(gix_path::from_bstring(cmd))
                         .with_context(context)
                         .command_may_be_shell_script()
@@ -223,7 +223,7 @@ pub(super) mod inner {
                     current_path: ours_path,
                     ancestor: base_tmp,
                     other: theirs_tmp,
-                })
+                }))
             }
 
             /// Return the configured driver program for use with [`Self::prepare_external_driver()`], or `Err`

I hope that helps.

@EliahKagan
Copy link
Member Author

EliahKagan commented Feb 25, 2025

Thanks--you're right that the actual arguments that are passed are important information. I had glossed over this because it did not look like they differed between the passing and failing environment, but that doesn't mean they aren't needed to know what's going on.

The results, with the patch you gave that reveals the Command instance with dbg!, when running the test on this feature branch from PowerShell where it fails, are:

C:\Users\ek\source\repos\gitoxide [run-ci/consistent-sh ≡ +0 ~1 -0 ~]> cargo nextest run -p gix-merge with_external
    Finished `test` profile [unoptimized + debuginfo] target(s) in 0.46s
────────────
 Nextest run ID 42b38560-812f-4bcd-9548-97bb529b272d with nextest profile: default
    Starting 1 test across 2 binaries (19 tests skipped)
     Running [ 00:00:00] 0/1: 0 running, 0 passed, 0 skipped
        FAIL [   0.158s] gix-merge::merge blob::platform::merge::with_external
──── STDOUT:             gix-merge::merge blob::platform::merge::with_external

running 1 test
test blob::platform::merge::with_external ... FAILED

failures:

failures:
    blob::platform::merge::with_external

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 19 filtered out; finished in 0.15s

──── STDERR:             gix-merge::merge blob::platform::merge::with_external
[gix-merge\src\blob\platform\merge.rs:214:20] merge::Command
{
    cmd:
    gix_command::prepare(gix_path::from_bstring(cmd)).with_context(context).command_may_be_shell_script().stdin(Stdio::null()).stdout(Stdio::inherit()).stderr(Stdio::inherit()).into(),
    current: ours_tmp, current_path: ours_path, ancestor: base_tmp, other:
    theirs_tmp,
} = "C:/Users/ek/scoop/apps/git/2.48.1/usr/bin/sh.exe" "-c" "for arg in  C:\\Users\\ek\\source\\repos\\gitoxide\\gix-merge\\.tmp8ULmDC C:\\Users\\ek\\source\\repos\\gitoxide\\gix-merge\\.tmpmHJGGo C:\\Users\\ek\\source\\repos\\gitoxide\\gix-merge\\.tmpLByBCI 7 \'b\' \'ancestor label\' \'current label\' \'other label\' %F; do echo $arg >> \"C:\\Users\\ek\\source\\repos\\gitoxide\\gix-merge\\.tmpmHJGGo\"; done; cat \"C:\\Users\\ek\\source\\repos\\gitoxide\\gix-merge\\.tmp8ULmDC\" \"C:\\Users\\ek\\source\\repos\\gitoxide\\gix-merge\\.tmpLByBCI\" >> \"C:\\Users\\ek\\source\\repos\\gitoxide\\gix-merge\\.tmpmHJGGo\"" "--"

thread 'blob::platform::merge::with_external' panicked at gix-merge\tests\merge\blob\platform.rs:297:13:
b
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

  Cancelling due to test failure
────────────
     Summary [   0.167s] 1 test run: 0 passed, 1 failed, 19 skipped
        FAIL [   0.158s] gix-merge::merge blob::platform::merge::with_external
error: test run failed

That's a bit hard to read due to the way the Command sub-object formats, and it does not automatically show output in the case that the test passes. So I have also tested with this change instead:

diff --git a/gix-merge/src/blob/platform/merge.rs b/gix-merge/src/blob/platform/merge.rs
index 058a30128..39e27e51f 100644
--- a/gix-merge/src/blob/platform/merge.rs
+++ b/gix-merge/src/blob/platform/merge.rs
@@ -399,6 +399,7 @@ impl<'parent> PlatformRef<'parent> {
         match self.configured_driver() {
             Ok(driver) => {
                 let mut cmd = self.prepare_external_driver(driver.command.clone(), labels, context.clone())?;
+                panic!("program: {:?}\nargs: {:#?}", cmd.get_program(), cmd.get_args());
                 let status = cmd.status().map_err(|err| Error::SpawnExternalDriver {
                     cmd: format!("{:?}", cmd.cmd),
                     source: err,

Then, when run from PowerShell, the output included:

program: "C:/Users/ek/scoop/apps/git/2.48.1/usr/bin/sh.exe"
args: CommandArgs {
    inner: [
        Regular(
            "-c",
        ),
        Regular(
            "for arg in  C:\\Users\\ek\\source\\repos\\gitoxide\\gix-merge\\.tmpayn7BE C:\\Users\\ek\\source\\repos\\gitoxide\\gix-merge\\.tmpEvLyJo C:\\Users\\ek\\source\\repos\\gitoxide\\gix-merge\\.tmpCbHa1a 7 \'b\' \'ancestor label\' \'current label\' \'other label\' %F; do echo $arg >> \"C:\\Users\\ek\\source\\repos\\gitoxide\\gix-merge\\.tmpEvLyJo\"; done; cat \"C:\\Users\\ek\\source\\repos\\gitoxide\\gix-merge\\.tmpayn7BE\" \"C:\\Users\\ek\\source\\repos\\gitoxide\\gix-merge\\.tmpCbHa1a\" >> \"C:\\Users\\ek\\source\\repos\\gitoxide\\gix-merge\\.tmpEvLyJo\"",
        ),
        Regular(
            "--",
        ),
    ],
}

When run from Git Bash, where the test passes when allowed to proceed, the output included this instead:

program: "C:/Users/ek/scoop/apps/git/2.48.1/usr/bin/sh.exe"
args: CommandArgs {
    inner: [
        Regular(
            "-c",
        ),
        Regular(
            "for arg in  C:\\Users\\ek\\source\\repos\\gitoxide\\gix-merge\\.tmpfax7ph C:\\Users\\ek\\source\\repos\\gitoxide\\gix-merge\\.tmpcPJLm3 C:\\Users\\ek\\source\\repos\\gitoxide\\gix-merge\\.tmp5fMWd7 7 \'b\' \'ancestor label\' \'current label\' \'other label\' %F; do echo $arg >> \"C:\\Users\\ek\\source\\repos\\gitoxide\\gix-merge\\.tmpcPJLm3\"; done; cat \"C:\\Users\\ek\\source\\repos\\gitoxide\\gix-merge\\.tmpfax7ph\" \"C:\\Users\\ek\\source\\repos\\gitoxide\\gix-merge\\.tmp5fMWd7\" >> \"C:\\Users\\ek\\source\\repos\\gitoxide\\gix-merge\\.tmpcPJLm3\"",
        ),
        Regular(
            "--",
        ),
    ],
}

This is to say that only the randomly generated suffixes after .tmp in temporary filenames are different (and they would differ across separate identical runs, too).

On the main branch, in PowerShell, where the test also passes if allowed to proceed, the only differences besides those temporary files' names is the shell command:

program: "sh"
args: CommandArgs {
    inner: [
        Regular(
            "-c",
        ),
        Regular(
            "for arg in  C:\\Users\\ek\\source\\repos\\gitoxide\\gix-merge\\.tmpD7kDyR C:\\Users\\ek\\source\\repos\\gitoxide\\gix-merge\\.tmppnvU83 C:\\Users\\ek\\source\\repos\\gitoxide\\gix-merge\\.tmpkYgWkr 7 \'b\' \'ancestor label\' \'current label\' \'other label\' %F; do echo $arg >> \"C:\\Users\\ek\\source\\repos\\gitoxide\\gix-merge\\.tmppnvU83\"; done; cat \"C:\\Users\\ek\\source\\repos\\gitoxide\\gix-merge\\.tmpD7kDyR\" \"C:\\Users\\ek\\source\\repos\\gitoxide\\gix-merge\\.tmpkYgWkr\" >> \"C:\\Users\\ek\\source\\repos\\gitoxide\\gix-merge\\.tmppnvU83\"",
        ),
        Regular(
            "--",
        ),
    ],
}

State from previous runs in this experiment is not likely to be a contributing factor; I ran gix clean -xde in between.

On main, sh is used rather than a full. That is looked up to C:\Users\ek\scoop\shims\sh.exe on this system, which is a scoop shim that delegates to C:\Users\ek\scoop\apps\git\current\bin\sh.exe, which is the Git for Windows bin\sh.exe shim that delegates to the Git for Windows usr\bin\sh.exe. That latter distinction--that is, whether the Git for Windows sh.exe is involved--seems to be an essential element here, because the failure goes away under this change:

diff --git a/gix-path/src/env/mod.rs b/gix-path/src/env/mod.rs
index 78e2da294..cd6203df7 100644
--- a/gix-path/src/env/mod.rs
+++ b/gix-path/src/env/mod.rs
@@ -51,7 +51,7 @@ pub fn shell() -> &'static OsStr {
                     // more readable messages, append literally with `/` separators. The path from
                     // `git --exec-path` will already have all `/` separators (and no trailing `/`)
                     // unless it was explicitly overridden to an unusual value via `GIT_EXEC_PATH`.
-                    raw_path.push("/usr/bin/sh.exe");
+                    raw_path.push("/bin/sh.exe");
                     raw_path
                 })
                 .filter(|raw_path| {

But I am reluctant to apply that change without understanding what it is about the effect of the Git for Windows shim that is helping, to know that this shim really is preferable rather than the test happening to hinge on a quirk that is different. One reason I am reluctant is that, under that change, if git is provided by the Git for Windows SDK, its sh.exe would not be found. If the non-shim Git for Windows sh.exe is usable but just moderately less preferred, then we could check for the shim first and then fall back to the non-shim. But I would want to understand the situation better before jumping into that, in case really neither is preferred, or in case the non-shim executable does not work well enough for our needs to be worth using directly.

My guess is that the relevant effect of the shim is in changing the environment, but I am not certain it refrains entirely from adjusting arguments. (I think the shim does not adjust arguments directly. But even if that is right, msys-2.0.dll affects interpretation of arguments--that is, what ends up in the argv that a program that uses that DLL as its libc uses--and does so in a way that is affected by some environment variables.)

I'll try to modify the command to report its arguments and environment in a way that they cannot be confused with anything else, such as by writing them to a file. However, since this will necessarily modifying the command or its environment, it is not certain to be successful at revealing the cause.


Edit: A possible contributing factor is that I also have some bin directories associated with a regular MSYS2 installation (separate from Git for Windows) in my PATH on Windows.

When the shim is used, two directories belonging to the MSYS2 installation that is associated with the running shell, /mingw64/bin and /usr/bin, as well as my per-user bin at /c/Users/ek/bin (which I think is less important in this case), are prepended, to take precedence.

The reason I say /mingw64/bin and /usr/bin belong to the MSYS2 installation associated with the running shell is that they are Unix-style paths that are automatically treated that way and translated based on that when calling native external commands.

C:\Users\ek\scoop\apps\git\2.48.1\usr\bin\sh.exe -c 'echo "$PATH" | tr ":" "\n" | sort' > ubp.txt
C:\Users\ek\scoop\apps\git\2.48.1\bin\sh.exe -c 'echo "$PATH" | tr ":" "\n" | sort' > bp.txt
git --no-pager diff --no-index ubp.txt bp.txt

Where the output of that last command is:

diff --git a/ubp.txt b/bp.txt
index 7390dca90..7190bd5b3 100644
--- a/ubp.txt
+++ b/bp.txt
@@ -1,3 +1,6 @@
+/mingw64/bin
+/usr/bin
+/c/Users/ek/bin
 /c/Users/ek/scoop/apps/pwsh/7.5.0
 /c/Program Files (x86)/VMware/VMware Workstation/bin
 /c/Windows/system32

Note that this experiment is not equivalent to checking the $0, $@, and the environment variables in an actual merge driver command, and I am not certain if this is the cause. Even if it is, I am not clear on how it is the cause.

@Byron
Copy link
Member

Byron commented Feb 26, 2025

This is very strange and puzzling.
I understand that the way the shell is called is the same no matter what, so the reason for the difference is in how the shell scripted merge-driver is interpreted.

And that's particularly strange as there isn't even any argument to interpret, the values are quite literally baked in. All I can think of is poking around in the script, maybe by starting to execute it directly. It's pretty clear what is invoked in the working and non-working case, so that could be generalized to invocations that are repeatable on the command-line directly. From there it can be reduced and probed to see what exactly is causing the observable difference (assuming it reproduces in the first place). If it doesn't, we'd know that some environment variable is affecting it as well.

And I say this without even implying that you spend more time on this, I am just sharing thoughts.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants