various improvements #1800

Byron · 2025-01-23T10:00:31Z

To support gitbutlerapp/gitbutler#6060 .

Maybe related to documentation tasks in #1799 .

…with_shell()` to enforce using a shell. That way it's clear the shell will only be used if the command actually is a shell script. This also renames `with_shell_*` variants to `command_may_be_shell_script_*` variants, without a respective `with_shell_*` alternative.

EliahKagan

The documentation changes look great and, as you alluded to, possibly fix the first three tasks listed in #1799.

If I understand correctly, with_shell has a new meaning now, subtly but importantly different from its meaning before, and it can still be called in the same way. I have a worry about that, expressed below, but I haven't thought of a use case that I specifically expect to be broken.

EliahKagan · 2025-01-23T10:46:46Z

gix-command/src/lib.rs

+        /// already present in the command.
+        ///
+        /// If this method is not called, commands are always executed verbatim, without the use of a shell.
+        pub fn with_shell(mut self) -> Self {


Will this cause code targeting the previous API and calling with_shell without actually requiring a shell to be used to continue to build and appear to work, but to work differently by using a shell in circumstances where it wasn't before? If so, I wonder if that might introduce bugs, or worsen existing bugs, in code that uses gix-command.

I was aware of this (and agree that it could cause unwanted effects downstream) and probably didn't even mark it as fix: which is what I thought about when doing so.
with_shell() now actually does what it says, as opposed to before when it surprisingly only used a shell when the command seemed to need it. Especially on Windows this matters so I really thought of it as a fix for that.

Yes, it is true that it could fix bugs where people assumed use_shell had a more intuitive meaning. An alternative could be to abandon the method name use_shell altogether and call all the methods something different. That would break anything that should be broken. It might also break things that need not break.

On the other hand, I recognize it's possible to take that too far and accumulate lots of names that one does use in the future even when they would be the best description available. So I don't know what's best.

In this project itself, I think the changed methods may be useful for future bug investigation. I suspect we have some invocations that treat the command name as a shell script when, to match the expectations set by Git's behavior and the guarantees in Git's documentation, it would be necessary to treat the command as a shell script when it was obtained from some sources, while being necessary to treat it not as a shell script (i.e. just as a path) when it was obtained from other sources.

The specific case of this that I most strongly suspect the SSH client. When obtained from the environment GIT_SSH, it must not be run or parsed as a shell script (regardless of its value). When obtained from GIT_SSH_COMMAND, or the value of the core.sshCommand configuration variable, it is a shell script and must either be run as one or (in sufficiently simple cases) parsed as one. These requirements are explicated in git(1).

I have not checked recently--and maybe not at all, in terms of running experiments to verify it--that we are doing the wrong thing here. When I have a chance to test it, I'll open an issue if we are. I recall it having seemed so when I was working on #1342, and that it seemed like a contributing factor to the bug reported in #1432 and discussed further in connection with some related topics in #1585 (comment). This was the suspected aspect of #1432 highlighted in #1432 (comment).

If we don't yet have an issue for it, then I'll try to open one once I get around to checking the effect in a way that facilitates direct comparison to Git behavior. Some invocations of the SSH client occur only under some circumstances; for example, it is only invoked to figure out what kind of SSH client it is if that cannot be inferred from its name (and is not overridden by configuration), and I think that is one of the places where GIT_SSH may be being wrongly treated like GIT_SSH_COMMAND.

Thanks for bringing this up!

I was aware of the difference between GIT_SSH and GIT_SSH_COMMAND at the time and should have implemented it correctly. Due to me renaming with_shell() to its new name, in this codebase the semantics are preserved, too, so I'd think there is no issue, still.

However, there I support and value an investigation to assure there aren't any subtle bugs around that in the hopes that this also helps the API of gix-command to mature. After all, it's vital to make it as bullet-proof and unambiguous to use as possible.

Due to me renaming with_shell() to its new name, in this codebase the semantics are preserved, too, so I'd think there is no issue, still.

I very much agree that this PR introduces no new incorrect behavior--related to the distinction between GIT_SSH and GIT_SSH_COMMAND, or otherwise. But I think there is probably a preexisting issue, and if so, it remains intact. (One of the ways this PR is significant to that is that it calls attention to such possible things, which is a good thing about this PR, not a bad thing.)

On Windows, \ is the primary directory separator, so it often appears in paths that are otherwise unremarkable, like C:\Windows\System32\OpenSSH\ssh.exe. But \ is also taken to mean that we must run a command in a shell if we are allowed to do so (since \ is a shell metacharacter). Specifically:

The method now called command_may_be_shell_script is meant to conditionally refrain from planning to run unremarkable paths using a shell, but the presence of \ causes it to go ahead and plan that by setting the use_shell field to true:

gitoxide/gix-command/src/lib.rs

Lines 108 to 110 in f3dc83b

self.use_shell = self.command.to_str().map_or(true, |cmd| {

cmd.as_bytes().find_byteset(b"|&;<>()$`\\\"' \t\n*?[#~=%").is_some()

});

When the use_shell field is set to true, using a shell is planned, but it may still be averted if the allow_manual_arg_splitting field is set to true, which is its default value on Windows. However, \ is also one of the characters that suppresses the effect of allow_manual_arg_splitting:

gitoxide/gix-command/src/lib.rs

Lines 222 to 234 in f3dc83b

let split_args = prep

.allow_manual_arg_splitting

.then(|| {

if gix_path::into_bstr(std::borrow::Cow::Borrowed(prep.command.as_ref()))

.find_byteset(b"\\|&;<>()$`\n*?[#~%")

.is_none()

{

prep.command

.to_str()

.and_then(|args| shell_words::split(args).ok().map(Vec::into_iter))

} else {

None

}

Where, matching on split_args:

gitoxide/gix-command/src/lib.rs

Lines 243 to 244 in f3dc83b

None => {

let mut cmd = Command::new(

Therefore, if there are places where the function now called command_may_be_shell_script is called when it shouldn't be called, on Windows this will often have the effect of running a command in a shell that should not be run that way--this effect is more likely on Windows, due to \ being a common character in paths.

To be clear, this behavior is not in any way exacerbated by the changes in this PR. My thinking was that, as far as this sort of thing goes, the changes here are a boon, because they make it easier to reason about what is going on.

However, I just now realize that there is actually a sense in which the changes in this PR exacerbate this. Before, because the method had the name with_shell, it conveyed the impression that a shell would be used in more circumstances than it was intended to convey. But as detailed above, it seems that a shell was also used, on Windows, in more circumstances than it was intended to be used. These partially "cancelled out": the impression it gave was, on Windows, closer to the truth than it probably is now.

I don't suggest that as a reason to undo anything from this PR. Instead, the specific effect of \ in a command on those two methods on Windows, in light of the special significance of \ on Windows, can modified, documented, or both; and any code in this project that calls command_may_be_shell_script when it shouldn't can be identified and fixed.

I was aware of the difference between GIT_SSH and GIT_SSH_COMMAND at the time and should have implemented it correctly.

As far as I can tell, in the main code path used to decide how to invoke an SSH client to actually connect to a remote, this is implemented perfectly. In gix::repository::config::ssh_connect_options, the fallback_active flag keeps track of where the SSH client command is configured from:

gitoxide/gix/src/repository/config/mod.rs

Lines 63 to 70 in f3dc83b

let mut fallback_active = false;

let ssh_command = config

.string_filter(Core::SSH_COMMAND, &mut trusted)

.or_else(|| {

fallback_active = true;

config.string_filter(gitoxide::Ssh::COMMAND_WITHOUT_SHELL_FALLBACK, &mut trusted)

})

.map(|cmd| gix_path::from_bstr(cmd).into_owned().into());

This is used to decide, I believe correctly, whether to disallow the use of a shell:

gitoxide/gix/src/repository/config/mod.rs

Lines 71 to 79 in f3dc83b

let opts = gix_protocol::transport::client::ssh::connect::Options {

disallow_shell: fallback_active,

command: ssh_command,

kind: config

.string_filter("ssh.variant", &mut trusted)

.and_then(|variant| Ssh::VARIANT.try_into_variant(variant).transpose())

.transpose()

.with_leniency(self.options.lenient_config)?,

};

These options are ultimately passed to gix_transport::client::blocking_io::ssh::connect. In most cases, this connect function arranges a scenario in which the SSH client will run once. This happens through the last thing this connect function does, which to unconditionally call gix_transport::client::blocking_io::file::SpawnProcessOnDemand::new_ssh. That call relates to actually using the SSH client to make a connection to a server and transfer data. It properly passes along the disallow_shell flag:

gitoxide/gix-transport/src/client/blocking_io/ssh/mod.rs

Lines 136 to 144 in f3dc83b

Ok(blocking_io::file::SpawnProcessOnDemand::new_ssh(

url,

ssh_cmd,

path,

kind,

options.disallow_shell,

desired_version,

trace,

))

However, before this connect function does that, it figures out how the SSH client will be invoked to do that. It checks if this is configured, and otherwise tries to ascertain what SSH implementation it is or behaves like. Under some conditions, it runs the SSH client in order to do this. I am unclear at this time on how often or exactly under what circumstances that happens. The logic is:

gitoxide/gix-transport/src/client/blocking_io/ssh/mod.rs

Lines 112 to 113 in f3dc83b

let mut kind = options.kind.unwrap_or_else(|| ProgramKind::from(ssh_cmd));

if options.kind.is_none() && kind == ProgramKind::Simple {

But when it does it, it runs a command that it sets up like this:

gitoxide/gix-transport/src/client/blocking_io/ssh/mod.rs

Lines 115 to 125 in f3dc83b

gix_command::prepare(ssh_cmd)

.stderr(Stdio::null())

.stdout(Stdio::null())

.stdin(Stdio::null())

.command_may_be_shell_script()

.arg("-G")

.arg(match url.host_as_argument() {

Usable(host) => host,

Dangerous(host) => Err(Error::AmbiguousHostName { host: host.into() })?,

Absent => panic!("BUG: host should always be present in SSH URLs"),

}),

(In #1342, I modified part of this, but not the part that is relevant to the use of a shell. I wondered why it was calling what was, back then, named with_shell, but this was not part of what was necessary to modify to achieve the goals of that PR.)

That code unconditionally calls the command_may_be_shell_script method, and does not subsequently do anything to reverse its effects. It runs the command immediately. Specifically, the above expression is passed to:

gitoxide/gix-transport/src/client/blocking_io/ssh/mod.rs

Line 114 in f3dc83b

let mut cmd = std::process::Command::from(

And the subsequent code is:

gitoxide/gix-transport/src/client/blocking_io/ssh/mod.rs

Lines 127 to 132 in f3dc83b

gix_features::trace::debug!(cmd = ?cmd, "invoking `ssh` for feature check");

kind = if cmd.status().ok().is_some_and(|status| status.success()) {

ProgramKind::Ssh

} else {

ProgramKind::Simple

};

Consequently, when gix-transport pre-runs an SSH client to figure out what kind of SSH client it is, it seems to me that it runs it in a way that allows a shell to be used and, combined with the aforementioned effect of \ suppressing the conditional shell-avoidance behaviors of both command_may_be_shell_script and allow_manual_arg_splitting, this will often run it in a (Bourne-style) shell on Windows, where it will typically fail or otherwise malfunction because the shell treats \ as the escape character.

(Separately from that, I am unclear on why gix_transport::client::blocking_io::file::SpawnProcessOnDemand::new_local always passes a false value of ssh_disallow_shell. But since, if I understand correctly, that is not using SSH, my guess is that it is okay.)

Edit: I've opened #1814, which would fix the second part of what I've described here--that is, the place in gix-transport where disallow_shell should be followed but is not, when running the SSH client to figure out what implementation it is. I'm not sure how good shape that PR is in yet, though (as described there).

Thanks so much for sharing! I see now how this PR made it easier to realize that on Windows, the shell would actually run more often than it would on Unix.

(Separately from that, I am unclear on why gix_transport::client::blocking_io::file::SpawnProcessOnDemand::new_local always passes a false value of ssh_disallow_shell. But since, if I understand correctly, that is not using SSH, my guess is that it is okay.)

Memory does not serve, unfortunately, but given the importance of not messing this up (i.e. URLs could be crafted to use the shell to execute arbitrary commands) I'd hope that was intentional. However, I'd think it's better to document why this is false given that shells ideally are avoided entirely.

This revises the `gix_command::Prepare` documentation, mainly for clarity but also to add some information and cover or explain some cases that were not (or not as fully) covered before. This builds on recent documentation changes, such as those in GitoxideLabs#1800. Less importantly, this also: - Wraps `Prepare` documentation comments to a more consistent width, when doing so improved unrendered readability. - Made a trace message more precise, to avoid obscuring a subtlety about the distinction between what we are looking for and what we are adding, since that might occasionally relate to the reason someone is examining trace messages.

Byron enabled auto-merge January 23, 2025 10:00

Byron disabled auto-merge January 23, 2025 10:11

Byron enabled auto-merge January 23, 2025 10:11

Byron disabled auto-merge January 23, 2025 10:12

Byron added 2 commits January 23, 2025 11:13

adapt to changes in gix-command

28f712e

Byron force-pushed the improvements branch from 80ad737 to 28f712e Compare January 23, 2025 10:15

Byron enabled auto-merge January 23, 2025 10:20

Byron merged commit cc7b614 into main Jan 23, 2025
21 checks passed

Byron deleted the improvements branch January 23, 2025 10:39

EliahKagan reviewed Jan 23, 2025

View reviewed changes

EliahKagan mentioned this pull request Jan 26, 2025

sh quoting of program names on an opt-in basis #1799

Closed

4 tasks

This was referenced Jan 26, 2025

Revise and somewhat expand Prepare docs #1813

Merged

Honor disallow_shell in SSH client feature check #1814

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

various improvements #1800

various improvements #1800

Byron commented Jan 23, 2025 •

edited

Loading

EliahKagan left a comment

EliahKagan Jan 23, 2025

Byron Jan 25, 2025 •

edited

Loading

EliahKagan Jan 26, 2025

Byron Jan 26, 2025

EliahKagan Jan 26, 2025 •

edited

Loading

Byron Jan 27, 2025

	self.use_shell = self.command.to_str().map_or(true, \|cmd\| {
	cmd.as_bytes().find_byteset(b"\|&;<>()$`\\\"' \t\n*?[#~=%").is_some()
	});

	let split_args = prep
	.allow_manual_arg_splitting
	.then(\|\| {
	if gix_path::into_bstr(std::borrow::Cow::Borrowed(prep.command.as_ref()))
	.find_byteset(b"\\\|&;<>()$`\n*?[#~%")
	.is_none()
	{
	prep.command
	.to_str()
	.and_then(\|args\| shell_words::split(args).ok().map(Vec::into_iter))
	} else {
	None
	}

	let mut fallback_active = false;
	let ssh_command = config
	.string_filter(Core::SSH_COMMAND, &mut trusted)
	.or_else(\|\| {
	fallback_active = true;
	config.string_filter(gitoxide::Ssh::COMMAND_WITHOUT_SHELL_FALLBACK, &mut trusted)
	})
	.map(\|cmd\| gix_path::from_bstr(cmd).into_owned().into());

	let opts = gix_protocol::transport::client::ssh::connect::Options {
	disallow_shell: fallback_active,
	command: ssh_command,
	kind: config
	.string_filter("ssh.variant", &mut trusted)
	.and_then(\|variant\| Ssh::VARIANT.try_into_variant(variant).transpose())
	.transpose()
	.with_leniency(self.options.lenient_config)?,
	};

	Ok(blocking_io::file::SpawnProcessOnDemand::new_ssh(
	url,
	ssh_cmd,
	path,
	kind,
	options.disallow_shell,
	desired_version,
	trace,
	))

	let mut kind = options.kind.unwrap_or_else(\|\| ProgramKind::from(ssh_cmd));
	if options.kind.is_none() && kind == ProgramKind::Simple {

	gix_command::prepare(ssh_cmd)
	.stderr(Stdio::null())
	.stdout(Stdio::null())
	.stdin(Stdio::null())
	.command_may_be_shell_script()
	.arg("-G")
	.arg(match url.host_as_argument() {
	Usable(host) => host,
	Dangerous(host) => Err(Error::AmbiguousHostName { host: host.into() })?,
	Absent => panic!("BUG: host should always be present in SSH URLs"),
	}),

	gix_features::trace::debug!(cmd = ?cmd, "invoking `ssh` for feature check");
	kind = if cmd.status().ok().is_some_and(\|status\| status.success()) {
	ProgramKind::Ssh
	} else {
	ProgramKind::Simple
	};

various improvements #1800

various improvements #1800

Conversation

Byron commented Jan 23, 2025 • edited Loading

EliahKagan left a comment

Choose a reason for hiding this comment

EliahKagan Jan 23, 2025

Choose a reason for hiding this comment

Byron Jan 25, 2025 • edited Loading

Choose a reason for hiding this comment

EliahKagan Jan 26, 2025

Choose a reason for hiding this comment

Byron Jan 26, 2025

Choose a reason for hiding this comment

EliahKagan Jan 26, 2025 • edited Loading

Choose a reason for hiding this comment

Byron Jan 27, 2025

Choose a reason for hiding this comment

Byron commented Jan 23, 2025 •

edited

Loading

Byron Jan 25, 2025 •

edited

Loading

EliahKagan Jan 26, 2025 •

edited

Loading