Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sh quoting of program names on an opt-in basis #1799

Closed
4 tasks done
EliahKagan opened this issue Jan 23, 2025 · 4 comments · Fixed by #1813
Closed
4 tasks done

sh quoting of program names on an opt-in basis #1799

EliahKagan opened this issue Jan 23, 2025 · 4 comments · Fixed by #1813
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments

@EliahKagan
Copy link
Member

EliahKagan commented Jan 23, 2025

Summary 💡

Possible tasks:

  • Document that gix_command::Prepare with use_shell does not quote (unless it will) (#1800)
  • Document which data in a Prepare instance are subject to shell expansions if unquoted (#1800)
  • Clarify the semantics of args (#1813)
  • Maybe: When use_shell is true, let caller opt into POSIX shell style quoting (#1801)

It might make sense to split this into multiple issues, in that the first three items could be done by themselves, though they become more important and possibly also different if fourth is done.

When using a shell, the how varies based on the why

By default, gix_command::Prepare rightly does not use a shell. But it supports using a shell if requested. This shell is generally sh and assumed to have POSIX shell semantics. There are at least three reasons one might want to use a shell:

  1. To run a command in the same way git would run it, where git is guaranteed to use a shell.
  2. To run a command that is expressed as a single string, with arguments split how sh splits them.
  3. To get a different environment that a shell is hoped to confer.

Reasons (1) and (2) are the most important

In the absence of (1), a shell must be used very carefully, and basically must never be used for running a command git documents not to use a shell or that can otherwise be assumed not to be run with a shell when git runs it. For example, if a command that should not run in a shell has a \ in its name due that being a path separator on Windows, and it is run in a shell anyway, then that will turn into an escape character.

When (2) is the only reason to use a shell, and neither (1) nor (3) apply, or they apply but can be "optimized away," then shell-like argument splitting can be done without using a shell. Prepare currently does this on Windows, if it is it is allowed but not required to use a shell, i.e. if use_shell and allow_manual_arg_splitting were both set to true.

The feature suggested here is about making (3) easier and slightly more likely to be done correctly when done at all. But I mention (1) and (2) because I think they are the main uses of use_shell, and I recommend only pursuing this feature idea for (3) if it will not interfere with, complicate, or confuse (1) and (2).

Reason (3) is hard to get right

I think it is very hard to do (3) effectively and achieve the predicted effect, as discussed in #1752 and #1758. The sh.exe in Git for Windows probably does provide an environment that is different from the caller's in some ways that could help find or use some tools. But:

  • Path interpretation could sometimes differ unexpectedly, since sh.exe/bash.exe support Cygwin-style paths. As one but not the only case of this, some Windows paths that std::process::Command supports (it supports nearly all Windows paths, including \\?\, \\.\, and even \??\ paths) can be rejected or misinterpreted.
  • It probably also differs in other ways that make things work less reliably. For example, if it and gitoxide are built for different architectures and one of them is 32-bit, then they may resolve the same paths to different locations when running some programs, due to the file system redirector, which I suspect may have been a factor in #1432 (which I do plan to revisit; I have not forgotten about #1585 (comment)).
  • Running something with sh.exe, or even with bash.exe, is not equivalent to using the interactive environment in a Git Bash environment. This is (mostly) because some environment customization is done through startup scripts that mainly run when a shell is used interactively.

Outside of Windows, I would expect nearly all of the difference between an environment using and not using a shell as an intermediary would be due to startup scripts that are unlikely to run, and probably should not be run, when using the shell in an automated, non-interactive way. This is because, unlike a Unix-style shell on Windows, a Unix-style on a Unix-like system is not doing anything extra to take an environment that is presumed to be non-Unix-like and make it Unix-like.

But we may be able to make (3) easier to get right on the occasion it is needed

If I understand correctly, the way gix_command::Prepare currently works when it uses an external shell process is that it assumes the shell is sh-like, invokes it with a -c argument followed by the value of the command field and, if args is nonempty and $@ does not already appear in the command field, adds a space followed by "$@" to the command and passes the arguments as separate argument to the shell, so that they will be treated as positional parameters and accessible through $1, $2, $3, etc., and with $@ and $*, with their usual semantics.

This assumes the command to be run should be subject to shell expansions, including expansions that might turn it into a command followed by some of its arguments, and then explicit args should not be supplied literally as positional parameters that the command can access. This seems generally reasonable when the goal of using a shell is (1) or (2), provided that the caller has considered all the effects of using a shell and understands exactly what is being passed to it. The first two proposed tasks ("Document ...") would help authors of code that use gix_command, or that use other code that uses gix_command and documents that it does so, so know what to expect.

But this may not be the right semantics for (3), where no expansions are wanted at all. Suppose (3) applies and (1) and (2) do not. Suppose further that using a shell has been confirmed in the specific use case to produce the desired environment (so the solution is not simply to abandon (3) as a goal or technique). Then the value of command should be quoted.

This seems feasible

command can only be quoted safely if enough is known about the shell language and if its quoting rules are simple enough.

Fortunately, POSIX-compatible sh, and most if not all modern Bourne-style shells, satisfy this requirement. For any string, so long as the string actually can be included in a script or -c operand (i.e., barring an encoding mismatch), the string can be quoted by replacing any preexisting occurrence of ' with the literal sequence '\'', prepending a ', and appending a '.

This works for causing it to be passed as an argument, so long as it is properly separated from the command name and other arguments with whitespace. It also works for using it as the command name. For example, C:\Users\O'Shaughnessy\foo bar.exe can be quoted as 'C:\Users\O'\''Shaughnessy\foo bar.exe'.

I don't think this should be done automatically. One reason it should not is that the caller may be quoting it already. But it might be useful as a feature that can be opted into.

Because--and so long as--args are not passed to the shell as part of text the shell considers a shell script, they should not be quoted and no option to quote them should be presented. But the semantics when doing this should be made clear, especially if this feature I am proposing is to be implemented. Furthermore, if this feature is implemented, then the circumstances under which quoting is added should be clearly distinguished from those under which it is not.

The process starts with the arguments, right?

It may be that the current summary of args will be understood correctly:

/// The arguments to pass to the spawned process.
pub args: Vec<OsString>,

However, I was momentarily tripped up by it, reading it to mean that the process has been spawned and then receives the arguments. This may be possible to fix while revising the docstring, or other revisions might make it clear automatically by describing exactly how the args are passed, or maybe this specific point about the wording being ambiguous was just an initial misreading on my part.

Motivation 🔦

The commented discussions in #1752 and #1758 reveal (3) as a goal. Part of the view I expressed there is that whether, and how, this should be done, is probably higher level than gitoxide.

But even though I think that is largely true, and even though I think gitoxide itself should be especially reluctant to automatically use a shell for any operations when neither (1) nor (2) decisively applies, there does seem to be a fragment of how to do (3) that belongs in gix_command: helping the caller cause the correct string to be used as command depending on whether quoting is needed.

Another motivation is that I noticed what looked like a possible use case for this in gitbutlerapp/gitbutler#6028. Note, however, that the only non-documentation change proposed in this feature request is opt-in quoting for command when command is interpreted as a single path. The quotes would be added and then, if args are present, a space and "$@" appended.

What this doesn't propose

This does not currently propose even opt-in quoting for arguments, because "$@" is already in place and seems sufficient to neutralize anything in them that would be treated specially by a shell if it appeared in text the shell considers a script or command.

This does not propose anything to add a -- argument before "$@", because whether it is needed or not is independent of whether a shell is used. More broadly, if present, it's just a literal "--", so it can be included explicitly in args at least as clearly and accurately as if specified in any other way.

@EliahKagan EliahKagan added the enhancement New feature or request label Jan 23, 2025
@Byron Byron added the help wanted Extra attention is needed label Jan 23, 2025
@Byron
Copy link
Member

Byron commented Jan 23, 2025

Thanks so much for bringing this up and the detailed analysis!

Some notes for context, while I remember it clearly:

  • gix-command is meant as a helper for Git-style program (or script) invocations with just the right flags to launch programs like Git does.
    • This means one would check how Git invokes a particular program class, and then emulate this with gix_command::Prepare.
  • Ideally, gix-command captures all Git behaviour correctly, particularly special handling that might be implemented on Windows.

In the latter case, I am not entirely sure this is the case as I am just looking at the Git codebase, not the Git for Windows codebase.

From what I could tell, Git doesn't offer additional quoting, but I am not sure and may be missing something.

However, when looking at gitbutlerapp/gitbutler#6028 applications may be interested in such a feature, even though I am not sure if the quoting as proposed here would have helped in case of GitButler. My feeling is that this isn't proposed here and can't be done automatically.

I don't think this should be done automatically. One reason it should not is that the caller may be quoting it already. But it might be useful as a feature that can be opted into.

I fully agree.

Since you looked into this and are the expert on this, I'd think you are in a good position to improve the documentation and add clarification way better than I could.

As for quoting, I'd be happy to take that on once it's clear how it should work and ideally, how it would improve downstream use-cases.

@EliahKagan
Copy link
Member Author

EliahKagan commented Jan 23, 2025

Since you looked into this and are the expert on this

I would definitely not say that I am an expert on it: gix-command::Prepare, and the ways it is used throughout other crates, is not a part of the code that I have dove into as deeply as I hope to.

However, I do agree with the rest of what you are suggesting. The documentation can be improved separately from introducing a new feature, it may be benefited by looking at how Git for Windows behaves here, and that behavior is definitely something I should examine before implementing a new feature to add quoting.

I'd think you are in a good position to improve the documentation and add clarification

I agree. I may not get to this immediately, but I plan to do it. If some time passes and I haven't, please feel free to remind me. (I opened this issue now because it seemed like something relevant to or even missing in what I had said in #1752 and #1758, and I didn't want to forget about it.)

From what I could tell, Git doesn't offer additional quoting, but I am not sure and may be missing something.

As for quoting, I'd be happy to take that on once it's clear how it should work and ideally, how it would improve downstream use-cases.

I think git does not run commands with shells for the environment conferred by the shell, and therefore has no need to run something in a shell but with non-shell semantics for the command being run--so it probably does not add quoting. That may be a good reason not to implement a quoting feature (even if opt-in) here either.

Maybe I should split this into a feature request about that feature and a separate issue about the documentation updates. If you think that would be best, please let me know.

@Byron
Copy link
Member

Byron commented Jan 23, 2025

Thank you 🙏.

Maybe I should split this into a feature request about that feature and a separate issue about the documentation updates. If you think that would be best, please let me know.

I would prefer to keep this wholistic issue with the task-list, possibly it could be highlighted more that the last task is a feature which is more on the optional side.

@EliahKagan
Copy link
Member Author

I would prefer to keep this wholistic issue with the task-list

Sounds good. That was my preference when I wrote the issue like this, so it certainly works for me.

possibly it could be highlighted more that the last task is a feature which is more on the optional side.

I've edited it along those lines, mainly by putting Maybe at the front.

Byron added a commit that referenced this issue Jan 23, 2025
That way it's possible to execute shell commands with spaces in their paths, for example.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request help wanted Extra attention is needed
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants