Conversation
511fcac to
7f508e0
Compare
|
@gaborbernat Any chance you could look at it again today? I think that I addressed the requests. This patch is a blocker for improving the security of GHA pipelines as I would not want to disable log collection for tox. Thanks. |
ca01304 to
37a1156
Compare
Please review again, I made the required changes, dropped the notice, added documentation entry, tested all keywords.
|
@gaborbernat Sorry to bother you again. Is there anything else I need to change? |
|
@gaborbernat The py314 failure is unrelated and seems to be a bug in coveragepy. I will try to help @nedbat with a patch if I do not run out of time. That also made me realize that I need to fix few bugs in tox-uv in order to run coveragepy tests..., side effect raising tox-dev/tox-uv#210 |
|
@gaborbernat Any idea about what to do about py314 errors related to coverage? They started to fail on main last night, so unrelated to this patch. Should we attempt to capture/hide these warnings to avoid our test failures? |
|
I think I will skip that warning on platforms where I don't ship a built wheel, though I'd be curious whether the installation of coverage on 3.14 shows that it tried to compile the extension or not? |
|
Coverage 7.9.1 is released to quiet that warning on 3.14. |
Sorted all comments, CI is reporting green now. Please review again so we can close it. Thanks
|
TY @gaborbernat |
A post-release audit surfaced four places where tox was trusting attacker-authored config or argv more than it needed to. None are individually exploitable in a realistic attack today, but each one narrows a class of regression a future contributor could easily introduce, and the fixes are small enough that keeping them out of a hardening PR would be harder to justify than rolling them in. Command argv was logged verbatim to the terminal, to the per-run log file, and in Outcome reprs, while environment variable values had been redacted since tox-dev#3543. That gap meant running a test with a literal --token=hunter2 flag leaked the value to disk in a place most users don't even know exists. Moving the secret-detection helpers into tox.util.redact and adding a shell_cmd_redacted counterpart keeps the shell_cmd property stable for existing callers (test helpers assert on the exact form) while giving the logging code path a safe string to emit. Space-separated flag values are intentionally left alone to avoid false positives on innocuous selectors like pytest -k test_foo, where there is no generic way to tell a flag value apart from the next positional. The PEP 723 runner from tox-dev#3912 resolved script = ... as tox_root / script with no containment check and no read cap, so a malicious config could point tox at ../../.ssh/id_rsa or a symlink to /dev/zero. Factoring the lookup into _resolve_script_path lets both the explicit existence check in _setup_env and the read-text call in _get_script_metadata go through the same Path.relative_to guard, and a 5 MiB ceiling bounds the read without affecting any plausible real script. The CLI parser built a Literal[...] string from action.choices and eval'd it. Every current callsite uses hardcoded strings so this has never been exploitable, but it is exactly the kind of pattern a plugin author could extend with user-controlled choices and regret later. typing.Literal accepts a tuple directly at runtime, so the eval can be replaced by Literal[tuple(action.choices)] without changing the produced type object. Finally, the requirements file parser was vendored from pip in tox-dev#2009 and lost the session-level timeout pip would otherwise apply from its --timeout default. A deliberately slow mirror can hang a remote -r http://... include indefinitely. A 30 second timeout matches pip's normal working range and is well above any legitimate requirements-file fetch latency.
A post-release audit surfaced four places where tox was trusting attacker-authored config or argv more than it needed to. None are individually exploitable in a realistic attack today, but each one narrows a class of regression a future contributor could easily introduce, and the fixes are small enough that keeping them out of a hardening PR would be harder to justify than rolling them in. Command argv was logged verbatim to the terminal, to the per-run log file, and in Outcome reprs, while environment variable values had been redacted since tox-dev#3543. That gap meant running a test with a literal --token=hunter2 flag leaked the value to disk in a place most users don't even know exists. Moving the secret-detection helpers into tox.util.redact and adding a shell_cmd_redacted counterpart keeps the shell_cmd property stable for existing callers (test helpers assert on the exact form) while giving the logging code path a safe string to emit. Space-separated flag values are intentionally left alone to avoid false positives on innocuous selectors like pytest -k test_foo, where there is no generic way to tell a flag value apart from the next positional. The PEP 723 runner from tox-dev#3912 resolved script = ... as tox_root / script with no containment check and no read cap, so a malicious config could point tox at ../../.ssh/id_rsa or a symlink to /dev/zero. Factoring the lookup into _resolve_script_path lets both the explicit existence check in _setup_env and the read-text call in _get_script_metadata go through the same Path.relative_to guard, and a 5 MiB ceiling bounds the read without affecting any plausible real script. The CLI parser built a Literal[...] string from action.choices and eval'd it. Every current callsite uses hardcoded strings so this has never been exploitable, but it is exactly the kind of pattern a plugin author could extend with user-controlled choices and regret later. typing.Literal accepts a tuple directly at runtime, so the eval can be replaced by Literal[tuple(action.choices)] without changing the produced type object. Finally, the requirements file parser was vendored from pip in tox-dev#2009 and lost the session-level timeout pip would otherwise apply from its --timeout default. A deliberately slow mirror can hang a remote -r http://... include indefinitely. A 30 second timeout matches pip's normal working range and is well above any legitimate requirements-file fetch latency.
A post-release audit surfaced four places where tox was trusting attacker-authored config or argv more than it needed to. None are individually exploitable in a realistic attack today, but each one narrows a class of regression a future contributor could easily introduce, and the fixes are small enough that keeping them out of a hardening PR would be harder to justify than rolling them in. Command argv was logged verbatim to the terminal, to the per-run log file, and in Outcome reprs, while environment variable values had been redacted since tox-dev#3543. That gap meant running a test with a literal --token=hunter2 flag leaked the value to disk in a place most users don't even know exists. Moving the secret-detection helpers into tox.util.redact and adding a shell_cmd_redacted counterpart keeps the shell_cmd property stable for existing callers (test helpers assert on the exact form) while giving the logging code path a safe string to emit. Space-separated flag values are intentionally left alone to avoid false positives on innocuous selectors like pytest -k test_foo, where there is no generic way to tell a flag value apart from the next positional. The PEP 723 runner from tox-dev#3912 resolved script = ... as tox_root / script with no containment check and no read cap, so a malicious config could point tox at ../../.ssh/id_rsa or a symlink to /dev/zero. Factoring the lookup into _resolve_script_path lets both the explicit existence check in _setup_env and the read-text call in _get_script_metadata go through the same Path.relative_to guard, and a 5 MiB ceiling bounds the read without affecting any plausible real script. The CLI parser built a Literal[...] string from action.choices and eval'd it. Every current callsite uses hardcoded strings so this has never been exploitable, but it is exactly the kind of pattern a plugin author could extend with user-controlled choices and regret later. typing.Literal accepts a tuple directly at runtime, so the eval can be replaced by Literal[tuple(action.choices)] without changing the produced type object. Finally, the requirements file parser was vendored from pip in tox-dev#2009 and lost the session-level timeout pip would otherwise apply from its --timeout default. A deliberately slow mirror can hang a remote -r http://... include indefinitely. A 30 second timeout matches pip's normal working range and is well above any legitimate requirements-file fetch latency.
A post-release audit surfaced four places where tox was trusting attacker-authored config or argv more than it needed to. None are individually exploitable in a realistic attack today, but each one narrows a class of regression a future contributor could easily introduce, and the fixes are small enough that keeping them out of a hardening PR would be harder to justify than rolling them in. Command argv was logged verbatim to the terminal, to the per-run log file, and in Outcome reprs, while environment variable values had been redacted since tox-dev#3543. That gap meant running a test with a literal --token=hunter2 flag leaked the value to disk in a place most users don't even know exists. Moving the secret-detection helpers into tox.util.redact and adding a shell_cmd_redacted counterpart keeps the shell_cmd property stable for existing callers (test helpers assert on the exact form) while giving the logging code path a safe string to emit. Space-separated flag values are intentionally left alone to avoid false positives on innocuous selectors like pytest -k test_foo, where there is no generic way to tell a flag value apart from the next positional. The PEP 723 runner from tox-dev#3912 resolved script = ... as tox_root / script with no containment check and no read cap, so a malicious config could point tox at ../../.ssh/id_rsa or a symlink to /dev/zero. Factoring the lookup into _resolve_script_path lets both the explicit existence check in _setup_env and the read-text call in _get_script_metadata go through the same Path.relative_to guard, and a 5 MiB ceiling bounds the read without affecting any plausible real script. The CLI parser built a Literal[...] string from action.choices and eval'd it. Every current callsite uses hardcoded strings so this has never been exploitable, but it is exactly the kind of pattern a plugin author could extend with user-controlled choices and regret later. typing.Literal accepts a tuple directly at runtime, so the eval can be replaced by Literal[tuple(action.choices)] without changing the produced type object. Finally, the requirements file parser was vendored from pip in tox-dev#2009 and lost the session-level timeout pip would otherwise apply from its --timeout default. A deliberately slow mirror can hang a remote -r http://... include indefinitely. A 30 second timeout matches pip's normal working range and is well above any legitimate requirements-file fetch latency.
A post-release audit surfaced four places where ``tox`` was trusting attacker-authored config or argv more than it needed to. None are individually exploitable in a realistic attack today, but each one narrows a class of regression a future contributor could easily reintroduce, and the fixes are small enough that keeping them out of a hardening PR would be harder to justify than rolling them in. 🔒 Command argv was logged verbatim to the terminal, to the per-run log file under ``.tox/<env>/log/``, and in ``Outcome.__repr__``, while environment-variable values have been redacted via ``redact_value`` since #3543. Running a test with a literal ``--token=hunter2`` flag therefore leaked the value to disk in a location most users never look at. The secret-detection helpers move into a new ``tox.util.redact`` module and gain a ``redact_argv`` helper; ``ExecuteRequest`` picks up a ``shell_cmd_redacted`` property so the logging paths can emit a masked form while the existing ``shell_cmd`` property stays stable for the test helpers that assert on its exact output. Space-separated flag values are intentionally left alone because there is no generic way to tell a flag value apart from the next positional without parser-specific knowledge, and masking ``pytest -k test_token`` would be a nasty false positive. The PEP 723 runner from #3912 resolved ``script = ...`` as ``tox_root / script`` with no containment check and no read cap, so a malicious config pointing at ``../../.ssh/id_rsa`` or a symlink to ``/dev/zero`` would either reach outside the project tree or balloon memory on ``read_text``. ✨ Factoring the lookup into ``_resolve_script_path`` lets both the existence check in ``_setup_env`` and the read in ``_get_script_metadata`` go through the same ``Path.relative_to`` guard, and a 5 MiB ceiling bounds the read without affecting any plausible real script. The CLI parser used to build a ``Literal[...]`` string from ``action.choices`` and ``eval`` it. Every current callsite passes hardcoded strings, but it is exactly the kind of pattern a plugin author could extend with user-controlled choices and regret later. ``typing.Literal`` accepts a tuple directly at runtime, so ``Literal[tuple(action.choices)]`` produces the same type object without the interpreter round-trip. Finally, the requirements file parser was vendored from pip in #2009 and lost the session-level timeout pip would otherwise apply from its ``--timeout`` default; a deliberately slow mirror could hang a remote ``-r http://...`` include forever, so ``urlopen`` now gets a 30-second timeout. Existing public imports from ``tox.tox_env.api`` (``redact_value``, ``SECRET_KEYWORDS``, ``SECRET_ENV_VAR_REGEX``) stay available as re-exports, so downstream plugins pinning against them keep working. The changelog entry lives at ``docs/changelog/3924.bugfix.rst``.

Improves logging of environment variables by sorting them by key and redacting the values for the ones that are likely to contain secrets.
Fixes: #3542
tox -e fix)docs/changelogfolder