Skip to content

Log environment variables sorted by key while redacting values of unsafe ones#3543

Merged
ssbarnea merged 1 commit intomainfrom
fix/3542
Jun 17, 2025
Merged

Log environment variables sorted by key while redacting values of unsafe ones#3543
ssbarnea merged 1 commit intomainfrom
fix/3542

Conversation

@ssbarnea
Copy link
Copy Markdown
Member

@ssbarnea ssbarnea commented Jun 3, 2025

Improves logging of environment variables by sorting them by key and redacting the values for the ones that are likely to contain secrets.

Fixes: #3542

  • ran the linter to address style issues (tox -e fix)
  • wrote descriptive pull request text
  • ensured there are test(s) validating the fix
  • added news fragment in docs/changelog folder
  • updated/extended the documentation

Comment thread tests/tox_env/test_api.py Outdated
@ssbarnea ssbarnea force-pushed the fix/3542 branch 3 times, most recently from 511fcac to 7f508e0 Compare June 4, 2025 12:01
@ssbarnea ssbarnea requested a review from gaborbernat June 4, 2025 12:47
@ssbarnea
Copy link
Copy Markdown
Member Author

ssbarnea commented Jun 4, 2025

@gaborbernat Any chance you could look at it again today? I think that I addressed the requests.

This patch is a blocker for improving the security of GHA pipelines as I would not want to disable log collection for tox. Thanks.

@ssbarnea ssbarnea changed the title Improve logging of environment variables Log environment variables sorted by key while redacting values of some unsafe ones Jun 10, 2025
@ssbarnea ssbarnea changed the title Log environment variables sorted by key while redacting values of some unsafe ones Log environment variables sorted by key while redacting values of unsafe ones Jun 10, 2025
@ssbarnea ssbarnea force-pushed the fix/3542 branch 2 times, most recently from ca01304 to 37a1156 Compare June 10, 2025 16:10
@ssbarnea ssbarnea dismissed gaborbernat’s stale review June 10, 2025 16:15

Please review again, I made the required changes, dropped the notice, added documentation entry, tested all keywords.

@ssbarnea
Copy link
Copy Markdown
Member Author

@gaborbernat Sorry to bother you again. Is there anything else I need to change?

Comment thread docs/user_guide.rst Outdated
Copy link
Copy Markdown
Member

@gaborbernat gaborbernat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment thread src/tox/tox_env/api.py Outdated
@ssbarnea
Copy link
Copy Markdown
Member Author

@gaborbernat The py314 failure is unrelated and seems to be a bug in coveragepy. I will try to help @nedbat with a patch if I do not run out of time. That also made me realize that I need to fix few bugs in tox-uv in order to run coveragepy tests..., side effect raising tox-dev/tox-uv#210

@ssbarnea
Copy link
Copy Markdown
Member Author

@gaborbernat Any idea about what to do about py314 errors related to coverage?

CoverageWarning: Couldn't import C tracer: No module named 'coverage.tracer' (no-ctracer)

They started to fail on main last night, so unrelated to this patch. Should we attempt to capture/hide these warnings to avoid our test failures?

@nedbat
Copy link
Copy Markdown
Contributor

nedbat commented Jun 12, 2025

I think I will skip that warning on platforms where I don't ship a built wheel, though I'd be curious whether the installation of coverage on 3.14 shows that it tried to compile the extension or not?

@nedbat
Copy link
Copy Markdown
Contributor

nedbat commented Jun 13, 2025

Coverage 7.9.1 is released to quiet that warning on 3.14.

@ssbarnea ssbarnea dismissed gaborbernat’s stale review June 17, 2025 10:20

Sorted all comments, CI is reporting green now. Please review again so we can close it. Thanks

@ssbarnea ssbarnea merged commit 5077e20 into main Jun 17, 2025
28 checks passed
@ssbarnea ssbarnea deleted the fix/3542 branch June 17, 2025 15:14
@cidrblock
Copy link
Copy Markdown

TY @gaborbernat

gaborbernat added a commit to gaborbernat/tox that referenced this pull request Apr 14, 2026
A post-release audit surfaced four places where tox was trusting
attacker-authored config or argv more than it needed to. None are
individually exploitable in a realistic attack today, but each one
narrows a class of regression a future contributor could easily
introduce, and the fixes are small enough that keeping them out of
a hardening PR would be harder to justify than rolling them in.

Command argv was logged verbatim to the terminal, to the per-run
log file, and in Outcome reprs, while environment variable values
had been redacted since tox-dev#3543. That gap meant running a test with
a literal --token=hunter2 flag leaked the value to disk in a place
most users don't even know exists. Moving the secret-detection
helpers into tox.util.redact and adding a shell_cmd_redacted
counterpart keeps the shell_cmd property stable for existing
callers (test helpers assert on the exact form) while giving the
logging code path a safe string to emit. Space-separated flag
values are intentionally left alone to avoid false positives on
innocuous selectors like pytest -k test_foo, where there is no
generic way to tell a flag value apart from the next positional.

The PEP 723 runner from tox-dev#3912 resolved script = ... as
tox_root / script with no containment check and no read cap, so a
malicious config could point tox at ../../.ssh/id_rsa or a symlink
to /dev/zero. Factoring the lookup into _resolve_script_path lets
both the explicit existence check in _setup_env and the
read-text call in _get_script_metadata go through the same
Path.relative_to guard, and a 5 MiB ceiling bounds the read
without affecting any plausible real script.

The CLI parser built a Literal[...] string from action.choices and
eval'd it. Every current callsite uses hardcoded strings so this
has never been exploitable, but it is exactly the kind of pattern
a plugin author could extend with user-controlled choices and
regret later. typing.Literal accepts a tuple directly at runtime,
so the eval can be replaced by Literal[tuple(action.choices)]
without changing the produced type object.

Finally, the requirements file parser was vendored from pip in
tox-dev#2009 and lost the session-level timeout pip would otherwise apply
from its --timeout default. A deliberately slow mirror can hang a
remote -r http://... include indefinitely. A 30 second timeout
matches pip's normal working range and is well above any legitimate
requirements-file fetch latency.
gaborbernat added a commit to gaborbernat/tox that referenced this pull request Apr 15, 2026
A post-release audit surfaced four places where tox was trusting
attacker-authored config or argv more than it needed to. None are
individually exploitable in a realistic attack today, but each one
narrows a class of regression a future contributor could easily
introduce, and the fixes are small enough that keeping them out of
a hardening PR would be harder to justify than rolling them in.

Command argv was logged verbatim to the terminal, to the per-run
log file, and in Outcome reprs, while environment variable values
had been redacted since tox-dev#3543. That gap meant running a test with
a literal --token=hunter2 flag leaked the value to disk in a place
most users don't even know exists. Moving the secret-detection
helpers into tox.util.redact and adding a shell_cmd_redacted
counterpart keeps the shell_cmd property stable for existing
callers (test helpers assert on the exact form) while giving the
logging code path a safe string to emit. Space-separated flag
values are intentionally left alone to avoid false positives on
innocuous selectors like pytest -k test_foo, where there is no
generic way to tell a flag value apart from the next positional.

The PEP 723 runner from tox-dev#3912 resolved script = ... as
tox_root / script with no containment check and no read cap, so a
malicious config could point tox at ../../.ssh/id_rsa or a symlink
to /dev/zero. Factoring the lookup into _resolve_script_path lets
both the explicit existence check in _setup_env and the
read-text call in _get_script_metadata go through the same
Path.relative_to guard, and a 5 MiB ceiling bounds the read
without affecting any plausible real script.

The CLI parser built a Literal[...] string from action.choices and
eval'd it. Every current callsite uses hardcoded strings so this
has never been exploitable, but it is exactly the kind of pattern
a plugin author could extend with user-controlled choices and
regret later. typing.Literal accepts a tuple directly at runtime,
so the eval can be replaced by Literal[tuple(action.choices)]
without changing the produced type object.

Finally, the requirements file parser was vendored from pip in
tox-dev#2009 and lost the session-level timeout pip would otherwise apply
from its --timeout default. A deliberately slow mirror can hang a
remote -r http://... include indefinitely. A 30 second timeout
matches pip's normal working range and is well above any legitimate
requirements-file fetch latency.
gaborbernat added a commit to gaborbernat/tox that referenced this pull request Apr 15, 2026
A post-release audit surfaced four places where tox was trusting
attacker-authored config or argv more than it needed to. None are
individually exploitable in a realistic attack today, but each one
narrows a class of regression a future contributor could easily
introduce, and the fixes are small enough that keeping them out of
a hardening PR would be harder to justify than rolling them in.

Command argv was logged verbatim to the terminal, to the per-run
log file, and in Outcome reprs, while environment variable values
had been redacted since tox-dev#3543. That gap meant running a test with
a literal --token=hunter2 flag leaked the value to disk in a place
most users don't even know exists. Moving the secret-detection
helpers into tox.util.redact and adding a shell_cmd_redacted
counterpart keeps the shell_cmd property stable for existing
callers (test helpers assert on the exact form) while giving the
logging code path a safe string to emit. Space-separated flag
values are intentionally left alone to avoid false positives on
innocuous selectors like pytest -k test_foo, where there is no
generic way to tell a flag value apart from the next positional.

The PEP 723 runner from tox-dev#3912 resolved script = ... as
tox_root / script with no containment check and no read cap, so a
malicious config could point tox at ../../.ssh/id_rsa or a symlink
to /dev/zero. Factoring the lookup into _resolve_script_path lets
both the explicit existence check in _setup_env and the
read-text call in _get_script_metadata go through the same
Path.relative_to guard, and a 5 MiB ceiling bounds the read
without affecting any plausible real script.

The CLI parser built a Literal[...] string from action.choices and
eval'd it. Every current callsite uses hardcoded strings so this
has never been exploitable, but it is exactly the kind of pattern
a plugin author could extend with user-controlled choices and
regret later. typing.Literal accepts a tuple directly at runtime,
so the eval can be replaced by Literal[tuple(action.choices)]
without changing the produced type object.

Finally, the requirements file parser was vendored from pip in
tox-dev#2009 and lost the session-level timeout pip would otherwise apply
from its --timeout default. A deliberately slow mirror can hang a
remote -r http://... include indefinitely. A 30 second timeout
matches pip's normal working range and is well above any legitimate
requirements-file fetch latency.
gaborbernat added a commit to gaborbernat/tox that referenced this pull request Apr 15, 2026
A post-release audit surfaced four places where tox was trusting
attacker-authored config or argv more than it needed to. None are
individually exploitable in a realistic attack today, but each one
narrows a class of regression a future contributor could easily
introduce, and the fixes are small enough that keeping them out of
a hardening PR would be harder to justify than rolling them in.

Command argv was logged verbatim to the terminal, to the per-run
log file, and in Outcome reprs, while environment variable values
had been redacted since tox-dev#3543. That gap meant running a test with
a literal --token=hunter2 flag leaked the value to disk in a place
most users don't even know exists. Moving the secret-detection
helpers into tox.util.redact and adding a shell_cmd_redacted
counterpart keeps the shell_cmd property stable for existing
callers (test helpers assert on the exact form) while giving the
logging code path a safe string to emit. Space-separated flag
values are intentionally left alone to avoid false positives on
innocuous selectors like pytest -k test_foo, where there is no
generic way to tell a flag value apart from the next positional.

The PEP 723 runner from tox-dev#3912 resolved script = ... as
tox_root / script with no containment check and no read cap, so a
malicious config could point tox at ../../.ssh/id_rsa or a symlink
to /dev/zero. Factoring the lookup into _resolve_script_path lets
both the explicit existence check in _setup_env and the
read-text call in _get_script_metadata go through the same
Path.relative_to guard, and a 5 MiB ceiling bounds the read
without affecting any plausible real script.

The CLI parser built a Literal[...] string from action.choices and
eval'd it. Every current callsite uses hardcoded strings so this
has never been exploitable, but it is exactly the kind of pattern
a plugin author could extend with user-controlled choices and
regret later. typing.Literal accepts a tuple directly at runtime,
so the eval can be replaced by Literal[tuple(action.choices)]
without changing the produced type object.

Finally, the requirements file parser was vendored from pip in
tox-dev#2009 and lost the session-level timeout pip would otherwise apply
from its --timeout default. A deliberately slow mirror can hang a
remote -r http://... include indefinitely. A 30 second timeout
matches pip's normal working range and is well above any legitimate
requirements-file fetch latency.
gaborbernat added a commit that referenced this pull request Apr 15, 2026
A post-release audit surfaced four places where ``tox`` was trusting
attacker-authored config or argv more than it needed to. None are
individually exploitable in a realistic attack today, but each one
narrows a class of regression a future contributor could easily
reintroduce, and the fixes are small enough that keeping them out of a
hardening PR would be harder to justify than rolling them in. 🔒

Command argv was logged verbatim to the terminal, to the per-run log
file under ``.tox/<env>/log/``, and in ``Outcome.__repr__``, while
environment-variable values have been redacted via ``redact_value``
since #3543. Running a test with a literal ``--token=hunter2`` flag
therefore leaked the value to disk in a location most users never look
at. The secret-detection helpers move into a new ``tox.util.redact``
module and gain a ``redact_argv`` helper; ``ExecuteRequest`` picks up a
``shell_cmd_redacted`` property so the logging paths can emit a masked
form while the existing ``shell_cmd`` property stays stable for the test
helpers that assert on its exact output. Space-separated flag values are
intentionally left alone because there is no generic way to tell a flag
value apart from the next positional without parser-specific knowledge,
and masking ``pytest -k test_token`` would be a nasty false positive.

The PEP 723 runner from #3912 resolved ``script = ...`` as ``tox_root /
script`` with no containment check and no read cap, so a malicious
config pointing at ``../../.ssh/id_rsa`` or a symlink to ``/dev/zero``
would either reach outside the project tree or balloon memory on
``read_text``. ✨ Factoring the lookup into ``_resolve_script_path`` lets
both the existence check in ``_setup_env`` and the read in
``_get_script_metadata`` go through the same ``Path.relative_to`` guard,
and a 5 MiB ceiling bounds the read without affecting any plausible real
script.

The CLI parser used to build a ``Literal[...]`` string from
``action.choices`` and ``eval`` it. Every current callsite passes
hardcoded strings, but it is exactly the kind of pattern a plugin author
could extend with user-controlled choices and regret later.
``typing.Literal`` accepts a tuple directly at runtime, so
``Literal[tuple(action.choices)]`` produces the same type object without
the interpreter round-trip. Finally, the requirements file parser was
vendored from pip in #2009 and lost the session-level timeout pip would
otherwise apply from its ``--timeout`` default; a deliberately slow
mirror could hang a remote ``-r http://...`` include forever, so
``urlopen`` now gets a 30-second timeout.

Existing public imports from ``tox.tox_env.api`` (``redact_value``,
``SECRET_KEYWORDS``, ``SECRET_ENV_VAR_REGEX``) stay available as
re-exports, so downstream plugins pinning against them keep working. The
changelog entry lives at ``docs/changelog/3924.bugfix.rst``.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

logged environment variables are not sorted

4 participants