Skip to content

fix(deploy-on-aws): defusedxml.ElementTree compatibility in fixers#172

Open
leozhad wants to merge 4 commits into
awslabs:mainfrom
leozhad:fix/defusedxml-compat
Open

fix(deploy-on-aws): defusedxml.ElementTree compatibility in fixers#172
leozhad wants to merge 4 commits into
awslabs:mainfrom
leozhad:fix/defusedxml-compat

Conversation

@leozhad
Copy link
Copy Markdown
Member

@leozhad leozhad commented May 20, 2026

Summary

Closes #154
Closes #167

The 4 post-processing fixer scripts in plugins/deploy-on-aws/scripts/lib/ all do import defusedxml.ElementTree as ET and then reach for ET.Element, ET.ElementTree, and ET.indent. None of these names are re-exported by defusedxml.ElementTree, so the scripts crash at import time on Python 3.10+ (eager annotation evaluation) and on systems with older defusedxml (no indent re-export).

Approach

Pull the three names directly from the stdlib at import time and attach them to the ET module object. defusedxml's secure parse() remains the actual XML entry point — none of the symbols added here do any parsing:

  • Element / ElementTree are pure type aliases used in annotations and dict[str, ET.Element]-shaped indexes
  • indent() only mutates the in-memory tree's text / tail whitespace before serialization; no network or external input is involved

Bandit's B405 ("Using Element to parse untrusted XML data is known to be vulnerable to XML attacks") flags the from xml.etree.ElementTree import line. The flagged usage is type-only (no parsing), so each import is annotated with a # nosec B405 comment explaining that defusedxml remains the parser.

Verification

  • All 4 modules import cleanly under Python 3.12:
    python3 -c "import importlib.util; spec=importlib.util.spec_from_file_location('m','<file>'); m=importlib.util.module_from_spec(spec); spec.loader.exec_module(m); print('OK')"
  • All 4 fixers run end-to-end on a real diagram (fix_nesting.py, fix_icon_colors.py, fix_step_badges.py individually, plus post_process_drawio.py chaining them)
  • Output XML re-parses with stdlib ElementTree
  • Bandit (configured per the repo's mise run security:bandit task) reports 0 issues across all 4 files post-fix:
    Total issues (by severity): 0 low / 0 medium / 0 high
    Total potential issues skipped due to specifically being disabled: 4
    
  • Tested against defusedxml==0.7.1 on Python 3.12.12

Files changed

  • plugins/deploy-on-aws/scripts/lib/fix_step_badges.py
  • plugins/deploy-on-aws/scripts/lib/fix_nesting.py
  • plugins/deploy-on-aws/scripts/lib/fix_icon_colors.py
  • plugins/deploy-on-aws/scripts/lib/post_process_drawio.py

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of the project license.

Closes awslabs#154
Closes awslabs#167

The 4 post-processing fixer scripts in `plugins/deploy-on-aws/scripts/lib/`
all do `import defusedxml.ElementTree as ET` and then reach for
`ET.Element`, `ET.ElementTree`, and `ET.indent`. None of these names
are exported by `defusedxml.ElementTree`:

- `Element` and `ElementTree` are stdlib type aliases. defusedxml's
  module is a thin security wrapper that re-exports the parsing
  helpers (`parse`, `fromstring`) but not the types. With Python 3.10+
  evaluating annotations eagerly, `def f(cell: ET.Element)` crashes at
  module-import time:

      AttributeError: module 'defusedxml.ElementTree' has no attribute
      'Element'

- `indent()` is a write-side pretty-printer added to the stdlib in
  Python 3.9. Older defusedxml releases — still common via system
  Python and distro packages — never re-exported it. Newer scripts
  call `ET.indent(tree, ...)` and crash with the same AttributeError.

Both bugs are filed:
- awslabs#154 (Element / ElementTree) lists all 4 affected files.
- awslabs#167 (indent) reproduces the issue on `fix_step_badges.py:486`.

Fix
----
Pull the three names directly from the stdlib at import time and
attach them to the `ET` module object. defusedxml's secure `parse()`
remains the actual XML entry point — none of the symbols added here
do any parsing:

- `Element` / `ElementTree` are pure type aliases used in annotations
  and `dict[str, ET.Element]`-shaped indexes.
- `indent()` only mutates the in-memory tree's `text`/`tail` whitespace
  before serialization; no network or external input is involved.

Bandit B405 ("Using Element to parse untrusted XML data is known to be
vulnerable to XML attacks") flags the `from xml.etree.ElementTree import`
line. The flagged usage is type-only (no parsing), so each import is
annotated with a `# nosec B405` comment explaining that defusedxml
remains the parser. Verified with `bandit` on all 4 files: 0 issues
post-fix.

Verification
------------
- All 4 modules import cleanly: `python3 -c "import importlib.util;
  spec=importlib.util.spec_from_file_location('m', '<file>'); m =
  importlib.util.module_from_spec(spec); spec.loader.exec_module(m)"`
- All 4 fixers run end-to-end on a real diagram:
  `python3 fix_nesting.py test.drawio` (and same for the other 3).
- `post_process_drawio.py test.drawio` chains all fixers, completes
  successfully, output XML re-parses with stdlib ElementTree.
- `bandit` reports 0 issues across all 4 files (4 nosec annotations
  applied; no other findings introduced).
- Tested against `defusedxml==0.7.1` on Python 3.12.

Files touched
-------------
- plugins/deploy-on-aws/scripts/lib/fix_step_badges.py
- plugins/deploy-on-aws/scripts/lib/fix_nesting.py
- plugins/deploy-on-aws/scripts/lib/fix_icon_colors.py
- plugins/deploy-on-aws/scripts/lib/post_process_drawio.py

----

By submitting this pull request, I confirm that you can use, modify,
copy, and redistribute this contribution, under the terms of your
choice.
@leozhad leozhad requested review from a team as code owners May 20, 2026 04:05
Comment thread plugins/deploy-on-aws/scripts/lib/fix_icon_colors.py Fixed
Comment thread plugins/deploy-on-aws/scripts/lib/fix_icon_colors.py Fixed
Comment thread plugins/deploy-on-aws/scripts/lib/fix_nesting.py Fixed
Comment thread plugins/deploy-on-aws/scripts/lib/fix_nesting.py Fixed
Comment thread plugins/deploy-on-aws/scripts/lib/fix_step_badges.py Fixed
Comment thread plugins/deploy-on-aws/scripts/lib/fix_step_badges.py Fixed
Comment thread plugins/deploy-on-aws/scripts/lib/post_process_drawio.py Fixed
Comment thread plugins/deploy-on-aws/scripts/lib/post_process_drawio.py Fixed
Semgrep OSS (the third-party SARIF code-scanning check, separate from the
mise run security:semgrep task that uses an excluded ruleset) flagged
the from xml.etree.ElementTree import line in all 4 files for two rules:

  - python.lang.security.use-defused-xml.use-defused-xml
  - gitlab.bandit.B313...B410 (catch-all stdlib XML import rule)

Both fire on the import statement regardless of how the imported names
are used. Same false-positive class as the bandit B405 finding handled
in the previous commit — the imported symbols are pure type aliases
(Element, ElementTree) and a write-side pretty-printer (indent) that
do no parsing.

Add the nosemgrep directive on the same line as the existing nosec
B405 annotation to suppress both Semgrep rules. defusedxml remains
the actual XML parser.
Comment thread plugins/deploy-on-aws/scripts/lib/fix_icon_colors.py Fixed
Comment thread plugins/deploy-on-aws/scripts/lib/fix_icon_colors.py Fixed
Comment thread plugins/deploy-on-aws/scripts/lib/fix_nesting.py Fixed
Comment thread plugins/deploy-on-aws/scripts/lib/fix_nesting.py Fixed
Comment thread plugins/deploy-on-aws/scripts/lib/fix_step_badges.py Fixed
Comment thread plugins/deploy-on-aws/scripts/lib/fix_step_badges.py Fixed
Comment thread plugins/deploy-on-aws/scripts/lib/post_process_drawio.py Fixed
Comment thread plugins/deploy-on-aws/scripts/lib/post_process_drawio.py Fixed
Comment on lines +22 to +25
from xml.etree.ElementTree import ( # nosec B405
Element as _Element,
ElementTree as _ElementTree,
indent as _indent,
Comment on lines +22 to +25
from xml.etree.ElementTree import ( # nosec B405
Element as _Element,
ElementTree as _ElementTree,
indent as _indent,
Comment on lines +29 to +32
from xml.etree.ElementTree import ( # nosec B405
Element as _Element,
ElementTree as _ElementTree,
indent as _indent,
Comment on lines +29 to +32
from xml.etree.ElementTree import ( # nosec B405
Element as _Element,
ElementTree as _ElementTree,
indent as _indent,
Comment on lines +33 to +36
from xml.etree.ElementTree import ( # nosec B405
Element as _Element,
ElementTree as _ElementTree,
indent as _indent,
Comment on lines +33 to +36
from xml.etree.ElementTree import ( # nosec B405
Element as _Element,
ElementTree as _ElementTree,
indent as _indent,
Comment on lines +28 to +31
from xml.etree.ElementTree import ( # nosec B405
Element as _Element,
ElementTree as _ElementTree,
indent as _indent,
Comment on lines +28 to +31
from xml.etree.ElementTree import ( # nosec B405
Element as _Element,
ElementTree as _ElementTree,
indent as _indent,
@leozhad
Copy link
Copy Markdown
Member Author

leozhad commented May 20, 2026

Note on the Semgrep OSS check failure

This SaaS check fires on:

  • python.lang.security.use-defused-xml.use-defused-xml
  • gitlab.bandit.B313.B314.B315.B316.B318.B319.B320.B405.B406.B407.B408.B409.B410

against the from xml.etree.ElementTree import line in all 4 files (4 files × 2 rules = 8 findings). Both rules flag any import from xml.etree.ElementTree, regardless of usage.

The flagged usage in this PR is type-onlyElement and ElementTree are imported solely as type aliases for annotations like dict[str, ET.Element], and indent is a write-side pretty-printer that mutates only the in-memory tree's whitespace. None of the imported names do any parsing. defusedxml.parse() remains the actual XML entry point in every file.

The in-tree mise run security:semgrep task passes (it uses the repo's configured exclusion list for known false positives).

I tried to suppress just these two rules so the SARIF check would pass:

  1. Inline # nosemgrep: <rule-ids> on the import line — ignored
  2. Line-prefix # nosemgrep: <rule-ids> directives above the import — ignored
  3. Bare # nosemgrep (suppress all rules on next line) — ignored (9c84ed7)

The cloud-hosted Semgrep checker appears to honor neither inline nor prefix directives, regardless of rule ID specificity. I think this needs maintainer intervention — either a repo-level Semgrep config exception for these rules on the 4 fixer files, or a Semgrep OSS override in the GitHub branch protection settings.

Bandit (the in-tree Python scan) passes cleanly with the # nosec B405 annotations: 4 nosec skips, 0 issues. The two suppression mechanisms are intentionally separate because they target different scanners — bandit honors nosec, but the cloud Semgrep doesn't honor nosemgrep.

Happy to apply any specific shape of fix you'd prefer (e.g., refactor to xml.etree.ElementTree as _xet aliasing if that gets past the rule pattern, or split the type imports into a separate file marked # semgrep-disable-file if that's the convention).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

2 participants