Skip to content

fix: sanitize page-provided tool strings before they reach model context#2247

Open
serhiizghama wants to merge 2 commits into
ChromeDevTools:mainfrom
serhiizghama:fix/sanitize-page-provided-tool-strings
Open

fix: sanitize page-provided tool strings before they reach model context#2247
serhiizghama wants to merge 2 commits into
ChromeDevTools:mainfrom
serhiizghama:fix/sanitize-page-provided-tool-strings

Conversation

@serhiizghama

Copy link
Copy Markdown
Contributor

Problem

Fixes #2242.

When list_3p_developer_tools runs, McpResponse.getToolGroups() collects tool groups from page JavaScript via the devtoolstooldiscovery event. The name/description of each group and tool are page-provided strings, and they are interpolated directly into the response text the model reads with no sanitization beyond a typeof === 'string' check.

A page being browsed can respond with strings containing:

  • newlines — to inject a fake ## System section into the structured response,
  • C0/C1 control characters, or
  • Unicode bidirectional overrides — to reorder/obfuscate text in the model's context.

This is gated behind categoryExperimentalThirdParty, but for any user who has enabled that category, a visited page can inject content into the LLM context.

Solution

Add sanitizePageString() and apply it to group.name, group.description, tool.name, and tool.description in getToolGroups(), in the Node-side loop that already runs after the values cross the Puppeteer boundary (in-page code is untrusted, so sanitizing there rather than inside evaluate() is deliberate). It:

  • strips C0/C1 control characters and Unicode bidirectional formatting characters (U+200E/200F, U+202A–202E, U+2066–2069), and
  • collapses any remaining whitespace (newlines, tabs, line/paragraph separators) to single spaces, then trims.

Sanitizing at the boundary means both the rendered text and structuredContent.thirdPartyDeveloperTools are clean, so the fix covers every downstream consumer.

Testing

Added a test in tests/tools/thirdPartyDeveloper.test.ts that has a page respond with a tool group whose strings contain a right-to-left override, a control character, and a newline-led ## System block, then asserts those characters are removed and the injected section is collapsed onto a single line in both structuredContent and the response text. Verified it fails on main and passes with the fix; the existing third-party and McpResponse suites stay green.

Tool group and tool name/description strings returned by page JavaScript
through the devtoolstooldiscovery event were interpolated into the
list_3p_developer_tools response with only a typeof check. A page could
respond with newlines, C0/C1 control characters, or Unicode bidirectional
overrides to inject fake sections or reorder text in the model's context.

Strip control and bidi characters and collapse whitespace on the Node
side, after the values cross the Puppeteer boundary, since in-page code is
untrusted.
@Lightning00Blade Lightning00Blade requested a review from wolfib June 23, 2026 09:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

devtoolstooldiscovery: page-controlled strings reach LLM context without sanitization (prompt injection surface)

1 participant