DOM Invader: Headless XSS Verification

Overview

The DOM Invader is a headless browser execution engine introduced in WSHawk v3.0.3. It solves the most persistent false positive problem in WebSocket fuzzing: a payload appearing in a server response does not mean it will execute as JavaScript in a real browser. The DOM Invader renders every WebSocket response inside an instrumented headless Chromium page and directly observes whether JavaScript execution occurs.

For information about the Payload Blaster that feeds responses into the DOM Invader, see Desktop Advanced Features. For information about recording authentication sessions to keep long-running fuzz runs alive, see Auth Flow Recording and Replay.

The False Positive Problem

Traditional XSS detection in WebSocket scanners relies on checking whether a payload appears verbatim in the server response. This produces both false positives and false negatives:

| Scenario | Pattern Match Result | Actual Execution |
|----------|---------------------|-----------------|
| Payload reflected with HTML encoding | False positive: match triggers | Does not execute |
| Payload inside a <textarea> element | False positive: match triggers | Does not execute |
| Response protected by strict CSP | False positive: match triggers | Does not execute |
| Payload in an event handler attribute | May not match pattern | Executes on interaction |
| Payload routed through a DOM sink | May not match | Executes silently |

The only definitive test is execution in a real browser engine. The DOM Invader does not attempt to parse or classify responses by syntax — it loads them into Chromium and observes the result.
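
The two failure modes in the table can be reproduced with a few lines of Python. This is a minimal sketch of a verbatim-match detector, not WSHawk's code; the function name and both sample responses are hypothetical.

```python
def naive_reflection_match(payload: str, response: str) -> bool:
    """Pattern-match detection: flag XSS if the payload appears verbatim."""
    return payload in response


payload = "<script>alert(1)</script>"

# Reflected verbatim, but a browser treats <textarea> content as plain text.
inert_response = f"<textarea>{payload}</textarea>"

# The server reshaped the input into an event handler, so nothing matches verbatim.
live_response = '<img src=x onerror="alert(1)">'

print(naive_reflection_match(payload, inert_response))  # True  (false positive)
print(naive_reflection_match(payload, live_response))   # False (false negative)
```

Neither answer says anything about execution, which is why the decision is deferred to a browser.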

Sources: dom_invader.py: XSSVerifier class, docs/V3.0.3_RELEASE_GUIDE.md: Section 3

Architecture

The DOM Invader is structured as three independent classes coordinated by a single orchestrator. All four components live in wshawk/dom_invader.py.

graph TB
    subgraph "dom_invader.py"
        DI["DOMInvader (orchestrator)"]
        BP["BrowserPool"]
        XV["XSSVerifier"]
        AFR["AuthFlowRecorder"]
    end

    subgraph "gui_bridge.py"
        Routes["/dom/status\n/dom/verify\n/dom/verify/batch\n/dom/auth/record\n/dom/auth/replay"]
    end

    subgraph "renderer.js"
        UI["DOMInvaderUI module\nStatus pill\nRecord Auth Flow button\nDOM Verified column"]
    end

    Routes --> DI
    DI --> BP
    DI --> XV
    DI --> AFR
    UI --> Routes

Singleton Pattern:

gui_bridge.py maintains a single DOMInvader instance via a lazy-initialization helper _get_dom_invader(). The singleton is created on the first request to any /dom/* route and persists for the lifetime of the backend process.

Sources: gui_bridge.py: _get_dom_invader(), dom_invader.py: DOMInvader class

BrowserPool

Purpose

Launching a Chromium browser instance takes 800ms to 1.5 seconds depending on the system. Creating a fresh browser for every payload verification would reduce the effective fuzzing rate to less than one payload per second. The BrowserPool maintains a collection of pre-warmed BrowserContext objects that are cleared between uses rather than destroyed.

API

| Method | Parameters | Return | Description |
|--------|-----------|--------|-------------|
| start() | headless=True, max_contexts=4 | None | Launches Playwright, starts Chromium |
| get_context() | none | BrowserContext | Returns an available context, blocking while all contexts are in use |
| release_context(ctx) | BrowserContext | None | Clears cookies, closes pages, returns context to pool |
| shutdown() | none | None | Closes all contexts and the browser |

Context Isolation

Before a context is returned to the pool, release_context() performs the following cleanup:

1. Closes all open pages within the context.
2. Clears all cookies via context.clear_cookies().

If cleanup raises an exception, the context is closed and discarded rather than returned to the pool.

This ensures that the response content, cookies, and JavaScript state from one verification cannot contaminate the next.
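
The cleanup sequence can be sketched as follows. The ctx argument is assumed to behave like a Playwright BrowserContext (pages, clear_cookies(), close()); the asyncio.Queue used as the pool and the boolean return value are illustrative choices, not details taken from dom_invader.py.

```python
import asyncio


async def release_context(ctx, pool: asyncio.Queue) -> bool:
    """Clean a context and return it to the pool; discard it if cleanup fails."""
    try:
        for page in list(ctx.pages):   # copy: closing pages may mutate the list
            await page.close()
        await ctx.clear_cookies()
    except Exception:
        # A context that fails cleanup may still hold stale state, so drop it.
        try:
            await ctx.close()
        except Exception:
            pass
        return False
    pool.put_nowait(ctx)               # clean context becomes available again
    return True
```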

Concurrency

get_context() is backed by an asyncio.Semaphore set to the pool's maximum size. Callers that exceed the pool size block until a context becomes available. This prevents unbounded tab creation under high concurrency.
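
The blocking behaviour can be illustrated with a toy pool. This sketch assumes nothing about the real BrowserPool beyond the semaphore; the class name BoundedPool, the string "contexts", and the demo() helper are hypothetical.

```python
import asyncio


class BoundedPool:
    """Toy pool: a semaphore caps how many contexts are checked out at once."""

    def __init__(self, contexts):
        self._free = list(contexts)
        self._slots = asyncio.Semaphore(len(self._free))

    async def get_context(self):
        await self._slots.acquire()    # blocks while every context is in use
        return self._free.pop()

    def release_context(self, ctx):
        self._free.append(ctx)
        self._slots.release()          # lets one waiting caller proceed


async def demo() -> int:
    """Run 5 workers against 2 contexts and return the peak number in use."""
    pool = BoundedPool(["ctx-a", "ctx-b"])
    in_use = peak = 0

    async def worker():
        nonlocal in_use, peak
        ctx = await pool.get_context()
        in_use += 1
        peak = max(peak, in_use)
        await asyncio.sleep(0.01)      # stands in for one verification
        in_use -= 1
        pool.release_context(ctx)

    await asyncio.gather(*(worker() for _ in range(5)))
    return peak


print(asyncio.run(demo()))  # 2
```

Five callers share two contexts, and at no point are more than two checked out.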

Sources: dom_invader.py: BrowserPool class

XSSVerifier

Instrumented Page Architecture

The XSSVerifier builds a fixed HTML test page for each verification:

<!DOCTYPE html>
<html>
<head></head>
<body>
    <div id="ws-response">{server_response_content}</div>
    <script data-wshawk="instrumentation">
        window.__xss_executed = false;
        window.__xss_evidence = "";
        window.__mutation_count = 0;

        const _alert = window.alert;
        window.alert = function(msg) {
            window.__xss_executed = true;
            window.__xss_evidence = "Dialog triggered: " + msg;
            _alert(msg);
        };

        const _eval = window.eval;
        window.eval = function(code) {
            window.__xss_executed = true;
            window.__xss_evidence = "eval() called";
            return _eval(code);
        };

        new MutationObserver((mutations) => {
            window.__mutation_count += mutations.length;
        }).observe(document.getElementById("ws-response"), {
            childList: true, subtree: true, attributes: true
        });
    </script>
</body>
</html>

The data-wshawk attribute prevents the MutationObserver from flagging the instrumentation script itself as a DOM injection.

Execution Detection Hierarchy

Results are classified in priority order. The first positive indicator found determines the classification:

| Priority | Indicator | Classification | Confidence |
|----------|-----------|---------------|-----------|
| 1 | Playwright dialog event captured | reflected | Definitive |
| 2 | window.__xss_executed set by alert/eval override | dom_based | Definitive |
| 3 | Console message matching beacon token | dom_based | High |
| 4 | Script element count in #ws-response > 0 | mutation | Medium |
| 5 | Event handler attribute count > 0 | mutation | Medium |
| None | No indicators within timeout | Not executed | n/a |
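
The priority order can be expressed as a first-match scan. The sketch below mirrors the table only; the Signals container and the classify() function are hypothetical names, and how the real XSSVerifier gathers these indicators is not shown.

```python
from typing import NamedTuple, Optional


class Signals(NamedTuple):
    """Indicators collected for one verification (hypothetical container)."""
    dialog_seen: bool = False
    xss_flag: bool = False          # window.__xss_executed
    beacon_logged: bool = False     # console message matched the beacon token
    injected_scripts: int = 0
    injected_handlers: int = 0


def classify(s: Signals) -> Optional[str]:
    """Return the technique for the highest-priority indicator, or None."""
    checks = [
        (s.dialog_seen, "reflected"),
        (s.xss_flag, "dom_based"),
        (s.beacon_logged, "dom_based"),
        (s.injected_scripts > 0, "mutation"),
        (s.injected_handlers > 0, "mutation"),
    ]
    for hit, technique in checks:
        if hit:
            return technique
    return None


print(classify(Signals(dialog_seen=True, injected_scripts=1)))  # reflected
print(classify(Signals(injected_handlers=2)))                   # mutation
print(classify(Signals()))                                      # None
```

A captured dialog wins even when lower-priority indicators are also present.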

verify() Method Workflow

flowchart TD
    Start["verify(payload, response, timeout_ms)"]
    GetCtx["BrowserPool.get_context()"]
    BuildPage["Construct instrumented HTML page"]
    SetContent["page.set_content(html)"]
    Wait["asyncio.sleep(timeout_ms / 1000)"]
    Eval["page.evaluate() — read execution state"]
    Check{{"Execution detected?"}}
    Yes["Return VerifyResult(executed=True, evidence=...)"]
    No["Return VerifyResult(executed=False)"]
    Release["BrowserPool.release_context(ctx)"]

    Start --> GetCtx
    GetCtx --> BuildPage
    BuildPage --> SetContent
    SetContent --> Wait
    Wait --> Eval
    Eval --> Check
    Check -->|Yes| Yes
    Check -->|No| No
    Yes --> Release
    No --> Release
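
The flow in the diagram can be sketched as a single coroutine. This is an illustration under assumptions: the pool is expected to expose awaitable get_context() and release_context() methods, the page object is expected to behave like a Playwright Page (set_content(), evaluate()), PAGE_TEMPLATE is an abbreviated stand-in for the instrumented page, and dialog capture is left out for brevity.

```python
import asyncio
import time

# Abbreviated stand-in for the instrumented test page.
PAGE_TEMPLATE = '<div id="ws-response">{body}</div>'

# Reads the flags that the instrumentation script maintains.
READ_STATE_JS = (
    "() => ({executed: window.__xss_executed, evidence: window.__xss_evidence})"
)


async def verify(pool, response: str, timeout_ms: int = 3000) -> dict:
    """Load one response into a pooled context and report what executed."""
    started = time.perf_counter()
    ctx = await pool.get_context()
    try:
        page = await ctx.new_page()
        await page.set_content(PAGE_TEMPLATE.format(body=response))
        await asyncio.sleep(timeout_ms / 1000)       # let injected code run
        state = await page.evaluate(READ_STATE_JS)   # read execution state
        return {
            "executed": bool(state["executed"]),
            "evidence": state["evidence"],
            "elapsed_ms": (time.perf_counter() - started) * 1000,
        }
    finally:
        # Runs on both the "executed" and "not executed" paths.
        await pool.release_context(ctx)
```

Putting the release in a finally block also returns the context when page loading or evaluation raises.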

VerifyResult Fields

| Field | Type | Description |
|-------|------|-------------|
| executed | bool | Whether JavaScript execution was confirmed |
| evidence | str | Human-readable description of what triggered confirmation |
| technique | str | One of: reflected, dom_based, mutation, stored |
| alert_message | str | Message passed to alert() if dialog triggered |
| dom_mutations | int | Count of MutationObserver events |
| injected_scripts | int | Number of <script> elements injected into the response container |
| injected_handlers | int | Number of event handler attributes detected |
| elapsed_ms | float | Time taken for the verification |
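
As a rough picture of the shape, the fields map naturally onto a dataclass. The field names and types follow the table; the default values and the use of a dataclass are assumptions.

```python
from dataclasses import asdict, dataclass


@dataclass
class VerifyResult:
    executed: bool = False        # defaults are assumptions, not from the source
    evidence: str = ""
    technique: str = ""
    alert_message: str = ""
    dom_mutations: int = 0
    injected_scripts: int = 0
    injected_handlers: int = 0
    elapsed_ms: float = 0.0


result = VerifyResult(
    executed=True,
    evidence="Dialog triggered: 1",
    technique="reflected",
    alert_message="1",
)

# asdict() yields a plain dict that serializes directly to JSON.
print(asdict(result)["technique"])  # reflected
```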

Sources: dom_invader.py: XSSVerifier class

Auth Flow Recording and Replay

Problem Statement

A typical Payload Blaster session against an authenticated WebSocket endpoint sends between 5,000 and 50,000 payloads, depending on the selected word list. At 300 ms per payload, a 5,000-payload XSS word list takes approximately 25 minutes, and a 50,000-payload list takes more than four hours. Most OAuth and SSO providers issue access tokens that expire in 15 to 60 minutes. When the token expires, the server closes the WebSocket connection and the Blaster terminates.
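
The run-time figures follow from simple arithmetic, worked here for both ends of the payload range. The function name is hypothetical; the 300 ms rate and the payload counts come from the paragraph above.

```python
def blast_minutes(payload_count: int, ms_per_payload: int = 300) -> float:
    """Total run time in minutes at a fixed per-payload cost."""
    return payload_count * ms_per_payload / 1000 / 60


print(blast_minutes(5_000))    # 25.0
print(blast_minutes(50_000))   # 250.0
```

Even the shortest run is longer than a 15-minute token lifetime, so a single login cannot be expected to cover a full session.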

The AuthFlowRecorder solves this by letting the user complete a login exactly once — manually, in a visible browser — and then replaying that session automatically every time a token expires.

Recording Phase

sequenceDiagram
    participant User
    participant Desktop as WSHawk Desktop
    participant Backend as gui_bridge.py
    participant Recorder as AuthFlowRecorder
    participant Browser as Visible Chromium

    User->>Desktop: Click "Record Auth Flow"
    Desktop->>Backend: POST /dom/auth/record {login_url, timeout_s}
    Backend->>Recorder: record(login_url)
    Recorder->>Browser: Launch visible browser (headless=False)
    Browser->>User: Displays login page
    User->>Browser: Completes login (SSO, MFA, OAuth consent)
    Browser->>Recorder: Network events: cookies, JSON tokens, headers
    Recorder->>Backend: AuthFlow object
    Backend->>Desktop: {flow: {cookies, extracted_tokens, ws_headers}}
    Desktop->>Desktop: Store in window._domInvaderAuthFlow

Captured Material:

- Set-Cookie response headers parsed individually.
- JSON response bodies inspected for keys: token, access_token, jwt, session_token, auth_token, sessionId.
- Authorization headers on outgoing requests from the authenticated session.
- Final localStorage contents.
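
The JSON inspection step might be implemented along these lines. The key list is taken from the list above; the recursive walk through nested objects, the first-match-wins rule, and the function name are assumptions.

```python
import json

TOKEN_KEYS = {
    "token", "access_token", "jwt", "session_token", "auth_token", "sessionId",
}


def extract_tokens(body: str) -> dict:
    """Return {key: value} for every known token key found in a JSON body."""
    try:
        data = json.loads(body)
    except ValueError:                 # not JSON: nothing to extract
        return {}

    found = {}

    def walk(node):
        if isinstance(node, dict):
            for key, value in node.items():
                if key in TOKEN_KEYS and isinstance(value, str):
                    found.setdefault(key, value)   # keep the first occurrence
                else:
                    walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(data)
    return found


print(extract_tokens('{"data": {"access_token": "abc"}, "user": "x"}'))
# {'access_token': 'abc'}
```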

Replay Phase

The replay uses a context from the BrowserPool and operates headlessly. Captured cookies are injected via context.add_cookies(). The result is an AuthTokens object:

| Field | Type | Description |
|-------|------|-------------|
| cookies | Dict[str, str] | Cookie name → value map |
| headers | Dict[str, str] | Ready-to-use as WebSocket extra_headers |
| session_token | str | Primary token string |
| valid | bool | Whether the replay produced usable credentials |
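
One way such an object could be assembled from captured material is sketched below. The field names follow the table; the build_auth_tokens() helper, the Bearer scheme for the Authorization header, and the rule that valid means "at least one header was produced" are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class AuthTokens:
    cookies: Dict[str, str] = field(default_factory=dict)
    headers: Dict[str, str] = field(default_factory=dict)
    session_token: str = ""
    valid: bool = False


def build_auth_tokens(cookies: Dict[str, str], session_token: str = "") -> AuthTokens:
    """Turn captured cookies and a token into handshake-ready headers."""
    headers: Dict[str, str] = {}
    if cookies:
        headers["Cookie"] = "; ".join(f"{k}={v}" for k, v in cookies.items())
    if session_token:
        # Assumption: the target expects a Bearer token.
        headers["Authorization"] = f"Bearer {session_token}"
    return AuthTokens(
        cookies=dict(cookies),
        headers=headers,
        session_token=session_token,
        valid=bool(headers),
    )


tokens = build_auth_tokens({"sid": "abc", "csrf": "xyz"}, "tok123")
print(tokens.headers["Cookie"])  # sid=abc; csrf=xyz
```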

Auto-Reconnect in the Blaster

When the Blaster receives a ConnectionClosed exception mid-fuzz and an auth flow is saved, it performs the following steps:

1. Rewinds the payload index by one so the interrupted payload is retried.
2. Emits a SESSION_EXPIRED info row to the results table.
3. Calls replay_auth_flow() to obtain fresh headers.
4. Waits 1 second for the server to close the expired session.
5. Opens a new WebSocket connection with the refreshed headers.
6. Resumes from the rewound payload position.

This process repeats up to 3 times before a fatal error is raised.
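
The reconnect loop can be sketched synchronously as follows. This is not the code in run_blaster_task(): the ConnectionClosed class is a stand-in for the WebSocket library's exception, connect, replay_auth_flow, and log are hypothetical callables supplied by the caller, and the limit of 3 is assumed to apply to the whole run rather than to consecutive failures.

```python
import time


class ConnectionClosed(Exception):
    """Stand-in for the WebSocket library's connection-closed error."""


MAX_RECONNECTS = 3


def run_blaster(payloads, connect, replay_auth_flow, log, settle_s=1.0):
    """Send every payload, re-authenticating when the connection drops."""
    send = connect({})                  # initial connection, no extra headers
    reconnects = 0
    index = 0
    while index < len(payloads):
        try:
            send(payloads[index])
        except ConnectionClosed:
            if reconnects == MAX_RECONNECTS:
                raise RuntimeError("session could not be restored") from None
            reconnects += 1
            log("SESSION_EXPIRED")
            headers = replay_auth_flow()   # fresh credentials
            time.sleep(settle_s)           # let the server drop the old session
            send = connect(headers)
            continue                       # index unchanged: payload is retried
        index += 1
    return reconnects
```

Leaving the index untouched on failure has the same effect as rewinding it by one after an increment.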

Sources: dom_invader.py: AuthFlowRecorder class, gui_bridge.py: run_blaster_task()

REST API Reference

All five DOM Invader routes are served by gui_bridge.py on port 8080 alongside the rest of the backend API.

GET /dom/status

Returns the current state of the DOM Invader engine. The frontend calls this on every Blaster tab open to update the status pill.

Response:

{
  "status": "success",
  "playwright_installed": true,
  "browser_running": true,
  "contexts_available": 3,
  "contexts_in_use": 1,
  "auth_flow_saved": false
}

If playwright_installed is false, all other DOM endpoints return {"status": "error", "msg": "Playwright not installed"}.

POST /dom/verify

Verifies a single payload/response pair.

Request:

{
  "payload": "<script>alert(1)</script>",
  "response": "<html>...<script>alert(1)</script>...</html>",
  "timeout_ms": 3000
}

Response:

{
  "status": "success",
  "executed": true,
  "evidence": "Dialog triggered: 1",
  "technique": "reflected",
  "alert_message": "1",
  "dom_mutations": 1,
  "injected_scripts": 1,
  "injected_handlers": 0,
  "elapsed_ms": 1243.7
}
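
A client call to this route could be built with the standard library as sketched below. The port comes from this page; the 127.0.0.1 host and the helper name are assumptions, and the request is only constructed here, not sent.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8080"   # assumption: backend runs on localhost


def build_verify_request(payload: str, response: str, timeout_ms: int = 3000):
    """Build a POST /dom/verify request for one payload/response pair."""
    body = json.dumps(
        {"payload": payload, "response": response, "timeout_ms": timeout_ms}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/dom/verify",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_verify_request("<script>alert(1)</script>", "<html>...</html>")
print(req.get_method(), req.full_url)  # POST http://127.0.0.1:8080/dom/verify
```

Passing the request to urllib.request.urlopen() and decoding the body with json.load() would yield the response object shown above, provided the backend is running.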

POST /dom/verify/batch

Verifies multiple results concurrently (up to 3 simultaneous verifications).

Request:

{
  "results": [
    {"payload": "...", "response": "..."},
    {"payload": "...", "response": "..."}
  ],
  "timeout_ms": 3000
}
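
The concurrency cap can be sketched with a semaphore around each verification. The limit of 3 comes from the description above; verify_batch(), the verify_one callable, and its signature are hypothetical.

```python
import asyncio


async def verify_batch(results, verify_one, timeout_ms=3000, limit=3):
    """Verify every item, running at most `limit` verifications at a time."""
    gate = asyncio.Semaphore(limit)

    async def guarded(item):
        async with gate:               # waits while `limit` are already running
            return await verify_one(item["payload"], item["response"], timeout_ms)

    # gather() returns results in the same order as the input list.
    return await asyncio.gather(*(guarded(item) for item in results))
```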

POST /dom/auth/record

Starts an interactive recording session. This call blocks until the visible browser is closed or the timeout expires. The client should display a loading indicator.

Request:

{
  "login_url": "https://app.example.com/login",
  "target_ws_url": "wss://app.example.com/socket",
  "timeout_s": 120
}

POST /dom/auth/replay

Replays the saved auth flow to obtain fresh tokens.

Request: {} (uses saved flow) or {"flow": {...}} to replay an explicitly provided flow.

Sources: gui_bridge.py: /dom/* routes

Frontend Integration

Status Pill

The DOMInvaderUI JavaScript module in renderer.js calls GET /dom/status on initialization and on every Blaster tab open. The pill updates without page reload.

| State | CSS Class | Label |
|-------|-----------|-------|
| Playwright installed, browser running | dom-status-available | Playwright Ready |
| Playwright not installed | dom-status-unavailable | Not Installed |
| Backend not reachable | dom-status-unknown | Offline |

DOM Verified Column

The Blaster results table has a sixth column: DOM Verified. Badge states:

| State | Appearance | Trigger |
|-------|-----------|---------|
| — (dash) | No badge | dom_verify was not enabled |
| Verifying... | Amber badge | Verification in progress |
| CONFIRMED XSS | Pulsing red badge | executed: true from /dom/verify |
| Unverified | Muted badge | executed: false within timeout |

Hovering any badge shows the dom_evidence string from the backend as a tooltip.

Socket.IO Events

| Event | Direction | Payload | Description |
|-------|-----------|---------|-------------|
| blaster_result | Server → Client | Includes dom_verified, dom_evidence, dom_technique | Standard result event with DOM fields added |
| dom_xss_confirmed | Server → Client | {payload, url, evidence, technique} | Fired only when executed: true; triggers critical log entry |

Configuration and Prerequisites

Installing Playwright

pip install playwright
playwright install chromium

On headless CI servers, Playwright also needs the following system libraries (Debian/Ubuntu):

apt-get install -y libnss3 libatk1.0-0 libatk-bridge2.0-0 \
    libcups2 libgtk-3-0 libasound2 libxss1 libgbm1

Chromium Launch Flags

The BrowserPool launches Chromium with these flags in all environments:

--no-sandbox
--disable-setuid-sandbox
--disable-dev-shm-usage
--disable-gpu

--no-sandbox is required when running as root, which is standard in Docker.
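
With Playwright's async Python API, flags like these are passed through the args parameter of chromium.launch(). The sketch below shows that call shape only; the launch_chromium() helper and the CHROMIUM_ARGS constant are hypothetical names, and the real BrowserPool.start() may be organized differently.

```python
CHROMIUM_ARGS = [
    "--no-sandbox",
    "--disable-setuid-sandbox",
    "--disable-dev-shm-usage",
    "--disable-gpu",
]


async def launch_chromium(headless: bool = True):
    """Start Playwright and launch Chromium with the flags above."""
    # Imported lazily so this module still loads where Playwright is absent.
    from playwright.async_api import async_playwright

    playwright = await async_playwright().start()
    browser = await playwright.chromium.launch(headless=headless, args=CHROMIUM_ARGS)
    return playwright, browser
```

The caller is responsible for awaiting browser.close() and playwright.stop() on shutdown.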

Auth Flow Recording Requirements

Recording uses headless=False and requires a display server. It cannot be used in headless CI environments or SSH sessions without X11 forwarding. Verification (headless=True) works without a display.
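
A pre-flight check for a display could look like the heuristic below. It is not part of WSHawk as far as this page states; the function name is hypothetical, and it assumes that Linux needs DISPLAY or WAYLAND_DISPLAY set while other desktop platforms always have a display.

```python
import os
import sys


def can_record_auth_flow(env=None, platform=None) -> bool:
    """Heuristic: is there a display for a visible (headless=False) browser?"""
    env = os.environ if env is None else env
    platform = sys.platform if platform is None else platform
    if not platform.startswith("linux"):
        return True   # assumption: macOS and Windows desktops have a display
    return bool(env.get("DISPLAY") or env.get("WAYLAND_DISPLAY"))


print(can_record_auth_flow(env={"DISPLAY": ":0"}, platform="linux"))  # True
print(can_record_auth_flow(env={}, platform="linux"))                 # False
```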

Performance Reference

| Operation | Typical Duration | Memory |
|-----------|-----------------|--------|
| Initial BrowserPool.start() | 1.2 – 2 seconds | 120 – 180 MB |
| Per-payload verify() (context reuse) | 1 – 3.5 seconds | +8 MB per page |
| Batch verify (3 concurrent) | 2 – 5 seconds for 3 results | +24 MB |
| Auth flow replay (headless) | 3 – 8 seconds | +40 MB during replay |
| BrowserPool.shutdown() | < 1 second | Freed on exit |

Sources: docs/V3.0.3_RELEASE_GUIDE.md: Section 10

Troubleshooting

Status pill shows "Not Installed" but Playwright is installed.

Playwright must be installed in the same Python environment as the WSHawk backend. If using the packaged binary (wshawk-bridge), Playwright must have been installed in the build environment and included in the PyInstaller bundle.
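
A quick way to confirm which environment is affected is to run a check like the one below with the same interpreter that launches the backend. The function name is hypothetical; importlib.util.find_spec() only reports whether the package is importable, not whether the Chromium binary has been downloaded.

```python
import importlib.util
import sys


def playwright_available() -> bool:
    """True if the 'playwright' package is importable by this interpreter."""
    return importlib.util.find_spec("playwright") is not None


# Shows which Python is running and whether it can see Playwright.
print(sys.executable, playwright_available())
```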

Auth replay returns valid: false with empty headers.

The replay injects captured cookies and replays navigation steps. If the application requires completing JavaScript-driven login steps that the recorder did not capture, the replay will not produce a valid session. Fall back to the auth_payload field in the Blaster, which sends a raw WebSocket authentication frame after the connection opens.

Verification times out on valid HTML responses.

Increase timeout_ms in the /dom/verify request. React and Angular applications may need 5000ms or more to complete their initialization before injected content is evaluated.

The recording browser closes immediately.

Increase timeout_s in the /dom/auth/record request. The default is 120 seconds. Complex SSO flows with multiple redirects and email verification steps may require 180 – 300 seconds.

Summary

The DOM Invader addresses the primary weakness of WebSocket XSS detection by replacing reflection-based heuristics with direct observation of JavaScript execution in a real browser runtime. Combined with the Auth Flow Recorder, it enables continuous, authenticated fuzzing sessions that automatically survive token expiry, which makes it practical to run overnight word lists against production-grade authenticated endpoints.

Sources: dom_invader.py, gui_bridge.py, docs/V3.0.3_RELEASE_GUIDE.md