Scan History and Persistence
The following files were used as context for generating this wiki page:
- .github/workflows/ghcr-publish.yml
- .gitignore
- LICENSE
- README.md
- RELEASE_3.0.0.md
- RELEASE_SUMMARY.md
- docs/V3_COMPLETE_GUIDE.md
Purpose and Scope
This document describes WSHawk's Infrastructure Persistence Plane, specifically the SQLite-backed scan history system that provides zero-loss data persistence for all security assessments. This page covers the database architecture, storage mechanisms, Write-Ahead Logging (WAL) mode for crash recovery, and the web dashboard's scan history management interface.
For information about launching the web dashboard and authentication, see Dashboard Overview and Launch. For REST API programmatic access to scan history, see REST API Reference. For broader architectural context, see Infrastructure Persistence Layer.
SQLite Database Architecture
WSHawk v3.0.0 replaces memory-resident data structures with a persistent SQLite database that survives crashes, system reboots, and scanner terminations. The database implements a "zero-loss persistence" design philosophy where security data is never ephemeral.
Storage Location
The database file is stored at one of the following locations, depending on deployment mode:
| Deployment Mode | Database Path | Additional Files |
|----------------|---------------|------------------|
| Local Development | ./scans.db | scans.db-wal, scans.db-shm |
| Production/Docker | ~/.wshawk/scans.db | ~/.wshawk/scans.db-wal, ~/.wshawk/scans.db-shm |
| Docker Volume Mount | /app/.wshawk/scans.db | /app/.wshawk/scans.db-wal, /app/.wshawk/scans.db-shm |
The .gitignore file explicitly excludes these database files from version control: .gitignore:90-93
Sources: README.md:132, RELEASE_SUMMARY.md:17, .gitignore:90-93
Database Schema and Stored Data
The SQLite database maintains four primary data categories for comprehensive security audit trails:
erDiagram
SCANS ||--o{ VULNERABILITIES : contains
SCANS ||--o{ TRAFFIC_LOGS : records
SCANS ||--o{ SCAN_METRICS : tracks
SCANS {
integer id PK
string target_url
datetime start_time
datetime end_time
string scan_type
string status
string report_path
}
VULNERABILITIES {
integer id PK
integer scan_id FK
string vuln_type
string severity
float cvss_score
string cvss_vector
text payload
text response
text evidence
string confidence
}
TRAFFIC_LOGS {
integer id PK
integer scan_id FK
datetime timestamp
string direction
text message_content
integer frame_size
}
SCAN_METRICS {
integer id PK
integer scan_id FK
integer total_payloads
integer messages_sent
integer messages_received
float avg_rps
integer connection_errors
}
Data Categories
1. Scan Metadata
- Target WebSocket URL
- Start/end timestamps
- Scan type (quick, advanced, defensive)
- Final status (completed, failed, interrupted)
- Generated report file path
2. Vulnerability Findings
- Vulnerability type (SQL Injection, XSS, XXE, etc.)
- CVSS v3.1 score and vector string
- Proof-of-concept payload
- Server response that confirmed exploitation
- Evidence artifacts (screenshots for XSS, OAST callbacks for XXE/SSRF)
- Confidence level (LOW/MEDIUM/HIGH)
3. Traffic Logs
- Every WebSocket frame sent and received
- Timestamps for temporal analysis
- Message direction (client→server, server→client)
- Frame size for bandwidth analysis
4. Performance Metrics
- Total payloads tested
- Average requests per second (RPS)
- Connection stability metrics
- Error rate tracking
Sources: docs/V3_COMPLETE_GUIDE.md:293-297, RELEASE_SUMMARY.md:17
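The entity diagram above can be expressed as SQLite DDL. The following is a minimal sketch, not WSHawk's actual schema code: table and column names follow the diagram, while the SQLite column types, the NOT NULL constraints, the ON DELETE CASCADE behavior, and the create_schema helper are assumptions made for illustration.

```python
import sqlite3

# Table and column names follow the ER diagram; types and constraints are assumptions.
SCHEMA = """
CREATE TABLE IF NOT EXISTS scans (
    id          INTEGER PRIMARY KEY,
    target_url  TEXT NOT NULL,
    start_time  TEXT NOT NULL,
    end_time    TEXT,
    scan_type   TEXT,
    status      TEXT,
    report_path TEXT
);
CREATE TABLE IF NOT EXISTS vulnerabilities (
    id          INTEGER PRIMARY KEY,
    scan_id     INTEGER NOT NULL REFERENCES scans(id) ON DELETE CASCADE,
    vuln_type   TEXT,
    severity    TEXT,
    cvss_score  REAL,
    cvss_vector TEXT,
    payload     TEXT,
    response    TEXT,
    evidence    TEXT,
    confidence  TEXT
);
CREATE TABLE IF NOT EXISTS traffic_logs (
    id              INTEGER PRIMARY KEY,
    scan_id         INTEGER NOT NULL REFERENCES scans(id) ON DELETE CASCADE,
    timestamp       TEXT,
    direction       TEXT,
    message_content TEXT,
    frame_size      INTEGER
);
CREATE TABLE IF NOT EXISTS scan_metrics (
    id                INTEGER PRIMARY KEY,
    scan_id           INTEGER NOT NULL REFERENCES scans(id) ON DELETE CASCADE,
    total_payloads    INTEGER,
    messages_sent     INTEGER,
    messages_received INTEGER,
    avg_rps           REAL,
    connection_errors INTEGER
);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Hypothetical helper: create the four tables if they do not exist."""
    # SQLite leaves foreign key enforcement off unless enabled per connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)


conn = sqlite3.connect(":memory:")
create_schema(conn)
tables = sorted(
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
print(tables)
```

With cascading deletes declared this way, removing a row from scans also removes its findings, traffic logs, and metrics, which matches the one-to-many relationships in the diagram.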
WAL Mode and Zero-Loss Design
WSHawk configures SQLite to use Write-Ahead Logging (WAL) mode, a critical feature for high-throughput scanning operations that ensures data integrity even during unexpected terminations.
Write-Ahead Logging Mechanism
sequenceDiagram
participant Scanner as "WSHawkV2 Scanner"
participant WAL as "scans.db-wal"
participant DB as "scans.db"
participant SHM as "scans.db-shm (Shared Memory)"
Scanner->>WAL: "Write vulnerability finding"
Note over WAL: "Append-only log<br/>(no blocking)"
WAL-->>Scanner: "Write acknowledged"
Scanner->>WAL: "Write traffic log"
WAL-->>Scanner: "Write acknowledged"
Note over WAL,SHM: "Checkpoint threshold reached"
WAL->>DB: "Commit WAL pages to main DB"
WAL->>SHM: "Update shared memory index"
Note over Scanner: "Scanner crashes mid-scan"
Note over WAL: "On restart: WAL intact"
WAL->>DB: "Recover committed pages not yet checkpointed"
Note over DB: "Data recovered successfully"
Crash Recovery Behavior
When WSHawk is terminated unexpectedly (kill signal, system crash, power loss):
- WAL Preservation: All writes committed to scans.db-wal are preserved on disk
- Automatic Recovery: On next launch, SQLite automatically replays the WAL
- Transaction Integrity: Only complete transactions are recovered; partial writes are discarded
- No Data Loss: All acknowledged vulnerability findings and traffic logs are restored
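This architecture ensures that even if WSHawk is forcibly killed during a 10-hour scan, all data up to the last committed transaction is preserved. Durability across power loss additionally depends on SQLite's synchronous setting: with synchronous=NORMAL in WAL mode, the most recent commits can be rolled back after a power failure, although the database itself remains consistent.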
Sources: docs/V3_COMPLETE_GUIDE.md:124, RELEASE_SUMMARY.md:3
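To make the WAL behavior described above concrete, the sketch below enables WAL mode with Python's standard sqlite3 module and confirms that the -wal sidecar file appears next to the database. The open_wal_connection helper, the findings table, and the choice of synchronous=FULL are assumptions for illustration; the source does not show how WSHawk actually opens its connection.

```python
import os
import sqlite3
import tempfile


def open_wal_connection(db_path: str):
    """Hypothetical helper: open a connection and switch it to WAL mode."""
    conn = sqlite3.connect(db_path)
    # The pragma returns the journal mode that is actually in effect.
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    # Assumption: FULL trades some write speed for durability on power loss.
    conn.execute("PRAGMA synchronous=FULL")
    return conn, mode


tmpdir = tempfile.mkdtemp()
db_path = os.path.join(tmpdir, "scans.db")

conn, mode = open_wal_connection(db_path)
conn.execute("CREATE TABLE findings (id INTEGER PRIMARY KEY, note TEXT)")
conn.execute("INSERT INTO findings (note) VALUES ('example finding')")
conn.commit()

# While the writer is still open, committed pages live in the WAL file.
wal_exists = os.path.exists(db_path + "-wal")

# A second connection sees the committed row even before any checkpoint.
reader = sqlite3.connect(db_path)
count = reader.execute("SELECT COUNT(*) FROM findings").fetchone()[0]

print(mode, wal_exists, count)
```

WAL mode is a property of the database file, so it only needs to be set once; later connections to the same file pick it up automatically.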
Scan Lifecycle and Persistence Flow
The following diagram shows how scan data flows from the scanner engine through to persistent storage:
graph TB
subgraph "WSHawkV2 Scanner Execution"
Init["Scanner Initialization<br/>scanner_v2.WSHawkV2"]
Connect["WebSocket Connection<br/>ws.connect()"]
Inject["Payload Injection Loop<br/>send_payload()"]
Analyze["VulnerabilityVerifier<br/>analyze_response()"]
end
subgraph "Database Persistence Layer"
CreateScan["Create Scan Record<br/>INSERT INTO scans<br/>status='running'"]
LogTraffic["Log Traffic Frame<br/>INSERT INTO traffic_logs<br/>timestamp, direction, content"]
StoreVuln["Store Vulnerability<br/>INSERT INTO vulnerabilities<br/>cvss_score, payload, evidence"]
UpdateMetrics["Update Metrics<br/>UPDATE scan_metrics<br/>total_payloads, avg_rps"]
FinalizeScan["Finalize Scan<br/>UPDATE scans<br/>status='completed', end_time"]
end
subgraph "File System Storage"
GenReport["Generate HTML Report<br/>wshawk_report_*.html"]
SaveScreenshot["Save Screenshots<br/>xss_evidence_*.png"]
StoreReportPath["UPDATE scans<br/>SET report_path=?"]
end
subgraph "SQLite Database Files"
WAL["scans.db-wal<br/>Write-Ahead Log"]
DB["scans.db<br/>Main Database"]
SHM["scans.db-shm<br/>Shared Memory Index"]
end
Init --> CreateScan
CreateScan --> WAL
Connect --> Inject
Inject --> LogTraffic
LogTraffic --> WAL
Inject --> Analyze
Analyze --> StoreVuln
StoreVuln --> WAL
Inject --> UpdateMetrics
UpdateMetrics --> WAL
Analyze --> GenReport
GenReport --> SaveScreenshot
SaveScreenshot --> StoreReportPath
StoreReportPath --> WAL
WAL --> DB
WAL --> SHM
Inject --> FinalizeScan
FinalizeScan --> WAL
Persistence Workflow
- Scan Initialization: When WSHawkV2.run_heuristic_scan() starts, a new record is inserted into the scans table with status='running'
- Real-Time Logging: Every WebSocket frame (send/receive) is immediately written to traffic_logs via WAL
- Vulnerability Recording: When VulnerabilityVerifier confirms a finding, it's written to vulnerabilities with full CVSS scoring
- Metric Tracking: Performance counters (RPS, payload counts) are periodically flushed to scan_metrics
- Report Linking: Generated HTML reports and screenshot files are linked back to the scan record
- Finalization: On scan completion, the status is updated to completed with an end timestamp
Sources: docs/V3_COMPLETE_GUIDE.md:115-120, RELEASE_SUMMARY.md:16-19
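The workflow above can be sketched as a small recorder class that commits after every step, so that each acknowledged write is already durable if the scanner dies. ScanRecorder and its methods are hypothetical names invented for this example, the schema is reduced to the columns it touches, and the target URL and report path are placeholders; none of this is WSHawk's actual API.

```python
import sqlite3
from datetime import datetime, timezone


def utc_now() -> str:
    """Return the current UTC time as an ISO 8601 string."""
    return datetime.now(timezone.utc).isoformat()


class ScanRecorder:
    """Hypothetical helper; not part of WSHawk's actual API."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        # Reduced schema: only the columns this sketch touches.
        conn.executescript("""
            CREATE TABLE IF NOT EXISTS scans (
                id INTEGER PRIMARY KEY, target_url TEXT, start_time TEXT,
                end_time TEXT, scan_type TEXT, status TEXT, report_path TEXT);
            CREATE TABLE IF NOT EXISTS traffic_logs (
                id INTEGER PRIMARY KEY, scan_id INTEGER, timestamp TEXT,
                direction TEXT, message_content TEXT, frame_size INTEGER);
            CREATE TABLE IF NOT EXISTS vulnerabilities (
                id INTEGER PRIMARY KEY, scan_id INTEGER, vuln_type TEXT,
                severity TEXT, cvss_score REAL, payload TEXT);
        """)

    def start_scan(self, target_url: str, scan_type: str) -> int:
        cur = self.conn.execute(
            "INSERT INTO scans (target_url, start_time, scan_type, status) "
            "VALUES (?, ?, ?, 'running')",
            (target_url, utc_now(), scan_type),
        )
        self.conn.commit()  # commit right away so the row survives a crash
        return cur.lastrowid

    def log_frame(self, scan_id: int, direction: str, content: str) -> None:
        self.conn.execute(
            "INSERT INTO traffic_logs "
            "(scan_id, timestamp, direction, message_content, frame_size) "
            "VALUES (?, ?, ?, ?, ?)",
            (scan_id, utc_now(), direction, content, len(content.encode("utf-8"))),
        )
        self.conn.commit()

    def record_vulnerability(self, scan_id: int, vuln_type: str,
                             severity: str, cvss_score: float, payload: str) -> None:
        self.conn.execute(
            "INSERT INTO vulnerabilities "
            "(scan_id, vuln_type, severity, cvss_score, payload) "
            "VALUES (?, ?, ?, ?, ?)",
            (scan_id, vuln_type, severity, cvss_score, payload),
        )
        self.conn.commit()

    def finalize(self, scan_id: int, report_path: str) -> None:
        self.conn.execute(
            "UPDATE scans SET status = 'completed', end_time = ?, report_path = ? "
            "WHERE id = ?",
            (utc_now(), report_path, scan_id),
        )
        self.conn.commit()


conn = sqlite3.connect(":memory:")
recorder = ScanRecorder(conn)
scan_id = recorder.start_scan("ws://example.test/socket", "quick")
recorder.log_frame(scan_id, "client->server", '{"q": "test"}')
recorder.log_frame(scan_id, "server->client", '{"error": "syntax"}')
recorder.record_vulnerability(scan_id, "SQL Injection", "HIGH", 9.8, "' OR 1=1--")
recorder.finalize(scan_id, "./reports/example_report.html")

status, end_time = conn.execute(
    "SELECT status, end_time FROM scans WHERE id = ?", (scan_id,)
).fetchone()
frames = conn.execute("SELECT COUNT(*) FROM traffic_logs").fetchone()[0]
print(status, frames)
```

Committing per frame is the simplest way to honor the "immediately written" guarantee; a real high-throughput scanner might batch frames into short transactions instead, at the cost of losing the current batch on a crash.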
Web Dashboard Scan History Interface
The web management dashboard provides a visual interface for browsing, filtering, and managing historical scans.
Dashboard Storage Architecture
graph LR
subgraph "Browser Client"
UI["Web Dashboard UI<br/>wshawk/web/templates/"]
JS["JavaScript<br/>Real-time Updates"]
end
subgraph "Flask Web Server"
Routes["Flask Routes<br/>@app.route('/api/scans')"]
Auth["Authentication Middleware<br/>SHA-256 Password Check"]
Query["Database Query Layer<br/>SELECT FROM scans"]
end
subgraph "SQLite Backend"
ScanTable["scans table"]
VulnTable["vulnerabilities table"]
Indexes["Indexes:<br/>idx_scan_timestamp<br/>idx_vuln_severity"]
end
UI --> JS
JS -->|"HTTP GET /api/scans"| Routes
Routes --> Auth
Auth --> Query
Query --> ScanTable
Query --> VulnTable
Query --> Indexes
ScanTable --> Query
VulnTable --> Query
Query --> Routes
Routes -->|"JSON Response"| JS
JS --> UI
Visual Progress Tracking
The dashboard provides real-time visibility into scan execution:
| Progress Indicator | Data Source | Update Frequency |
|-------------------|-------------|------------------|
| Scan Status | scans.status column | Real-time (WebSocket) |
| Payloads Tested | scan_metrics.total_payloads | Every 10 payloads |
| Vulnerabilities Found | COUNT(vulnerabilities) | On each finding |
| Current RPS | scan_metrics.avg_rps | Every 5 seconds |
| Connection Health | scan_metrics.connection_errors | On each error |
The dashboard uses JavaScript polling or WebSocket updates to refresh these metrics without page reload, providing a "live view of the scan brain" as described in docs/V3_COMPLETE_GUIDE.md:305.
Sources: README.md:133, docs/V3_COMPLETE_GUIDE.md:305
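The indicators in the table above can all be read with a single query per refresh. The sketch below shows one way such a query might look; progress_snapshot is a hypothetical function, the schema is trimmed to the columns involved, and the sample rows are invented for the example.

```python
import sqlite3

PROGRESS_QUERY = """
SELECT s.status,
       m.total_payloads,
       m.avg_rps,
       m.connection_errors,
       (SELECT COUNT(*) FROM vulnerabilities v WHERE v.scan_id = s.id)
           AS vulnerabilities_found
FROM scans s
LEFT JOIN scan_metrics m ON m.scan_id = s.id
WHERE s.id = ?
"""


def progress_snapshot(conn: sqlite3.Connection, scan_id: int):
    """Hypothetical helper: return one scan's progress indicators as a dict."""
    row = conn.execute(PROGRESS_QUERY, (scan_id,)).fetchone()
    return dict(row) if row is not None else None


conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets rows be converted to dicts by column name
conn.executescript("""
    CREATE TABLE scans (id INTEGER PRIMARY KEY, target_url TEXT, status TEXT);
    CREATE TABLE scan_metrics (
        id INTEGER PRIMARY KEY, scan_id INTEGER, total_payloads INTEGER,
        avg_rps REAL, connection_errors INTEGER);
    CREATE TABLE vulnerabilities (
        id INTEGER PRIMARY KEY, scan_id INTEGER, severity TEXT);
    -- Invented sample data for the example
    INSERT INTO scans VALUES (1, 'ws://example.test/socket', 'running');
    INSERT INTO scan_metrics VALUES (1, 1, 1200, 85.5, 2);
    INSERT INTO vulnerabilities VALUES (1, 1, 'HIGH'), (2, 1, 'LOW');
""")

snapshot = progress_snapshot(conn, 1)
print(snapshot)
```

The LEFT JOIN keeps the query usable for a scan that has started but has not flushed any metrics yet; in that case the metric fields come back as None.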
Report Management and Retention
Report File System Structure
graph TB
subgraph "Report Storage Directory"
ReportsDir["./reports/ or ~/.wshawk/reports/"]
HTMLReports["HTML Reports<br/>wshawk_report_YYYYMMDD_HHMMSS.html"]
Screenshots["Screenshots<br/>xss_evidence_<scan_id>_<vuln_id>.png"]
TrafficLogs["Traffic Dumps<br/>traffic_<scan_id>.json"]
SARIFExports["SARIF Exports<br/>wshawk_<scan_id>.sarif"]
end
subgraph "Database References"
ScanRecord["scans.report_path<br/>'./reports/wshawk_report_*.html'"]
VulnRecord["vulnerabilities.evidence<br/>'./reports/xss_evidence_*.png'"]
end
ReportsDir --> HTMLReports
ReportsDir --> Screenshots
ReportsDir --> TrafficLogs
ReportsDir --> SARIFExports
HTMLReports -.->|"referenced by"| ScanRecord
Screenshots -.->|"referenced by"| VulnRecord
Interactive Report Management
The web dashboard provides operations on historical scans:
View Operations:
- List All Scans: Paginated view with filtering by date, target, status, severity
- Scan Details: Drill-down into individual scan showing all findings, traffic logs, metrics
- Vulnerability Timeline: Chronological view of when each vulnerability was discovered during the scan
Management Operations:
- Delete Scan: Removes database record and associated report files
- Re-export: Regenerate report in different format (JSON, CSV, SARIF)
- Compare Scans: Side-by-side comparison of two scans against the same target to track remediation
Data Retention: By default, WSHawk retains all scan history indefinitely. For production deployments with disk constraints, administrators can configure automatic cleanup policies:
# Example wshawk.yaml configuration (not included by default)
persistence:
  retention_days: 90   # Delete scans older than 90 days
  max_scans: 1000      # Keep only most recent 1000 scans
  auto_cleanup: true   # Enable automatic cleanup on dashboard startup
Sources: README.md:134, .gitignore:50-53
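A cleanup pass driven by retention_days and max_scans could look roughly like the following. This is a sketch under stated assumptions: cleanup_scans is a hypothetical function, timestamps are assumed to be stored as ISO 8601 UTC strings (so that string comparison matches chronological order), and removal of report files and child rows is left out for brevity.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def cleanup_scans(conn: sqlite3.Connection, retention_days: int,
                  max_scans: int, now: datetime) -> int:
    """Hypothetical helper: apply both retention rules, return rows deleted."""
    cutoff = (now - timedelta(days=retention_days)).isoformat()

    # Rule 1: drop scans that started before the retention cutoff.
    cur = conn.execute("DELETE FROM scans WHERE start_time < ?", (cutoff,))
    deleted = cur.rowcount

    # Rule 2: of what remains, keep only the most recent max_scans rows.
    cur = conn.execute(
        "DELETE FROM scans WHERE id NOT IN "
        "(SELECT id FROM scans ORDER BY start_time DESC LIMIT ?)",
        (max_scans,),
    )
    deleted += cur.rowcount
    conn.commit()
    return deleted


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE scans (id INTEGER PRIMARY KEY, target_url TEXT, start_time TEXT)"
)

# Fixed reference time so the example is reproducible.
now = datetime(2025, 1, 1, tzinfo=timezone.utc)
for age_days in (200, 100, 30, 10, 1):
    started = (now - timedelta(days=age_days)).isoformat()
    conn.execute(
        "INSERT INTO scans (target_url, start_time) VALUES (?, ?)",
        ("ws://example.test/socket", started),
    )
conn.commit()

deleted = cleanup_scans(conn, retention_days=90, max_scans=2, now=now)
remaining = conn.execute("SELECT COUNT(*) FROM scans").fetchone()[0]
print(deleted, remaining)
```

In this run the 200-day and 100-day scans fall outside the 90-day window, and the 30-day scan is then removed by the max_scans limit, leaving the two most recent scans.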
Historical Comparison and Regression Detection
WSHawk's persistent history enables security teams to track remediation progress over time.
Comparison Workflow
sequenceDiagram
participant User as "Security Analyst"
participant Dashboard as "Web Dashboard"
participant DB as "scans.db"
participant Diff as "Comparison Engine"
User->>Dashboard: "Select Scan A (Baseline)"
Dashboard->>DB: "SELECT * FROM scans WHERE id=A"
DB-->>Dashboard: "Scan A data (10 vulns)"
User->>Dashboard: "Select Scan B (Current)"
Dashboard->>DB: "SELECT * FROM scans WHERE id=B"
DB-->>Dashboard: "Scan B data (3 vulns)"
Dashboard->>Diff: "Compare(A, B)"
Diff->>Diff: "Identify Fixed: 7 vulns"
Diff->>Diff: "Identify Persistent: 3 vulns"
Diff->>Diff: "Identify New: 0 vulns"
Diff-->>Dashboard: "Comparison Report"
Dashboard-->>User: "Display: 7 Fixed, 3 Persistent, 0 New"
Regression Detection
When comparing two scans of the same target:
- Fixed Vulnerabilities: Present in baseline scan but absent in current scan (indicates successful remediation)
- Persistent Vulnerabilities: Present in both scans with identical payloads (indicates incomplete remediation)
- New Vulnerabilities: Absent in baseline but present in current scan (indicates regression or new attack surface)
- Changed Exploitability: Same vulnerability type but different CVSS score (indicates partial fix or changed conditions)
This historical analysis is critical for tracking security posture improvements and identifying regressions introduced by code changes.
Sources: docs/V3_COMPLETE_GUIDE.md:307
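The four categories above amount to set operations over the findings of two scans. The sketch below assumes a finding is identified by its (vuln_type, payload) pair and treats a differing CVSS score on a matched finding as changed exploitability; the Finding type, the compare_scans function, and the sample data are all hypothetical, and WSHawk's real comparison engine may match findings differently.

```python
from typing import Dict, List, NamedTuple, Tuple


class Finding(NamedTuple):
    vuln_type: str
    payload: str
    cvss_score: float


Key = Tuple[str, str]


def compare_scans(baseline: List[Finding],
                  current: List[Finding]) -> Dict[str, List[Key]]:
    """Classify findings as fixed, persistent, new, or changed."""
    # Assumption: (vuln_type, payload) uniquely identifies a finding.
    base = {(f.vuln_type, f.payload): f for f in baseline}
    curr = {(f.vuln_type, f.payload): f for f in current}

    persistent = sorted(base.keys() & curr.keys())
    return {
        "fixed": sorted(base.keys() - curr.keys()),
        "persistent": persistent,
        "new": sorted(curr.keys() - base.keys()),
        # Matched findings whose score moved between the two scans.
        "changed": [k for k in persistent
                    if base[k].cvss_score != curr[k].cvss_score],
    }


baseline = [
    Finding("SQL Injection", "' OR 1=1--", 9.8),
    Finding("XSS", "<script>alert(1)</script>", 6.1),
    Finding("XXE", "<!ENTITY x SYSTEM 'file:///etc/passwd'>", 7.5),
]
current = [
    Finding("XSS", "<script>alert(1)</script>", 4.3),
    Finding("SSRF", "http://169.254.169.254/", 8.6),
]

result = compare_scans(baseline, current)
for category, keys in result.items():
    print(category, [vuln_type for vuln_type, _ in keys])
```

Here the SQL injection and XXE findings count as fixed, the XSS finding is persistent with a changed score, and the SSRF finding is new.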
Database Performance and Optimization
Indexing Strategy
To support fast queries on large scan histories, WSHawk creates indexes on the columns that dashboard queries filter and sort by most often:
-- Conceptual indexes (actual implementation in scanner code)
CREATE INDEX idx_scan_timestamp ON scans(start_time DESC);
CREATE INDEX idx_scan_target ON scans(target_url);
CREATE INDEX idx_vuln_severity ON vulnerabilities(severity, cvss_score DESC);
CREATE INDEX idx_vuln_type ON vulnerabilities(vuln_type);
CREATE INDEX idx_traffic_scan ON traffic_logs(scan_id, timestamp);
Query Optimization
Common dashboard queries are optimized for performance:
| Query Type | Optimization | Expected Performance |
|-----------|--------------|---------------------|
| Recent Scans List | idx_scan_timestamp descending | < 50ms for 10,000 scans |
| Vulnerability by Severity | idx_vuln_severity with covering index | < 100ms for 50,000 findings |
| Target Scan History | idx_scan_target with JOIN | < 200ms for 1,000 target scans |
| Traffic Log Replay | idx_traffic_scan sequential read | Streaming, no memory limit |
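
Whether a query actually uses one of these indexes can be checked with SQLite's EXPLAIN QUERY PLAN. The sketch below builds a trimmed scans table with the two conceptual scan indexes and inspects the plan for a per-target lookup; query_plan is a hypothetical helper, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str, params=()):
    """Hypothetical helper: return the detail text of each plan step."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # Each row has four columns; the human-readable detail is the last one.
    return [row[3] for row in rows]


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE scans (
        id INTEGER PRIMARY KEY, target_url TEXT, start_time TEXT, status TEXT);
    CREATE INDEX idx_scan_timestamp ON scans(start_time DESC);
    CREATE INDEX idx_scan_target ON scans(target_url);
""")

# Per-target history lookup: should be answered through idx_scan_target.
target_plan = query_plan(
    conn,
    "SELECT id, start_time FROM scans WHERE target_url = ?",
    ("ws://example.test/socket",),
)
target_uses_index = any("idx_scan_target" in step for step in target_plan)

# Recent-scans list: print the plan to see how the ordering is satisfied.
recent_plan = query_plan(
    conn, "SELECT id, target_url FROM scans ORDER BY start_time DESC LIMIT 20"
)

print(target_plan)
print(recent_plan)
```

A plan step beginning with SEARCH and naming an index indicates an indexed lookup, while a bare SCAN of the table means every row is visited. The timing figures in the table are best verified against a realistically sized database rather than inferred from the plan alone.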
Disk Space Considerations
The SQLite database grows based on scan activity:
- Typical Scan: 2-5 MB (22,000 payloads, 1 hour scan)
- With Full Traffic Logging: 20-50 MB (every frame recorded)
- With Screenshots: +5 MB per XSS finding (Playwright evidence)
- Annual Production Usage: ~50 GB (daily scans, 1 year retention)
For long-term deployments, consider implementing retention policies or archiving old scans to separate databases.
Sources: docs/V3_COMPLETE_GUIDE.md:293-297
Docker Volume Persistence
For containerized deployments, proper volume mounting ensures scan history survives container recreation.
Volume Mount Configuration
# Recommended Docker run command for persistence
docker run --rm \
-v wshawk_data:/app/.wshawk \
-v wshawk_reports:/app/reports \
rothackers/wshawk --web --host 0.0.0.0
Docker Compose Configuration
# Example docker-compose.yml excerpt
services:
  wshawk:
    image: rothackers/wshawk:latest
    volumes:
      - wshawk_data:/app/.wshawk      # Database persistence
      - wshawk_reports:/app/reports   # Report file persistence
    environment:
      - WSHAWK_WEB_PASSWORD=${WEB_PASSWORD}
volumes:
  wshawk_data:
    driver: local
  wshawk_reports:
    driver: local
Without volume mounts, the database and reports are lost when the container is removed (which happens as soon as it stops when --rm is used). The volume approach ensures that:
- scans.db persists across container restarts
- WAL files (scans.db-wal, scans.db-shm) are properly maintained
- Report HTML files and screenshots remain accessible
- Multiple container instances on the same host can share the same data volume (with appropriate locking); SQLite's WAL mode relies on shared memory and does not work over network filesystems
Sources: README.md:64-78
Summary
WSHawk's Infrastructure Persistence Plane provides enterprise-grade data durability through:
- SQLite with WAL Mode: Zero-loss persistence even during crashes
- Comprehensive Schema: Stores scan metadata, vulnerabilities, traffic logs, and performance metrics
- Web Dashboard Interface: Visual scan history browsing and management
- Historical Analysis: Comparison and regression detection across scans
- Production-Ready: Optimized indexes, volume persistence, and retention policies
This architecture transforms WSHawk from a one-time scanning tool into a persistent security monitoring platform suitable for continuous assessment and long-term security posture tracking.
Sources: README.md:23, README.md:112-136, RELEASE_SUMMARY.md:15-19, docs/V3_COMPLETE_GUIDE.md:122-125, docs/V3_COMPLETE_GUIDE.md:289-307