Data Flow Architecture
This document details how data flows through the RapidTriageME system, from browser events to AI assistant responses. Understanding these flows helps when debugging, optimizing, or extending the platform.
Overview
RapidTriageME processes data through multiple interconnected flows:
```mermaid
graph TB
    subgraph "Data Sources"
        BROWSER[Browser Events]
        USER[User Actions]
        PAGE[Page Changes]
    end

    subgraph "Collection Layer"
        EXT[Chrome Extension]
        DEVTOOLS[DevTools API]
        CONTENT[Content Scripts]
    end

    subgraph "Processing Layer"
        WS[WebSocket Handler]
        PROC[Data Processor]
        CACHE[Cache Manager]
        FILTER[Data Filter]
    end

    subgraph "Integration Layer"
        REST[REST API]
        MCP[MCP Server]
        LH[Lighthouse Runner]
    end

    subgraph "Output Layer"
        AI[AI Assistant]
        DASH[Dashboard]
        WEBHOOK[Webhooks]
    end

    BROWSER --> EXT
    USER --> EXT
    PAGE --> DEVTOOLS
    EXT --> WS
    DEVTOOLS --> CONTENT
    CONTENT --> WS
    WS --> PROC
    PROC --> CACHE
    PROC --> FILTER
    CACHE --> REST
    REST --> MCP
    REST --> LH
    MCP --> AI
    REST --> DASH
    REST --> WEBHOOK

    style BROWSER fill:#e3f2fd
    style EXT fill:#e1f5fe
    style PROC fill:#e8f5e8
    style MCP fill:#fff3e0
    style AI fill:#f1f8e9
```
Real-Time Data Flows
Console Log Flow
Console messages travel from the browser to the AI assistant through this pipeline:
```mermaid
sequenceDiagram
    participant JS as JavaScript Code
    participant CONSOLE as Console API
    participant EXT as Extension
    participant BC as Browser Connector
    participant CACHE as Memory Cache
    participant MCP as MCP Server
    participant AI as AI Assistant

    JS->>CONSOLE: console.log("message")
    CONSOLE->>EXT: Console Event
    EXT->>EXT: Capture & Format
    EXT->>BC: WebSocket Message
    BC->>BC: Validate & Enrich
    BC->>CACHE: Store Log Entry

    Note over AI,MCP: AI requests logs
    AI->>MCP: get_console_logs
    MCP->>BC: GET /console-logs
    BC->>CACHE: Query Recent Logs
    CACHE-->>BC: Log Entries
    BC-->>MCP: Formatted Logs
    MCP-->>AI: Console Analysis
```
Data Transformation:
```javascript
// Raw console event, as captured by the extension
const rawConsoleEvent = {
  level: "error",
  message: "TypeError: Cannot read property 'foo' of undefined",
  timestamp: 1704067200000,
  stack: "Error\n at example.js:15:20"
};

// Processed log entry, after validation and enrichment in the Browser Connector
const processedLogEntry = {
  id: "log_abc123",
  level: "error",
  message: "TypeError: Cannot read property 'foo' of undefined",
  timestamp: 1704067200000,
  url: "https://example.com/page",
  source: "example.js:15:20",                 // file:line:column of the top stack frame
  stack: ["Error", " at example.js:15:20"],   // stack string split into lines
  metadata: {
    userAgent: "Chrome/91.0",
    sessionId: "session_xyz789"
  }
};
```
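The mapping between the two shapes can be sketched as a pure function. This is a minimal illustration, not RapidTriageME's actual code. The name `toLogEntry`, the `context` argument (page URL, user agent, and session ID known to the connector), and the ID scheme are all assumptions.

```javascript
// Hypothetical sketch: turn a raw console event into a processed log entry.
// `context` ({ url, userAgent, sessionId }) is assumed to be known to the connector.
let nextLogId = 0;

function toLogEntry(raw, context) {
  // Split the stack string into lines, dropping empty ones
  const stack = (raw.stack || "").split("\n").filter((line) => line.trim() !== "");

  // Take "file:line:column" from the first "at ..." frame,
  // unwrapping the "fn (location)" form when present
  const frame = stack.find((line) => line.trim().startsWith("at "));
  const source = frame ? frame.trim().slice(3).replace(/^.*\((.*)\)$/, "$1") : null;

  return {
    id: `log_${(++nextLogId).toString(36)}`, // assumed ID scheme
    level: raw.level,
    message: raw.message,
    timestamp: raw.timestamp,
    url: context.url,
    source,
    stack,
    metadata: {
      userAgent: context.userAgent,
      sessionId: context.sessionId
    }
  };
}
```

Keeping the transformation pure (no I/O, no global state apart from the ID counter) makes it easy to run on every event in the connector and to test in isolation.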
Network Request Flow
Network requests are captured and analyzed through this flow:
```mermaid
sequenceDiagram
    participant PAGE as Web Page
    participant BROWSER as Browser
    participant EXT as Extension
    participant BC as Browser Connector
    participant PROC as Data Processor
    participant AI as AI Assistant

    PAGE->>BROWSER: fetch('/api/data')
    BROWSER->>BROWSER: Execute Request
    BROWSER->>EXT: Network Event (DevTools)
    EXT->>EXT: Capture Request/Response
    EXT->>BC: WebSocket: network-request
    BC->>PROC: Process Network Data
    PROC->>PROC: Extract Timing, Headers, Body
    PROC->>BC: Store Processed Request

    Note over AI,BC: AI analyzes network issues
    AI->>BC: GET /network-requests?status=error
    BC-->>AI: Failed Requests Analysis
```
Network Data Structure:
```javascript
// Network request entry
const networkRequestEntry = {
  id: "req_def456",
  url: "https://api.example.com/data",
  method: "POST",   // the request carries a JSON body, so not GET
  status: 500,
  statusText: "Internal Server Error",
  timestamp: 1704067200000,
  duration: 2500,   // milliseconds
  size: {
    request: 1024,  // bytes
    response: 256
  },
  // Timing phases in milliseconds; they add up to `duration`
  // (50 + 100 + 10 + 2000 + 340 = 2500)
  timing: {
    dns: 50,
    connect: 100,
    send: 10,
    wait: 2000,     // waiting for the first response byte
    receive: 340
  },
  headers: {
    request: { "Content-Type": "application/json" },
    response: { "Content-Type": "application/json" }
  },
  body: {
    request: "{\"query\":\"test\"}",
    response: "{\"error\":\"Database connection failed\"}"
  }
};
```
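The `GET /network-requests?status=error` query in the flow above implies a filter over entries shaped like this one. Here is a minimal sketch. It assumes a request counts as failed when its status is 0 (no response) or 400 and above, and it reports which timing phase dominated each failure. `summarizeFailedRequests` and its helpers are hypothetical names.

```javascript
// Hypothetical helper: pick out failed requests and the slowest timing phase of each.
function isFailedRequest(entry) {
  // Assumption: status 0 marks a request that never received a response
  return entry.status === 0 || entry.status >= 400;
}

function slowestPhase(timing) {
  let slowest = null;
  for (const [phase, ms] of Object.entries(timing || {})) {
    if (slowest === null || ms > slowest.ms) slowest = { phase, ms };
  }
  return slowest;
}

function summarizeFailedRequests(entries) {
  return entries.filter(isFailedRequest).map((entry) => ({
    id: entry.id,
    method: entry.method,
    url: entry.url,
    status: entry.status,
    duration: entry.duration,
    slowestPhase: slowestPhase(entry.timing)
  }));
}
```

For the sample entry, the slowest phase is the 2000 ms `wait`. That points at server-side processing rather than DNS or connection setup, which matches the "Database connection failed" response body.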
Screenshot Capture Flow
Screenshot capture is a synchronous round trip: the AI assistant's tool call waits until the extension returns the image.
```mermaid
sequenceDiagram
    participant AI as AI Assistant
    participant MCP as MCP Server
    participant BC as Browser Connector
    participant EXT as Extension
    participant PAGE as Web Page

    AI->>MCP: screenshot_capture
    MCP->>BC: POST /capture-screenshot
    BC->>EXT: WebSocket: screenshot-request
    EXT->>PAGE: Capture DOM
    PAGE-->>EXT: Canvas Data
    EXT->>EXT: Convert to Base64
    EXT-->>BC: Screenshot Data
    BC->>BC: Optimize & Validate
    BC-->>MCP: Base64 Image
    MCP-->>AI: Image + Metadata
```
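The "Optimize & Validate" step in the connector is not specified here. Below is a minimal Node.js sketch of the validation half. It assumes the extension sends a PNG or JPEG either as bare Base64 or as a `data:` URL, which is the form `chrome.tabs.captureVisibleTab` returns. `validateScreenshot` and the 10 MB limit are hypothetical.

```javascript
// Hypothetical validation step for screenshot payloads (Node.js, uses Buffer).
const MAX_SCREENSHOT_BYTES = 10 * 1024 * 1024; // assumed limit

// Magic bytes at the start of each supported image format
const SIGNATURES = [
  { mimeType: "image/png", bytes: [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a] },
  { mimeType: "image/jpeg", bytes: [0xff, 0xd8, 0xff] }
];

function validateScreenshot(payload) {
  // Accept "data:image/png;base64,...." or bare Base64
  const base64 = payload.replace(/^data:image\/[a-z]+;base64,/, "");
  if (!/^[A-Za-z0-9+/]+={0,2}$/.test(base64)) {
    return { valid: false, reason: "not Base64" };
  }

  const buffer = Buffer.from(base64, "base64");
  if (buffer.length > MAX_SCREENSHOT_BYTES) {
    return { valid: false, reason: "too large" };
  }

  // Identify the format from the decoded bytes rather than trusting the data URL prefix
  const match = SIGNATURES.find((sig) => sig.bytes.every((b, i) => buffer[i] === b));
  if (!match) {
    return { valid: false, reason: "unknown image format" };
  }
  return { valid: true, mimeType: match.mimeType, bytes: buffer.length, base64 };
}
```

The function checks the decoded magic bytes, not the declared type, so a mislabeled or truncated payload is rejected before it is forwarded to the MCP server.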
Asynchronous Processing Patterns
Event Buffering
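High-frequency events are buffered and flushed in batches, either when the buffer reaches its size limit or when its time window elapses, so that bursts do not overwhelm the system: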
```mermaid
graph LR
    EVENTS[Browser Events] --> BUFFER[Event Buffer]
    BUFFER --> BATCH[Batch Processor]
    BATCH --> FILTER[Filter & Dedupe]
    FILTER --> STORE[Storage]

    subgraph "Buffer Management"
        SIZE[Size Limit: 1000]
        TIME[Time Window: 1s]
        PRIORITY[Priority Queue]
    end

    BUFFER --> SIZE
    BUFFER --> TIME
    BUFFER --> PRIORITY
```
Buffer Implementation:
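```javascript
class EventBuffer {
  constructor(options = {}) {
    this.maxSize = options.maxSize ?? 1000;             // flush when this many events are queued...
    this.flushInterval = options.flushInterval ?? 1000; // ...or this many ms after the first one
    this.onFlush = options.onFlush ?? (() => {});       // receives each batch (e.g. the batch processor)
    this.buffer = [];
    this.flushTimer = null;
  }

  add(event) {
    this.buffer.push({
      ...event,
      bufferedAt: Date.now()
    });

    if (this.buffer.length >= this.maxSize) {
      // Buffer is full: flush immediately
      this.flush();
    } else if (!this.flushTimer) {
      // First event of a new window: schedule a time-based flush
      this.flushTimer = setTimeout(() => this.flush(), this.flushInterval);
    }
  }

  flush() {
    // Cancel any pending timer so the next event starts a fresh window
    if (this.flushTimer) {
      clearTimeout(this.flushTimer);
      this.flushTimer = null;
    }
    if (this.buffer.length === 0) return;

    // Take the current batch out of the buffer before handing it off
    const events = this.buffer.splice(0, this.buffer.length);
    this.onFlush(events);
  }
}
```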
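The "Filter & Dedupe" stage that receives each flushed batch is not shown. One plausible approach is sketched here under assumptions. Two events count as duplicates when their `level`, `message`, and `source` match, and duplicates collapse into one entry with a `count`. The function name and the choice of key fields are hypothetical.

```javascript
// Hypothetical dedupe step for a flushed batch: collapse repeated events
// (e.g. the same error logged in a tight loop) into one entry with a count.
function dedupeEvents(events, keyFields = ["level", "message", "source"]) {
  const byKey = new Map();
  for (const event of events) {
    const key = JSON.stringify(keyFields.map((field) => event[field] ?? null));
    const existing = byKey.get(key);
    if (existing) {
      existing.count += 1;
      existing.lastSeen = event.timestamp;
    } else {
      // Keep the first occurrence; Map insertion order preserves batch order
      byKey.set(key, { ...event, count: 1, firstSeen: event.timestamp, lastSeen: event.timestamp });
    }
  }
  return [...byKey.values()];
}
```

A function like this would run on each batch the buffer flushes, before storage, so a runaway loop logging the same error costs one stored entry instead of thousands.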
Data Aggregation
Events from all sources are merged into one timeline so that related events can be correlated and recurring patterns detected:
```mermaid
graph TB
    subgraph "Data Sources"
        CONSOLE[Console Logs]
        NETWORK[Network Requests]
        ERRORS[JavaScript Errors]
        PERF[Performance Metrics]
    end

    subgraph "Aggregation Layer"
        TIMELINE[Timeline Aggregator]
        CORRELATE[Event Correlator]
        PATTERN[Pattern Detector]
    end

    subgraph "Analysis Output"
        SUMMARY[Session Summary]
        INSIGHTS[Performance Insights]
        ISSUES[Issue Detection]
    end

    CONSOLE --> TIMELINE
    NETWORK --> TIMELINE
    ERRORS --> TIMELINE
    PERF --> TIMELINE
    TIMELINE --> CORRELATE
    CORRELATE --> PATTERN
    PATTERN --> SUMMARY
    PATTERN --> INSIGHTS
    PATTERN --> ISSUES
```
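Here is a minimal sketch of the first two aggregation stages. It assumes every event carries a millisecond `timestamp`, and it treats an error as correlated with a failed request when the error is logged within a fixed window after that request. The 2-second window, the `kind` tags, and the function names are assumptions for illustration.

```javascript
// Hypothetical timeline aggregation and correlation.
// sources: { console: [...], network: [...], errors: [...], perf: [...] }
function buildTimeline(sources) {
  const timeline = [];
  for (const [kind, events] of Object.entries(sources)) {
    for (const event of events) timeline.push({ kind, ...event });
  }
  // Array.prototype.sort is stable, so same-timestamp events keep source order
  return timeline.sort((a, b) => a.timestamp - b.timestamp);
}

function correlateErrorsWithRequests(timeline, windowMs = 2000) {
  const failedRequests = timeline.filter((e) => e.kind === "network" && e.status >= 400);
  const errors = timeline.filter(
    (e) => e.kind === "errors" || (e.kind === "console" && e.level === "error")
  );

  return errors.map((error) => {
    // Most recent failed request logged at most windowMs before the error
    const cause = failedRequests
      .filter((req) => req.timestamp <= error.timestamp && error.timestamp - req.timestamp <= windowMs)
      .at(-1);
    return { error, likelyCause: cause ?? null };
  });
}
```

A time-window heuristic like this only suggests a link. A pattern detector downstream would still need to weigh it, for example by checking whether the error's stack mentions the failing endpoint.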
Lighthouse Audit Flow
Lighthouse audits follow a specialized processing pipeline:
```mermaid
sequenceDiagram
    participant AI as AI Assistant
    participant MCP as MCP Server
    participant BC as Browser Connector
    participant LH as Lighthouse Runner
    participant CHROME as Chrome Instance
    participant TARGET as Target Page

    AI->>MCP: run_lighthouse_audit
    MCP->>BC: POST /lighthouse-audit
    BC->>LH: Launch Audit
    LH->>CHROME: Start Chrome Instance
    CHROME->>TARGET: Navigate to URL
    TARGET-->>CHROME: Page Content
    CHROME->>CHROME: Run Audits
    CHROME-->>LH: Audit Results
    LH->>LH: Process & Format
    LH-->>BC: Structured Results
    BC-->>MCP: Audit Report
    MCP-->>AI: Performance Analysis
```
Lighthouse Data Processing:
```javascript
// Raw Lighthouse result (LHR) processing.
// extractOpportunities() and extractDiagnostics() are project helpers not shown here.
function processLighthouseResults(lhr) {
  // Category scores are 0-1, or null when a category could not be scored
  const score = (id) => Math.round((lhr.categories[id]?.score ?? 0) * 100);

  return {
    // Lighthouse 10+ reports finalDisplayedUrl; older versions report finalUrl
    url: lhr.finalDisplayedUrl ?? lhr.finalUrl,
    timestamp: new Date(lhr.fetchTime).getTime(),

    // Core scores (0-100)
    scores: {
      performance: score('performance'),
      accessibility: score('accessibility'),
      bestPractices: score('best-practices'),
      seo: score('seo')
    },

    // Lab metrics: LCP and CLS are Core Web Vitals; FCP and TBT are supporting
    // metrics (TBT is the lab proxy for interaction responsiveness).
    // Times are in milliseconds; CLS is unitless.
    metrics: {
      firstContentfulPaint: lhr.audits['first-contentful-paint']?.numericValue,
      largestContentfulPaint: lhr.audits['largest-contentful-paint']?.numericValue,
      cumulativeLayoutShift: lhr.audits['cumulative-layout-shift']?.numericValue,
      totalBlockingTime: lhr.audits['total-blocking-time']?.numericValue
    },

    // Opportunities for improvement
    opportunities: extractOpportunities(lhr.audits),

    // Diagnostics and issues
    diagnostics: extractDiagnostics(lhr.audits)
  };
}
```
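An AI assistant usually needs more than raw numbers, so a natural next step is to bucket each metric. This is a minimal sketch. It uses the "good / needs improvement / poor" thresholds from web.dev and Lighthouse guidance: LCP 2.5 s / 4 s, CLS 0.1 / 0.25, FCP 1.8 s / 3 s, and TBT 200 ms / 600 ms. The `rateMetrics` function and its output shape are assumptions, not part of RapidTriageME. The function expects the metric names used by `processLighthouseResults`.

```javascript
// Hypothetical rating step for the `metrics` object built by processLighthouseResults.
// Each entry is [upper bound for "good", lower bound for "poor"].
const THRESHOLDS = {
  largestContentfulPaint: [2500, 4000], // ms
  cumulativeLayoutShift: [0.1, 0.25],   // unitless
  firstContentfulPaint: [1800, 3000],   // ms
  totalBlockingTime: [200, 600]         // ms
};

function rateMetric(name, value) {
  const limits = THRESHOLDS[name];
  if (!limits || typeof value !== "number") return "unknown";
  const [good, poor] = limits;
  if (value <= good) return "good";
  if (value > poor) return "poor";
  return "needs-improvement";
}

function rateMetrics(metrics) {
  const ratings = {};
  for (const [name, value] of Object.entries(metrics)) {
    ratings[name] = rateMetric(name, value);
  }
  return ratings;
}
```

Missing audits surface as `"unknown"` rather than being silently rated "good".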
Error Handling and Recovery
Connection Recovery
WebSocket connections recover automatically after a failure, retrying with exponential backoff:
```mermaid
stateDiagram-v2
    [*] --> Disconnected
    Disconnected --> Connecting: initialize()
    Connecting --> Connected: onopen
    Connecting --> Failed: onerror
    Connected --> Disconnected: onclose
    Failed --> Waiting: exponential backoff
    Waiting --> Connecting: retry timer
    Connected --> Reconnecting: connection lost
    Reconnecting --> Connected: reconnect success
    Reconnecting --> Failed: reconnect fail

    Connected: Queue data\nSend messages
    Disconnected: Buffer events\nShow offline status
    Failed: Log errors\nSchedule retry
```
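The diagram does not fix the retry schedule. A common choice is capped exponential backoff with "full jitter", meaning a random delay between zero and the exponential bound. This keeps many clients from reconnecting in lockstep after a connector restart. The sketch below is minimal. The 1 s base, the 30 s cap, and the function names are assumptions.

```javascript
// Hypothetical retry schedule for the Failed -> Waiting -> Connecting loop.
function backoffDelay(attempt, { baseMs = 1000, maxMs = 30000, random = Math.random } = {}) {
  // Exponential bound: baseMs, 2*baseMs, 4*baseMs, ... capped at maxMs
  const bound = Math.min(maxMs, baseMs * 2 ** attempt);
  // Full jitter: a uniformly random delay in [0, bound)
  return Math.floor(random() * bound);
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// `connect` is assumed to return a promise that resolves when the socket
// opens (onopen) and rejects when it fails (onerror).
async function connectWithRetry(connect, { maxAttempts = 10, ...backoff } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await connect();                     // Connecting -> Connected
    } catch (error) {
      if (attempt + 1 >= maxAttempts) throw error; // give up: stay in Failed
      await sleep(backoffDelay(attempt, backoff)); // Waiting
    }
  }
}
```

Injecting `random` keeps the delay calculation deterministic in tests.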
Data Persistence
Critical data is persisted during connection issues:
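```javascript
class DataPersistence {
  // sendData: async function that delivers one queued item to the Browser Connector
  // storage:  anything with setItem(key, value), e.g. window.localStorage
  //           (an MV3 extension service worker has no localStorage;
  //           chrome.storage.local fills that role there)
  constructor({ sendData, storage = globalThis.localStorage } = {}) {
    this.sendData = sendData;
    this.storage = storage;
    this.offlineQueue = [];
    this.maxOfflineSize = 10000; // entries
    this.persistenceKey = 'rapidtriage_offline_data';
  }

  // Buffer data when offline
  bufferOfflineData(data) {
    this.offlineQueue.push({
      ...data,
      queuedAt: Date.now()
    });

    // Maintain size limits by dropping the oldest entries
    if (this.offlineQueue.length > this.maxOfflineSize) {
      this.offlineQueue = this.offlineQueue.slice(-this.maxOfflineSize);
    }

    // Persist so queued data survives a reload
    this.persistToStorage();
  }

  // Sync when connection restored
  async syncOfflineData() {
    const queuedData = this.offlineQueue.splice(0);

    for (let i = 0; i < queuedData.length; i++) {
      try {
        await this.sendData(queuedData[i]);
      } catch {
        // Re-queue the failed item and everything after it, in order,
        // ahead of anything buffered while this sync was running
        this.offlineQueue.unshift(...queuedData.slice(i));
        break;
      }
    }

    this.persistToStorage();
  }

  persistToStorage() {
    this.storage?.setItem(this.persistenceKey, JSON.stringify(this.offlineQueue));
  }
}
```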
Performance Optimization
Data Compression
Large payloads are compressed before transmission:
```mermaid
graph LR
    RAW[Raw Data] --> COMPRESS[Compression]
    COMPRESS --> TRANSMIT[WebSocket]
    TRANSMIT --> DECOMPRESS[Decompression]
    DECOMPRESS --> PROCESS[Processing]

    subgraph "Compression Options"
        GZIP[GZIP]
        BROTLI[Brotli]
        CUSTOM[Custom Delta]
    end

    COMPRESS --> GZIP
    COMPRESS --> BROTLI
    COMPRESS --> CUSTOM
```
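The diagram lists the options but not how a compressed message is framed. Below is a minimal Node.js sketch using the built-in `zlib` module's synchronous gzip and Brotli functions. It assumes a JSON envelope with an `encoding` field and a 1 KB threshold below which compression is skipped. Both the envelope and the threshold are hypothetical. The custom delta scheme is not covered.

```javascript
const zlib = require("node:zlib");

// Assumed: payloads under 1 KB are not worth compressing
const COMPRESSION_THRESHOLD = 1024;

const CODECS = {
  gzip: { compress: (buf) => zlib.gzipSync(buf), decompress: (buf) => zlib.gunzipSync(buf) },
  br: { compress: (buf) => zlib.brotliCompressSync(buf), decompress: (buf) => zlib.brotliDecompressSync(buf) }
};

// Sender side: wrap a payload in a small JSON envelope, compressing large ones
function encodeMessage(payload, encoding = "br") {
  const raw = Buffer.from(JSON.stringify(payload), "utf8");
  if (raw.length < COMPRESSION_THRESHOLD) {
    return JSON.stringify({ encoding: "identity", data: raw.toString("utf8") });
  }
  const codec = CODECS[encoding];
  if (!codec) throw new Error(`Unsupported encoding: ${encoding}`);
  return JSON.stringify({ encoding, data: codec.compress(raw).toString("base64") });
}

// Receiver side: undo whatever encodeMessage did
function decodeMessage(message) {
  const { encoding, data } = JSON.parse(message);
  if (encoding === "identity") return JSON.parse(data);
  const codec = CODECS[encoding];
  if (!codec) throw new Error(`Unsupported encoding: ${encoding}`);
  return JSON.parse(codec.decompress(Buffer.from(data, "base64")).toString("utf8"));
}
```

Base64 inflates the compressed bytes by about a third. Sending binary WebSocket frames avoids that overhead. The standard permessage-deflate extension (RFC 7692) compresses at the transport layer without any envelope at all.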
Streaming Responses
Large result sets are streamed in batches so that the full set never has to be held in memory at once:
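```javascript
// Streaming log retrieval.
// getLogsBatch(offset, limit) queries the log store and resolves to an array
// (empty once everything has been read).
async function* streamConsoleLogs(getLogsBatch, options = {}) {
  const batchSize = options.batchSize || 100;
  let offset = 0;

  while (true) {
    const batch = await getLogsBatch(offset, batchSize);
    if (batch.length === 0) break;

    // Yield batch to consumer
    yield {
      logs: batch,
      offset,
      hasMore: batch.length === batchSize // a short batch means the end was reached
    };

    offset += batch.length;
  }
}

// Resolve once the response can accept more data, or the client has gone away
function waitForDrain(res) {
  return new Promise((resolve) => {
    const done = () => {
      res.off('drain', done);
      res.off('close', done);
      resolve();
    };
    res.on('drain', done);
    res.on('close', done);
  });
}

// Usage in an Express endpoint (`app` and `logStore` are assumed to exist)
app.get('/console-logs/stream', async (req, res) => {
  let closed = false;
  res.on('close', () => { closed = true; });

  // One JSON object per line (NDJSON). With no Content-Length set, Node.js
  // uses chunked transfer encoding for HTTP/1.1 clients automatically.
  res.writeHead(200, { 'Content-Type': 'application/x-ndjson' });

  const batches = streamConsoleLogs((offset, limit) => logStore.getLogsBatch(offset, limit));
  for await (const batch of batches) {
    if (closed) break; // stop querying once the client disconnects
    if (!res.write(JSON.stringify(batch) + '\n')) {
      await waitForDrain(res); // respect backpressure
    }
  }
  res.end();
});
```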
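On the consuming side, the stream can be read incrementally instead of being buffered whole. This minimal sketch assumes the endpoint writes one JSON object per line. `readNdjson` is a hypothetical helper. It works with any WHATWG `ReadableStream` of bytes, such as `fetch(...).body` in browsers and in Node.js 18+.

```javascript
// Hypothetical consumer: parse newline-delimited JSON from a byte stream,
// yielding one object per line without waiting for the whole response.
async function* readNdjson(stream) {
  const reader = stream.getReader();
  const decoder = new TextDecoder();
  let pending = "";

  try {
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      // stream: true keeps multi-byte characters split across chunks intact
      pending += decoder.decode(value, { stream: true });

      // Emit every complete line; keep the trailing partial line for the next chunk
      const lines = pending.split("\n");
      pending = lines.pop();
      for (const line of lines) {
        if (line.trim() !== "") yield JSON.parse(line);
      }
    }
    pending += decoder.decode(); // flush any bytes the decoder still holds
    if (pending.trim() !== "") yield JSON.parse(pending);
  } finally {
    reader.releaseLock();
  }
}

// Example usage (connectorUrl is an assumed base URL for the Browser Connector):
// const res = await fetch(`${connectorUrl}/console-logs/stream`);
// for await (const batch of readNdjson(res.body)) console.log(batch.offset, batch.logs.length);
```

Each batch becomes available as soon as its line arrives, so a consumer can start analysis, or stop early, before the last batch is sent.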
Data Security and Privacy
Sensitive Data Filtering
Personal and sensitive information is filtered at multiple stages:
```mermaid
graph TB
    INPUT[Raw Data] --> DETECT[Pattern Detection]
    DETECT --> CLASSIFY[Data Classification]
    CLASSIFY --> MASK[Masking/Redaction]
    MASK --> VALIDATE[Validation]
    VALIDATE --> OUTPUT[Clean Data]

    subgraph "Sensitive Patterns"
        EMAIL[Email Addresses]
        PHONE[Phone Numbers]
        CC[Credit Cards]
        SSN[Social Security]
        TOKEN[API Tokens]
    end

    DETECT --> EMAIL
    DETECT --> PHONE
    DETECT --> CC
    DETECT --> SSN
    DETECT --> TOKEN
```
Data Sanitization Pipeline:
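```javascript
class DataSanitizer {
  constructor() {
    this.patterns = {
      email: /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/g,
      phone: /\b\d{3}[-.]?\d{3}[-.]?\d{4}\b/g,
      creditCard: /\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b/g,
      ssn: /\b\d{3}-?\d{2}-?\d{4}\b/g,
      apiKey: /\b[A-Za-z0-9]{32,}\b/g
    };
  }

  // Redact every pattern match inside one string
  sanitizeString(text) {
    let sanitized = text;
    for (const [type, pattern] of Object.entries(this.patterns)) {
      sanitized = sanitized.replace(pattern, `[${type.toUpperCase()}_REDACTED]`);
    }
    return sanitized;
  }

  // Walk JSON-like data (plain objects, arrays, primitives) and redact only
  // string values. Running the patterns over JSON.stringify(data) would also
  // hit numbers: a 10-digit value such as a Unix timestamp in seconds matches
  // the phone pattern, and the unquoted placeholder then breaks JSON.parse.
  sanitize(data) {
    if (typeof data === 'string') return this.sanitizeString(data);
    if (Array.isArray(data)) return data.map((item) => this.sanitize(item));
    if (data !== null && typeof data === 'object') {
      return Object.fromEntries(
        Object.entries(data).map(([key, value]) => [key, this.sanitize(value)])
      );
    }
    return data; // numbers, booleans, null
  }
}
```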
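The digit-only patterns are broad. For example, the credit-card pattern also matches order numbers and other 16-digit identifiers. A common refinement is to redact a match only when it passes the Luhn checksum that payment card numbers carry. It is sketched here as an assumption, not as the project's behavior. `luhnValid` and `redactCardNumbers` are hypothetical names.

```javascript
// Luhn checksum: starting from the rightmost digit, double every second digit,
// subtract 9 from any doubled value above 9, and require a sum divisible by 10.
function luhnValid(digits) {
  let sum = 0;
  for (let i = 0; i < digits.length; i++) {
    let d = Number(digits[digits.length - 1 - i]);
    if (i % 2 === 1) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
  }
  return digits.length > 0 && sum % 10 === 0;
}

const CARD_CANDIDATE = /\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b/g;

// Redact only candidates that pass the checksum; leave other digit runs alone
function redactCardNumbers(text) {
  return text.replace(CARD_CANDIDATE, (match) =>
    luhnValid(match.replace(/\D/g, "")) ? "[CREDITCARD_REDACTED]" : match
  );
}
```

A function like this could take the place of the plain `creditCard` regex step. It trades a little CPU for fewer false positives in logged identifiers.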
Monitoring and Observability
Flow Metrics
Key metrics are tracked throughout the data flow:
```mermaid
graph TB
    subgraph "Input Metrics"
        EVENTS[Events/sec]
        BYTES[Bytes/sec]
        ERRORS[Error Rate]
    end

    subgraph "Processing Metrics"
        LATENCY[Processing Latency]
        QUEUE[Queue Depth]
        MEMORY[Memory Usage]
    end

    subgraph "Output Metrics"
        API_CALLS[API Calls/min]
        SUCCESS[Success Rate]
        CACHE_HIT[Cache Hit Rate]
    end

    EVENTS --> LATENCY
    BYTES --> QUEUE
    ERRORS --> MEMORY
    LATENCY --> API_CALLS
    QUEUE --> SUCCESS
    MEMORY --> CACHE_HIT
```
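Here is a minimal in-process sketch of how these counters could be collected and turned into rates. `FlowMetrics`, its method names, and the nearest-rank p95 latency are assumptions. Queue depth is passed in by the caller, for example from the event buffer's length. In Node.js, memory usage could come from `process.memoryUsage().heapUsed`. A production deployment would more likely export raw counters to a metrics system.

```javascript
// Hypothetical collector for the input, processing, and output metrics above.
class FlowMetrics {
  constructor() {
    this.reset();
  }

  // Start a new reporting window
  reset() {
    this.events = 0;
    this.bytes = 0;
    this.errors = 0;
    this.latenciesMs = [];
    this.apiCalls = 0;
    this.apiSuccesses = 0;
    this.cacheLookups = 0;
    this.cacheHits = 0;
  }

  recordEvent(sizeBytes, processingMs, isError = false) {
    this.events += 1;
    this.bytes += sizeBytes;
    if (isError) this.errors += 1;
    this.latenciesMs.push(processingMs);
  }

  recordApiCall(succeeded) {
    this.apiCalls += 1;
    if (succeeded) this.apiSuccesses += 1;
  }

  recordCacheLookup(hit) {
    this.cacheLookups += 1;
    if (hit) this.cacheHits += 1;
  }

  // Nearest-rank percentile over the recorded processing latencies
  percentile(p) {
    if (this.latenciesMs.length === 0) return null;
    const sorted = [...this.latenciesMs].sort((a, b) => a - b);
    const rank = Math.ceil((p / 100) * sorted.length);
    return sorted[Math.max(0, rank - 1)];
  }

  // Summarize a window that lasted `windowMs` milliseconds
  snapshot(windowMs, queueDepth = null) {
    const ratio = (part, whole) => (whole === 0 ? null : part / whole);
    return {
      eventsPerSec: (this.events * 1000) / windowMs,
      bytesPerSec: (this.bytes * 1000) / windowMs,
      errorRate: ratio(this.errors, this.events),
      p95LatencyMs: this.percentile(95),
      queueDepth,
      apiCallsPerMin: (this.apiCalls * 60000) / windowMs,
      successRate: ratio(this.apiSuccesses, this.apiCalls),
      cacheHitRate: ratio(this.cacheHits, this.cacheLookups)
    };
  }
}
```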
Distributed Tracing
Each request is traced across all components under a shared trace ID:
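```javascript
// Trace context propagation (Node.js; in the extension,
// crypto.getRandomValues() would take the place of randomBytes)
const { randomBytes } = require('node:crypto');

// ID sizes follow W3C Trace Context: 16-byte trace ID, 8-byte span ID, hex-encoded
const generateTraceId = () => randomBytes(16).toString('hex');
const generateSpanId = () => randomBytes(8).toString('hex');

class TraceContext {
  constructor(traceId = generateTraceId()) {
    this.traceId = traceId;         // shared by every span of one request
    this.spanId = generateSpanId(); // unique to this span
    this.parentSpanId = null;
    this.name = null;
    this.startTime = Date.now();
    this.events = [];
  }

  addEvent(name, data = {}) {
    const now = Date.now();
    this.events.push({
      name,
      timestamp: now,
      elapsedMs: now - this.startTime, // time since this span started
      data
    });
    return this;
  }

  // Setters return `this` so createChildSpan can chain them
  setParentSpan(spanId) {
    this.parentSpanId = spanId;
    return this;
  }

  setName(name) {
    this.name = name;
    return this;
  }

  createChildSpan(name) {
    return new TraceContext(this.traceId)
      .setParentSpan(this.spanId)
      .setName(name);
  }
}

// Usage in data flow: one child span per component, all sharing trace.traceId.
// Across process boundaries, the traceId and parent span ID travel with
// each WebSocket message or HTTP request.
const trace = new TraceContext().setName('console_log');

// Extension
trace.createChildSpan('extension').addEvent('console_log_captured', { level: 'error' });

// Browser Connector
trace.createChildSpan('browser_connector')
  .addEvent('websocket_received')
  .addEvent('data_processed');

// MCP Server
trace.createChildSpan('mcp_server').addEvent('mcp_tool_executed', { tool: 'get_console_logs' });
```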
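The extension, Browser Connector, and MCP server run in separate processes, so the trace ID has to travel with each WebSocket message or HTTP request. One standard encoding is the W3C Trace Context `traceparent` header, laid out as `version-traceId-parentId-flags`. The sketch below formats and parses it. It assumes trace contexts carry a 32-hex-digit `traceId` and a 16-hex-digit `spanId`, and it accepts only version `00`. The helper names are hypothetical.

```javascript
// W3C Trace Context "traceparent", version 00:
// 00-<32 hex trace-id>-<16 hex parent-id>-<2 hex trace-flags>
const TRACEPARENT = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

function toTraceparent({ traceId, spanId }, sampled = true) {
  return `00-${traceId}-${spanId}-${sampled ? "01" : "00"}`;
}

// Returns { traceId, parentSpanId, sampled }, or null for a malformed header
function parseTraceparent(header) {
  const match = TRACEPARENT.exec((header || "").trim());
  if (!match) return null;
  const [, traceId, parentSpanId, flags] = match;
  // All-zero trace and parent IDs are invalid
  if (/^0+$/.test(traceId) || /^0+$/.test(parentSpanId)) return null;
  return {
    traceId,
    parentSpanId,
    sampled: (parseInt(flags, 16) & 0x01) === 0x01
  };
}
```

The receiving component would create its own span with the parsed `traceId` and record `parentSpanId` as its parent. Spans from all three processes then join into a single trace.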
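Together, these flows carry browser debugging data from capture in the extension, through buffering, sanitization, and caching in the Browser Connector, to analysis by the AI assistant. Connection recovery, compression, and redaction along the way protect reliability, performance, and privacy.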
Next Steps
- Component Details - Deep dive into component internals
- API Reference - Detailed API documentation
- Performance Guide - Optimization strategies