Problem: Polling Kills Real-Time AI UIs
You're building a dashboard that streams live AI inference results — token counts, latency metrics, model outputs — and HTTP polling every second isn't cutting it. The UI lags, the server gets hammered, and users see stale data.
WebSockets fix all of this. But wiring FastAPI WebSockets to a React frontend with proper reconnect logic, typed message contracts, and production error handling is non-trivial.
You'll learn how to:
- Set up a FastAPI WebSocket server that streams AI inference metrics
- Build a React hook that manages the WebSocket lifecycle (connect, reconnect, close)
- Render a live dashboard with charts that update as data arrives
- Handle connection drops, backpressure, and auth tokens
Time: 35 min | Difficulty: Intermediate
Why WebSockets Over SSE or Polling
Three approaches exist for real-time data in the browser:
| Approach | Latency | Bidirectional | Complexity |
|---|---|---|---|
| HTTP Polling | ~1–5s | ❌ | Low |
| Server-Sent Events (SSE) | ~100ms | ❌ | Low |
| WebSockets | ~10ms | ✅ | Medium |
For an AI dashboard that needs to send control commands (pause inference, switch model) while receiving streamed metrics, WebSockets are the right choice. SSE is fine for one-way streaming; WebSockets are better when you need a duplex channel.
Architecture
```
React Dashboard
│
├── useWebSocket hook ──connects──▶ FastAPI /ws/{client_id}
│         │                                  │
│   reconnect logic             background task (async generator)
│         │                                  │
└── Dashboard UI ◀──typed JSON events── AI inference loop
```
On the FastAPI side, the WebSocket endpoint drives an async generator that simulates (or calls real) AI inference and pushes structured JSON events over the socket. The React side consumes these events through a typed hook and feeds them into Recharts.
Solution
Step 1: Set Up the FastAPI Backend
Start with a clean project using uv (faster than pip for dependency resolution).
```bash
# Create project
uv init ai-dashboard-api && cd ai-dashboard-api

# Add dependencies (quote the extra so zsh doesn't glob the brackets)
uv add fastapi "uvicorn[standard]" websockets pydantic
```
Create main.py:
```python
from fastapi import FastAPI, WebSocket, WebSocketDisconnect
from fastapi.middleware.cors import CORSMiddleware
import asyncio
import json
import random
import time
from typing import AsyncGenerator

app = FastAPI()

# Allow React dev server on port 5173
app.add_middleware(
    CORSMiddleware,
    allow_origins=["http://localhost:5173"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

async def inference_stream(model: str = "llama3") -> AsyncGenerator[dict, None]:
    """
    Simulates a live AI inference loop.
    Replace with actual model calls (Ollama API, OpenAI stream, etc.)
    """
    token_count = 0
    start = time.time()
    while True:
        # Simulate token-by-token generation latency
        await asyncio.sleep(0.2)
        token_count += random.randint(1, 5)
        latency_ms = round(random.gauss(120, 20), 1)  # realistic jitter
        elapsed = time.time() - start
        yield {
            "event": "inference_tick",
            "model": model,
            "tokens_generated": token_count,
            "latency_ms": max(50.0, latency_ms),  # floor at 50ms
            "tokens_per_second": round(token_count / max(elapsed, 0.001), 2),
            "timestamp": time.time(),
        }

@app.websocket("/ws/{client_id}")
async def websocket_endpoint(websocket: WebSocket, client_id: str):
    await websocket.accept()
    print(f"Client connected: {client_id}")
    model = "llama3"
    try:
        async for event in inference_stream(model):
            # Check for incoming messages (model switch, pause) without blocking
            try:
                msg = await asyncio.wait_for(websocket.receive_text(), timeout=0.01)
                command = json.loads(msg)
                if command.get("action") == "switch_model":
                    model = command.get("model", model)
            except asyncio.TimeoutError:
                pass  # No message — continue streaming
            # The generator captured the initial model, so stamp the current
            # one onto each outgoing event to make switches take effect
            event["model"] = model
            await websocket.send_text(json.dumps(event))
    except WebSocketDisconnect:
        print(f"Client disconnected: {client_id}")
```
Run it:
```bash
uv run uvicorn main:app --reload --port 8000
```
Expected output:
```
INFO:     Uvicorn running on http://127.0.0.1:8000
INFO:     Started reloader process
```
If it fails:
- `ModuleNotFoundError: fastapi` → Run `uv sync` to install locked deps
- `Address already in use` → Kill the existing process: `lsof -ti:8000 | xargs kill`
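One caveat with the endpoint's `asyncio.wait_for(receive_text(), timeout=0.01)` pattern: the timeout cancels the pending receive on every tick, which is easy to get wrong under load. A common alternative runs the reader as its own task and shares state through a queue. A socket-free sketch of that shape (the queue stands in for the real `websocket.receive_text()` source; names are illustrative):

```python
import asyncio

async def demo() -> str:
    state = {"model": "llama3"}
    inbox: asyncio.Queue = asyncio.Queue()  # stand-in for incoming WS messages

    async def reader() -> None:
        # Dedicated reader task: awaits messages with no timeouts or cancellation
        while True:
            cmd = await inbox.get()
            if cmd.get("action") == "switch_model":
                state["model"] = cmd["model"]
            inbox.task_done()

    task = asyncio.create_task(reader())
    await inbox.put({"action": "switch_model", "model": "mistral"})
    await inbox.join()      # wait until the reader has processed the command
    task.cancel()
    return state["model"]   # the send loop would read this on each tick

final_model = asyncio.run(demo())
print(final_model)  # mistral
```

The send loop then stays a plain `async for` over the generator, reading the shared state each tick.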
Step 2: Scaffold the React Frontend
```bash
npm create vite@latest ai-dashboard -- --template react-ts
cd ai-dashboard
npm install recharts lucide-react
```
The TypeScript template gives you type safety for WebSocket message contracts — worth the small setup cost.
Step 3: Define the Message Contract
Create src/types/ws.ts. Defining types upfront prevents silent runtime failures when the backend schema changes.
```ts
// Matches the dict shape yielded by inference_stream() in FastAPI
export interface InferenceTick {
  event: "inference_tick";
  model: string;
  tokens_generated: number;
  latency_ms: number;
  tokens_per_second: number;
  timestamp: number;
}

export type DashboardEvent = InferenceTick;

// Commands the frontend can send to the backend
export interface SwitchModelCommand {
  action: "switch_model";
  model: string;
}
```
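On top of the compile-time types, a runtime guard catches frames that drift from the contract. A stdlib-only sketch (the tutorial's code doesn't include this; field names match `InferenceTick`):

```python
import json

# Expected field types for an inference_tick frame
REQUIRED = {
    "event": str,
    "model": str,
    "tokens_generated": int,
    "latency_ms": (int, float),
    "tokens_per_second": (int, float),
    "timestamp": (int, float),
}

def parse_tick(raw: str) -> dict:
    data = json.loads(raw)
    for key, typ in REQUIRED.items():
        if not isinstance(data.get(key), typ):
            raise ValueError(f"bad or missing field: {key}")
    return data

frame = ('{"event": "inference_tick", "model": "llama3", "tokens_generated": 47, '
         '"latency_ms": 118.3, "tokens_per_second": 3.82, "timestamp": 1741478400.1}')
tick = parse_tick(frame)
print(tick["model"])  # llama3
```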
Step 4: Build the useWebSocket Hook
Create src/hooks/useWebSocket.ts. This hook owns the entire connection lifecycle: initial connect, exponential-backoff reconnect, and clean teardown.
```ts
import { useEffect, useRef, useState, useCallback } from "react";
import type { DashboardEvent } from "../types/ws";

interface UseWebSocketOptions {
  clientId: string;
  onMessage: (event: DashboardEvent) => void;
}

interface WebSocketState {
  status: "connecting" | "open" | "closed" | "error";
  send: (data: object) => void;
}

export function useWebSocket({ clientId, onMessage }: UseWebSocketOptions): WebSocketState {
  const wsRef = useRef<WebSocket | null>(null);
  const retryRef = useRef<ReturnType<typeof setTimeout> | null>(null);
  const retryCount = useRef(0);
  const [status, setStatus] = useState<WebSocketState["status"]>("connecting");

  const connect = useCallback(() => {
    // Clean up any existing socket before reconnecting; detach its onclose
    // first so the old socket can't schedule a competing retry
    if (wsRef.current) {
      wsRef.current.onclose = null;
      wsRef.current.close();
    }
    const ws = new WebSocket(`ws://localhost:8000/ws/${clientId}`);
    wsRef.current = ws;
    setStatus("connecting");

    ws.onopen = () => {
      setStatus("open");
      retryCount.current = 0; // Reset backoff on successful connect
    };

    ws.onmessage = (e) => {
      try {
        const data = JSON.parse(e.data) as DashboardEvent;
        onMessage(data);
      } catch {
        console.warn("Unparseable WS message:", e.data);
      }
    };

    ws.onerror = () => setStatus("error");

    ws.onclose = () => {
      setStatus("closed");
      // Exponential backoff: 1s, 2s, 4s, 8s, max 30s
      const delay = Math.min(1000 * 2 ** retryCount.current, 30_000);
      retryCount.current += 1;
      console.log(`WS closed. Reconnecting in ${delay}ms (attempt ${retryCount.current})`);
      retryRef.current = setTimeout(connect, delay);
    };
  }, [clientId, onMessage]);

  useEffect(() => {
    connect();
    return () => {
      // Prevent a reconnect loop on unmount: cancel any pending retry and
      // detach onclose before closing, so close() can't schedule a retry
      if (retryRef.current) clearTimeout(retryRef.current);
      if (wsRef.current) {
        wsRef.current.onclose = null;
        wsRef.current.close();
      }
    };
  }, [connect]);

  const send = useCallback((data: object) => {
    if (wsRef.current?.readyState === WebSocket.OPEN) {
      wsRef.current.send(JSON.stringify(data));
    }
  }, []);

  return { status, send };
}
```
Key decisions here:
- `retryRef` holds the timeout ID so it can be cancelled on unmount — prevents the "can't update state on unmounted component" warning
- `onMessage` is passed as a dep to `useCallback`, so the parent controls whether re-renders retrigger connect
- Exponential backoff caps at 30s to avoid flooding a recovering server
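The backoff expression is worth sanity-checking in isolation; here it is mirrored in Python with the same constants as the hook:

```python
def backoff_ms(attempt: int, base_ms: int = 1000, cap_ms: int = 30_000) -> int:
    # Mirrors the hook: min(1000 * 2 ** retryCount, 30_000)
    return min(base_ms * 2 ** attempt, cap_ms)

schedule = [backoff_ms(n) for n in range(7)]
print(schedule)  # [1000, 2000, 4000, 8000, 16000, 30000, 30000]
```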
Step 5: Build the Dashboard UI
Create src/components/Dashboard.tsx:
```tsx
import { useState, useCallback, useRef } from "react";
import {
  LineChart, Line, XAxis, YAxis, CartesianGrid,
  Tooltip, ResponsiveContainer
} from "recharts";
import { Wifi, WifiOff, Zap } from "lucide-react";
import { useWebSocket } from "../hooks/useWebSocket";
import type { DashboardEvent, InferenceTick } from "../types/ws";

const MAX_DATA_POINTS = 60; // Keep the last 60 ticks (~12 seconds at 0.2s interval)

export default function Dashboard() {
  const [ticks, setTicks] = useState<InferenceTick[]>([]);
  const [currentModel, setCurrentModel] = useState("llama3");
  const clientId = useRef(`client-${Math.random().toString(36).slice(2)}`);

  const handleMessage = useCallback((event: DashboardEvent) => {
    if (event.event === "inference_tick") {
      setTicks((prev) => {
        const next = [...prev, event];
        // Trim buffer — avoids unbounded memory growth in long sessions
        return next.length > MAX_DATA_POINTS ? next.slice(-MAX_DATA_POINTS) : next;
      });
    }
  }, []);

  const { status, send } = useWebSocket({
    clientId: clientId.current,
    onMessage: handleMessage,
  });

  const switchModel = (model: string) => {
    setCurrentModel(model);
    send({ action: "switch_model", model });
  };

  const latest = ticks[ticks.length - 1];
  const chartData = ticks.map((t) => ({
    time: new Date(t.timestamp * 1000).toLocaleTimeString(),
    latency: t.latency_ms,
    tps: t.tokens_per_second,
  }));

  return (
    <div style={{ padding: "24px", fontFamily: "monospace", background: "#0f0f0f", minHeight: "100vh", color: "#e2e8f0" }}>
      {/* Header */}
      <div style={{ display: "flex", alignItems: "center", gap: "12px", marginBottom: "24px" }}>
        {status === "open"
          ? <Wifi size={20} color="#22c55e" />
          : <WifiOff size={20} color="#ef4444" />
        }
        <h1 style={{ margin: 0, fontSize: "18px" }}>AI Inference Dashboard</h1>
        <span style={{
          marginLeft: "auto",
          fontSize: "12px",
          color: status === "open" ? "#22c55e" : "#ef4444"
        }}>
          {status.toUpperCase()}
        </span>
      </div>

      {/* KPI Cards */}
      <div style={{ display: "grid", gridTemplateColumns: "repeat(3, 1fr)", gap: "16px", marginBottom: "32px" }}>
        <KpiCard label="Model" value={latest?.model ?? "—"} />
        <KpiCard label="Tokens/sec" value={latest?.tokens_per_second?.toFixed(1) ?? "—"} icon={<Zap size={14} />} />
        <KpiCard label="Latency (ms)" value={latest?.latency_ms?.toFixed(1) ?? "—"} />
      </div>

      {/* Latency Chart */}
      <ChartPanel title="Inference Latency (ms)">
        <ResponsiveContainer width="100%" height={200}>
          <LineChart data={chartData}>
            <CartesianGrid strokeDasharray="3 3" stroke="#1e293b" />
            <XAxis dataKey="time" tick={{ fontSize: 10 }} stroke="#475569" />
            <YAxis stroke="#475569" tick={{ fontSize: 10 }} />
            <Tooltip contentStyle={{ background: "#1e293b", border: "none" }} />
            <Line type="monotone" dataKey="latency" stroke="#818cf8" dot={false} strokeWidth={2} />
          </LineChart>
        </ResponsiveContainer>
      </ChartPanel>

      {/* Tokens/sec Chart */}
      <ChartPanel title="Tokens per Second">
        <ResponsiveContainer width="100%" height={200}>
          <LineChart data={chartData}>
            <CartesianGrid strokeDasharray="3 3" stroke="#1e293b" />
            <XAxis dataKey="time" tick={{ fontSize: 10 }} stroke="#475569" />
            <YAxis stroke="#475569" tick={{ fontSize: 10 }} />
            <Tooltip contentStyle={{ background: "#1e293b", border: "none" }} />
            <Line type="monotone" dataKey="tps" stroke="#34d399" dot={false} strokeWidth={2} />
          </LineChart>
        </ResponsiveContainer>
      </ChartPanel>

      {/* Model Switcher */}
      <div style={{ marginTop: "24px", display: "flex", gap: "8px" }}>
        {["llama3", "mistral", "qwen2.5"].map((m) => (
          <button
            key={m}
            onClick={() => switchModel(m)}
            style={{
              padding: "6px 14px",
              background: currentModel === m ? "#6366f1" : "#1e293b",
              color: "#e2e8f0",
              border: "none",
              borderRadius: "6px",
              cursor: "pointer",
              fontFamily: "monospace",
            }}
          >
            {m}
          </button>
        ))}
      </div>
    </div>
  );
}

function KpiCard({ label, value, icon }: { label: string; value: string; icon?: React.ReactNode }) {
  return (
    <div style={{ background: "#1e293b", padding: "16px", borderRadius: "8px" }}>
      <div style={{ fontSize: "11px", color: "#94a3b8", marginBottom: "6px", display: "flex", alignItems: "center", gap: "4px" }}>
        {icon}{label}
      </div>
      <div style={{ fontSize: "22px", fontWeight: "bold" }}>{value}</div>
    </div>
  );
}

function ChartPanel({ title, children }: { title: string; children: React.ReactNode }) {
  return (
    <div style={{ background: "#1e293b", borderRadius: "8px", padding: "16px", marginBottom: "16px" }}>
      <div style={{ fontSize: "12px", color: "#94a3b8", marginBottom: "12px" }}>{title}</div>
      {children}
    </div>
  );
}
```
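The trim inside `handleMessage` is a sliding window; the same logic isolated in Python, for clarity:

```python
MAX_DATA_POINTS = 60

def append_trimmed(buf: list, tick: dict, limit: int = MAX_DATA_POINTS) -> list:
    # Append, then keep only the newest `limit` entries
    out = buf + [tick]
    return out[-limit:] if len(out) > limit else out

buf: list = []
for i in range(75):
    buf = append_trimmed(buf, {"n": i})
print(len(buf), buf[0], buf[-1])  # 60 {'n': 15} {'n': 74}
```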
Update src/App.tsx to render it:
```tsx
import Dashboard from "./components/Dashboard";

export default function App() {
  return <Dashboard />;
}
```
Step 6: Run Both Servers
Terminal 1 — FastAPI backend:
```bash
cd ai-dashboard-api
uv run uvicorn main:app --reload --port 8000
```
Terminal 2 — React dev server:
```bash
cd ai-dashboard
npm run dev
```
Open http://localhost:5173. You should see the dashboard animate within 1–2 seconds as ticks arrive.
If the WebSocket shows "ERROR" status:
- CORS block → Confirm `allow_origins` in `main.py` matches your Vite port (default `5173`)
- Connection refused → Make sure FastAPI is running on port 8000
- Stuck at "CONNECTING" → Check browser DevTools → Network → WS tab for the handshake error
Verification
With both servers running, open browser DevTools → Network → WS tab and click the /ws/client-xxx connection. You should see a stream of JSON frames arriving every ~200ms:
```json
{"event": "inference_tick", "model": "llama3", "tokens_generated": 47, "latency_ms": 118.3, "tokens_per_second": 3.82, "timestamp": 1741478400.123}
```
Test the model switcher: click mistral in the UI. Within one tick you should see "model": "mistral" appear in the frames.
Test reconnect: stop the FastAPI server (Ctrl+C). The UI status changes to CLOSED. Restart the server — the hook reconnects automatically within the backoff window.
Production Considerations
Authentication — pass a JWT as a query param or in the first message after connect:
```python
# FastAPI: validate token before accepting
from fastapi import Query  # add Query to the existing fastapi imports

@app.websocket("/ws/{client_id}")
async def websocket_endpoint(websocket: WebSocket, client_id: str, token: str = Query(...)):
    payload = verify_jwt(token)  # raises on an invalid token, before accept()
    await websocket.accept()
```
```ts
// React: append token to URL
const ws = new WebSocket(`ws://localhost:8000/ws/${clientId}?token=${authToken}`);
```
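`verify_jwt` is left undefined above. For reference, a stdlib-only HS256 sketch of what it might do; in production use a maintained library such as PyJWT, and treat `SECRET` and the claim names here as assumptions:

```python
import base64, hashlib, hmac, json, time

SECRET = b"dev-secret"  # assumption: HS256 shared key

def _b64url(data: bytes) -> bytes:
    return base64.urlsafe_b64encode(data).rstrip(b"=")

def sign_jwt(payload: dict) -> str:
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = _b64url(json.dumps(payload).encode())
    sig = _b64url(hmac.new(SECRET, header + b"." + body, hashlib.sha256).digest())
    return b".".join([header, body, sig]).decode()

def verify_jwt(token: str) -> dict:
    header, body, sig = token.encode().split(b".")
    expected = _b64url(hmac.new(SECRET, header + b"." + body, hashlib.sha256).digest())
    if not hmac.compare_digest(sig, expected):
        raise ValueError("bad signature")
    payload = json.loads(base64.urlsafe_b64decode(body + b"=" * (-len(body) % 4)))
    if payload.get("exp", float("inf")) < time.time():
        raise ValueError("expired")
    return payload

token = sign_jwt({"sub": "client-abc", "exp": time.time() + 60})
claims = verify_jwt(token)
print(claims["sub"])  # client-abc
```

Remember that a token in a query string can end up in server access logs; sending it as the first message after connect avoids that.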
Backpressure — if the inference loop is faster than the client can consume, buffer events server-side:
```python
# Bounded buffer between the inference loop and the send loop.
# Caution: asyncio.Queue blocks put() when full; to drop the oldest event
# instead, get_nowait() the head before put_nowait().
queue: asyncio.Queue = asyncio.Queue(maxsize=100)
```
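Since a bare `asyncio.Queue` blocks `put()` once full, dropping the oldest event takes a small wrapper, demonstrated here without a socket:

```python
import asyncio

def offer(q: asyncio.Queue, event: dict) -> None:
    # Evict the oldest event instead of blocking the inference loop
    if q.full():
        q.get_nowait()
    q.put_nowait(event)

async def demo() -> list:
    q: asyncio.Queue = asyncio.Queue(maxsize=3)
    for i in range(5):
        offer(q, {"n": i})
    return [q.get_nowait() for _ in range(q.qsize())]

drained = asyncio.run(demo())
print(drained)  # the 3 newest: [{'n': 2}, {'n': 3}, {'n': 4}]
```

Dropping stale ticks is usually the right trade for a dashboard: the client only cares about recent values.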
Deployment — WebSockets need sticky sessions behind a load balancer. With nginx:
```nginx
upstream api {
    ip_hash;  # sticky sessions — same client always hits same worker
    server 127.0.0.1:8000;
    server 127.0.0.1:8001;
}
```
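The upstream block alone isn't enough: nginx proxies the WebSocket handshake only when the `Upgrade` headers are forwarded. A sketch of the matching `location` block (the path and timeout are assumptions, adjust to your routes):

```nginx
location /ws/ {
    proxy_pass http://api;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
    proxy_read_timeout 3600s;  # keep long-lived sockets open
}
```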
What You Learned
- FastAPI's asyncio-native WebSocket handler is non-blocking by design — you can mix `receive_text()` and `send_text()` with `asyncio.sleep` without threads
- The `useWebSocket` hook's exponential backoff is critical for production — naive immediate reconnect can DDoS your own server after a deploy
- Trim the `ticks` buffer (`MAX_DATA_POINTS`) — a dashboard left open for hours will grow its state without bound otherwise
- For truly high-frequency streams (>100 events/sec), batch events server-side before sending to reduce React re-renders
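For that last point, a server-side batching sketch; the `batch` frame shape is an assumption, and the client would need a matching handler that applies each batch in one state update:

```python
import json

def batch_events(events: list, max_batch: int = 25) -> list:
    # Coalesce many ticks into fewer WS frames to cut client re-renders
    return [
        json.dumps({"event": "batch", "items": events[i:i + max_batch]})
        for i in range(0, len(events), max_batch)
    ]

frames = batch_events([{"n": i} for i in range(60)])
print(len(frames))  # 3 frames: 25 + 25 + 10 events
```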
Limitation: This implementation uses a single WebSocket per client. For multi-tenant dashboards with hundreds of concurrent users, consider moving to a pub/sub broker (Redis Streams or Kafka) and having FastAPI act as a WebSocket gateway rather than the data source.
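A minimal shape for that gateway direction: one manager fanning a single event stream out to per-client bounded queues (a sketch only; not wired to Redis or Kafka, and the class name is illustrative):

```python
import asyncio

class ConnectionManager:
    """Fan a single event stream out to every connected dashboard."""

    def __init__(self) -> None:
        self.queues: dict[str, asyncio.Queue] = {}

    def connect(self, client_id: str) -> asyncio.Queue:
        q: asyncio.Queue = asyncio.Queue(maxsize=100)
        self.queues[client_id] = q
        return q

    def disconnect(self, client_id: str) -> None:
        self.queues.pop(client_id, None)

    def broadcast(self, event: dict) -> None:
        for q in self.queues.values():
            if q.full():
                q.get_nowait()  # drop the oldest for slow consumers
            q.put_nowait(event)

mgr = ConnectionManager()
qa, qb = mgr.connect("a"), mgr.connect("b")
mgr.broadcast({"event": "inference_tick"})
print(qa.qsize(), qb.qsize())  # 1 1
```

Each WebSocket handler then just drains its own queue, while one task consumes the broker and calls `broadcast`.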
Tested on FastAPI 0.115, Uvicorn 0.32, React 19, Vite 6.2, Node 22 LTS — macOS Sequoia & Ubuntu 24.04