Problem: Polling Kills Real-Time AI UIs
You're building a dashboard that streams live AI inference results — token counts, latency metrics, model outputs — and HTTP polling every second isn't cutting it. The UI lags, the server gets hammered, and users see stale data.
WebSockets fix all of this. But wiring FastAPI WebSockets to a React frontend with proper reconnect logic, typed message contracts, and production error handling is non-trivial.
You'll learn how to:
- Set up a FastAPI WebSocket server that streams AI inference metrics
- Build a React hook that manages the WebSocket lifecycle (connect, reconnect, close)
- Render a live dashboard with charts that update as data arrives
- Handle connection drops, backpressure, and auth tokens
Time: 35 min | Difficulty: Intermediate
Why WebSockets Over SSE or Polling
Three approaches exist for real-time data in the browser:
| Approach | Latency | Bidirectional | Complexity |
|---|---|---|---|
| HTTP Polling | ~1–5s | ❌ | Low |
| Server-Sent Events (SSE) | ~100ms | ❌ | Low |
| WebSockets | ~10ms | ✅ | Medium |
For an AI dashboard that needs to send control commands (pause inference, switch model) while receiving streamed metrics, WebSockets are the right choice. SSE is fine for one-way streaming; WebSockets are better when you need a duplex channel.
Architecture
```
React Dashboard
│
├── useWebSocket hook ──connects──▶ FastAPI /ws/{client_id}
│         │                                  │
│   reconnect logic             background task (async generator)
│         │                                  │
└── Dashboard UI ◀──typed JSON events── AI inference loop
```
On the FastAPI side, the WebSocket endpoint drives an async generator that simulates (or calls real) AI inference and pushes structured JSON events over the socket. The React side consumes these events through a typed hook and feeds them into Recharts.
Solution
Step 1: Set Up the FastAPI Backend
Start with a clean project using uv (faster than pip for dependency resolution).
```bash
# Create project
uv init ai-dashboard-api && cd ai-dashboard-api

# Add dependencies (quote the extra so zsh doesn't glob the brackets)
uv add fastapi "uvicorn[standard]" websockets pydantic
```
Create main.py:
```python
from fastapi import FastAPI, WebSocket, WebSocketDisconnect
from fastapi.middleware.cors import CORSMiddleware
import asyncio
import json
import random
import time
from typing import AsyncGenerator

app = FastAPI()

# Allow React dev server on port 5173
app.add_middleware(
    CORSMiddleware,
    allow_origins=["http://localhost:5173"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

async def inference_stream(model: str = "llama3") -> AsyncGenerator[dict, None]:
    """
    Simulates a live AI inference loop.
    Replace with actual model calls (Ollama API, OpenAI stream, etc.)
    """
    token_count = 0
    start = time.time()
    while True:
        # Simulate token-by-token generation latency
        await asyncio.sleep(0.2)
        token_count += random.randint(1, 5)
        latency_ms = round(random.gauss(120, 20), 1)  # realistic jitter
        elapsed = time.time() - start
        yield {
            "event": "inference_tick",
            "model": model,
            "tokens_generated": token_count,
            "latency_ms": max(50.0, latency_ms),  # floor at 50ms
            "tokens_per_second": round(token_count / max(elapsed, 0.001), 2),
            "timestamp": time.time(),
        }

@app.websocket("/ws/{client_id}")
async def websocket_endpoint(websocket: WebSocket, client_id: str):
    await websocket.accept()
    print(f"Client connected: {client_id}")
    model = "llama3"
    try:
        async for event in inference_stream(model):
            # Check for incoming messages (model switch, pause) without blocking
            try:
                msg = await asyncio.wait_for(websocket.receive_text(), timeout=0.01)
                command = json.loads(msg)
                if command.get("action") == "switch_model":
                    model = command.get("model", model)
            except asyncio.TimeoutError:
                pass  # No message — continue streaming
            # The generator captured the initial model, so stamp the current
            # one onto each outgoing event to make switches take effect
            event["model"] = model
            await websocket.send_text(json.dumps(event))
    except WebSocketDisconnect:
        print(f"Client disconnected: {client_id}")
```
Run it:
```bash
uv run uvicorn main:app --reload --port 8000
```
Expected output:
```
INFO:     Uvicorn running on http://127.0.0.1:8000
INFO:     Started reloader process
```
If it fails:
- `ModuleNotFoundError: fastapi` → Run `uv sync` to install locked deps
- `Address already in use` → Kill the existing process: `lsof -ti:8000 | xargs kill`
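One caveat with the endpoint's `asyncio.wait_for(receive_text(), timeout=0.01)` pattern: the timeout cancels the pending receive on every tick, which is easy to get wrong under load. A common alternative runs the reader as its own task and shares state through a queue. A socket-free sketch of that shape (the queue stands in for the real `websocket.receive_text()` source; names are illustrative):

```python
import asyncio

async def demo() -> str:
    state = {"model": "llama3"}
    inbox: asyncio.Queue = asyncio.Queue()  # stand-in for incoming WS messages

    async def reader() -> None:
        # Dedicated reader task: awaits messages with no timeouts or cancellation
        while True:
            cmd = await inbox.get()
            if cmd.get("action") == "switch_model":
                state["model"] = cmd["model"]
            inbox.task_done()

    task = asyncio.create_task(reader())
    await inbox.put({"action": "switch_model", "model": "mistral"})
    await inbox.join()      # wait until the reader has processed the command
    task.cancel()
    return state["model"]   # the send loop would read this on each tick

final_model = asyncio.run(demo())
print(final_model)  # mistral
```

The send loop then stays a plain `async for` over the generator, reading the shared state each tick.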
Step 2: Scaffold the React Frontend
```bash
npm create vite@latest ai-dashboard -- --template react-ts
cd ai-dashboard
npm install recharts lucide-react
```
The TypeScript template gives you type safety for WebSocket message contracts — worth the small setup cost.
Step 3: Define the Message Contract
Create src/types/ws.ts. Defining types upfront prevents silent runtime failures when the backend schema changes.
```ts
// Matches the dict shape yielded by inference_stream() in FastAPI
export interface InferenceTick {
  event: "inference_tick";
  model: string;
  tokens_generated: number;
  latency_ms: number;
  tokens_per_second: number;
  timestamp: number;
}

export type DashboardEvent = InferenceTick;

// Commands the frontend can send to the backend
export interface SwitchModelCommand {
  action: "switch_model";
  model: string;
}
```
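On top of the compile-time types, a runtime guard catches frames that drift from the contract. A stdlib-only sketch (the tutorial's code doesn't include this; field names match `InferenceTick`):

```python
import json

# Expected field types for an inference_tick frame
REQUIRED = {
    "event": str,
    "model": str,
    "tokens_generated": int,
    "latency_ms": (int, float),
    "tokens_per_second": (int, float),
    "timestamp": (int, float),
}

def parse_tick(raw: str) -> dict:
    data = json.loads(raw)
    for key, typ in REQUIRED.items():
        if not isinstance(data.get(key), typ):
            raise ValueError(f"bad or missing field: {key}")
    return data

frame = ('{"event": "inference_tick", "model": "llama3", "tokens_generated": 47, '
         '"latency_ms": 118.3, "tokens_per_second": 3.82, "timestamp": 1741478400.1}')
tick = parse_tick(frame)
print(tick["model"])  # llama3
```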
Step 4: Build the useWebSocket Hook
Create src/hooks/useWebSocket.ts. This hook owns the entire connection lifecycle: initial connect, exponential-backoff reconnect, and clean teardown.
```ts
import { useEffect, useRef, useState, useCallback } from "react";
import type { DashboardEvent } from "../types/ws";

interface UseWebSocketOptions {
  clientId: string;
  onMessage: (event: DashboardEvent) => void;
}

interface WebSocketState {
  status: "connecting" | "open" | "closed" | "error";
  send: (data: object) => void;
}

export function useWebSocket({ clientId, onMessage }: UseWebSocketOptions): WebSocketState {
  const wsRef = useRef<WebSocket | null>(null);
  const retryRef = useRef<ReturnType<typeof setTimeout> | null>(null);
  const retryCount = useRef(0);
  const [status, setStatus] = useState<WebSocketState["status"]>("connecting");

  const connect = useCallback(() => {
    // Clean up any existing socket before reconnecting; detach its onclose
    // first so the old socket can't schedule a competing retry
    if (wsRef.current) {
      wsRef.current.onclose = null;
      wsRef.current.close();
    }
    const ws = new WebSocket(`ws://localhost:8000/ws/${clientId}`);
    wsRef.current = ws;
    setStatus("connecting");

    ws.onopen = () => {
      setStatus("open");
      retryCount.current = 0; // Reset backoff on successful connect
    };

    ws.onmessage = (e) => {
      try {
        const data = JSON.parse(e.data) as DashboardEvent;
        onMessage(data);
      } catch {
        console.warn("Unparseable WS message:", e.data);
      }
    };

    ws.onerror = () => setStatus("error");

    ws.onclose = () => {
      setStatus("closed");
      // Exponential backoff: 1s, 2s, 4s, 8s, max 30s
      const delay = Math.min(1000 * 2 ** retryCount.current, 30_000);
      retryCount.current += 1;
      console.log(`WS closed. Reconnecting in ${delay}ms (attempt ${retryCount.current})`);
      retryRef.current = setTimeout(connect, delay);
    };
  }, [clientId, onMessage]);

  useEffect(() => {
    connect();
    return () => {
      // Prevent a reconnect loop on unmount: cancel any pending retry and
      // detach onclose before closing, so close() can't schedule a retry
      if (retryRef.current) clearTimeout(retryRef.current);
      if (wsRef.current) {
        wsRef.current.onclose = null;
        wsRef.current.close();
      }
    };
  }, [connect]);

  const send = useCallback((data: object) => {
    if (wsRef.current?.readyState === WebSocket.OPEN) {
      wsRef.current.send(JSON.stringify(data));
    }
  }, []);

  return { status, send };
}
```
Key decisions here:
- `retryRef` holds the timeout ID so it can be cancelled on unmount — prevents the "can't update state on unmounted component" warning
- `onMessage` is passed as a dep to `useCallback`, so the parent controls whether re-renders retrigger connect
- Exponential backoff caps at 30s to avoid flooding a recovering server
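The backoff expression is worth sanity-checking in isolation; here it is mirrored in Python with the same constants as the hook:

```python
def backoff_ms(attempt: int, base_ms: int = 1000, cap_ms: int = 30_000) -> int:
    # Mirrors the hook: min(1000 * 2 ** retryCount, 30_000)
    return min(base_ms * 2 ** attempt, cap_ms)

schedule = [backoff_ms(n) for n in range(7)]
print(schedule)  # [1000, 2000, 4000, 8000, 16000, 30000, 30000]
```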
Step 5: Build the Dashboard UI
Create src/components/Dashboard.tsx:
```tsx
import { useState, useCallback, useRef } from "react";
import {
  LineChart, Line, XAxis, YAxis, CartesianGrid,
  Tooltip, ResponsiveContainer
} from "recharts";
import { Wifi, WifiOff, Zap } from "lucide-react";
import { useWebSocket } from "../hooks/useWebSocket";
import type { DashboardEvent, InferenceTick } from "../types/ws";

const MAX_DATA_POINTS = 60; // Keep the last 60 ticks (~12 seconds at 0.2s interval)

export default function Dashboard() {
  const [ticks, setTicks] = useState<InferenceTick[]>([]);
  const [currentModel, setCurrentModel] = useState("llama3");
  const clientId = useRef(`client-${Math.random().toString(36).slice(2)}`);

  const handleMessage = useCallback((event: DashboardEvent) => {
    if (event.event === "inference_tick") {
      setTicks((prev) => {
        const next = [...prev, event];
        // Trim buffer — avoids unbounded memory growth in long sessions
        return next.length > MAX_DATA_POINTS ? next.slice(-MAX_DATA_POINTS) : next;
      });
    }
  }, []);

  const { status, send } = useWebSocket({
    clientId: clientId.current,
    onMessage: handleMessage,
  });

  const switchModel = (model: string) => {
    setCurrentModel(model);
    send({ action: "switch_model", model });
  };

  const latest = ticks[ticks.length - 1];
  const chartData = ticks.map((t) => ({
    time: new Date(t.timestamp * 1000).toLocaleTimeString(),
    latency: t.latency_ms,
    tps: t.tokens_per_second,
  }));

  return (
    <div style={{ padding: "24px", fontFamily: "monospace", background: "#0f0f0f", minHeight: "100vh", color: "#e2e8f0" }}>
      {/* Header */}
      <div style={{ display: "flex", alignItems: "center", gap: "12px", marginBottom: "24px" }}>
        {status === "open"
          ? <Wifi size={20} color="#22c55e" />
          : <WifiOff size={20} color="#ef4444" />
        }
        <h1 style={{ margin: 0, fontSize: "18px" }}>AI Inference Dashboard</h1>
        <span style={{
          marginLeft: "auto",
          fontSize: "12px",
          color: status === "open" ? "#22c55e" : "#ef4444"
        }}>
          {status.toUpperCase()}
        </span>
      </div>

      {/* KPI Cards */}
      <div style={{ display: "grid", gridTemplateColumns: "repeat(3, 1fr)", gap: "16px", marginBottom: "32px" }}>
        <KpiCard label="Model" value={latest?.model ?? "—"} />
        <KpiCard label="Tokens/sec" value={latest?.tokens_per_second?.toFixed(1) ?? "—"} icon={<Zap size={14} />} />
        <KpiCard label="Latency (ms)" value={latest?.latency_ms?.toFixed(1) ?? "—"} />
      </div>

      {/* Latency Chart */}
      <ChartPanel title="Inference Latency (ms)">
        <ResponsiveContainer width="100%" height={200}>
          <LineChart data={chartData}>
            <CartesianGrid strokeDasharray="3 3" stroke="#1e293b" />
            <XAxis dataKey="time" tick={{ fontSize: 10 }} stroke="#475569" />
            <YAxis stroke="#475569" tick={{ fontSize: 10 }} />
            <Tooltip contentStyle={{ background: "#1e293b", border: "none" }} />
            <Line type="monotone" dataKey="latency" stroke="#818cf8" dot={false} strokeWidth={2} />
          </LineChart>
        </ResponsiveContainer>
      </ChartPanel>

      {/* Tokens/sec Chart */}
      <ChartPanel title="Tokens per Second">
        <ResponsiveContainer width="100%" height={200}>
          <LineChart data={chartData}>
            <CartesianGrid strokeDasharray="3 3" stroke="#1e293b" />
            <XAxis dataKey="time" tick={{ fontSize: 10 }} stroke="#475569" />
            <YAxis stroke="#475569" tick={{ fontSize: 10 }} />
            <Tooltip contentStyle={{ background: "#1e293b", border: "none" }} />
            <Line type="monotone" dataKey="tps" stroke="#34d399" dot={false} strokeWidth={2} />
          </LineChart>
        </ResponsiveContainer>
      </ChartPanel>

      {/* Model Switcher */}
      <div style={{ marginTop: "24px", display: "flex", gap: "8px" }}>
        {["llama3", "mistral", "qwen2.5"].map((m) => (
          <button
            key={m}
            onClick={() => switchModel(m)}
            style={{
              padding: "6px 14px",
              background: currentModel === m ? "#6366f1" : "#1e293b",
              color: "#e2e8f0",
              border: "none",
              borderRadius: "6px",
              cursor: "pointer",
              fontFamily: "monospace",
            }}
          >
            {m}
          </button>
        ))}
      </div>
    </div>
  );
}

function KpiCard({ label, value, icon }: { label: string; value: string; icon?: React.ReactNode }) {
  return (
    <div style={{ background: "#1e293b", padding: "16px", borderRadius: "8px" }}>
      <div style={{ fontSize: "11px", color: "#94a3b8", marginBottom: "6px", display: "flex", alignItems: "center", gap: "4px" }}>
        {icon}{label}
      </div>
      <div style={{ fontSize: "22px", fontWeight: "bold" }}>{value}</div>
    </div>
  );
}

function ChartPanel({ title, children }: { title: string; children: React.ReactNode }) {
  return (
    <div style={{ background: "#1e293b", borderRadius: "8px", padding: "16px", marginBottom: "16px" }}>
      <div style={{ fontSize: "12px", color: "#94a3b8", marginBottom: "12px" }}>{title}</div>
      {children}
    </div>
  );
}
```
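The trim inside `handleMessage` is a sliding window; the same logic isolated in Python, for clarity:

```python
MAX_DATA_POINTS = 60

def append_trimmed(buf: list, tick: dict, limit: int = MAX_DATA_POINTS) -> list:
    # Append, then keep only the newest `limit` entries
    out = buf + [tick]
    return out[-limit:] if len(out) > limit else out

buf: list = []
for i in range(75):
    buf = append_trimmed(buf, {"n": i})
print(len(buf), buf[0], buf[-1])  # 60 {'n': 15} {'n': 74}
```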
Update src/App.tsx to render it:
```tsx
import Dashboard from "./components/Dashboard";

export default function App() {
  return <Dashboard />;
}
```
Step 6: Run Both Servers
Terminal 1 — FastAPI backend:
```bash
cd ai-dashboard-api
uv run uvicorn main:app --reload --port 8000
```
Terminal 2 — React dev server:
```bash
cd ai-dashboard
npm run dev
```
Open http://localhost:5173. You should see the dashboard animate within 1–2 seconds as ticks arrive.
If the WebSocket shows "ERROR" status:
- CORS block → Confirm `allow_origins` in `main.py` matches your Vite port (default `5173`)
- Connection refused → Make sure FastAPI is running on port 8000
- Stuck at "CONNECTING" → Check browser DevTools → Network → WS tab for the handshake error
Verification
With both servers running, open browser DevTools → Network → WS tab and click the /ws/client-xxx connection. You should see a stream of JSON frames arriving every ~200ms:
```json
{"event": "inference_tick", "model": "llama3", "tokens_generated": 47, "latency_ms": 118.3, "tokens_per_second": 3.82, "timestamp": 1741478400.123}
```
Test the model switcher: click mistral in the UI. Within one tick you should see "model": "mistral" appear in the frames.
Test reconnect: stop the FastAPI server (Ctrl+C). The UI status changes to CLOSED. Restart the server — the hook reconnects automatically within the backoff window.
Production Considerations
Authentication — pass a JWT as a query param or in the first message after connect:
```python
# FastAPI: validate token before accepting
from fastapi import Query  # add Query to the existing fastapi imports

@app.websocket("/ws/{client_id}")
async def websocket_endpoint(websocket: WebSocket, client_id: str, token: str = Query(...)):
    payload = verify_jwt(token)  # raises on an invalid token, before accept()
    await websocket.accept()
```
```ts
// React: append token to URL
const ws = new WebSocket(`ws://localhost:8000/ws/${clientId}?token=${authToken}`);
```
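`verify_jwt` is left undefined above. For reference, a stdlib-only HS256 sketch of what it might do; in production use a maintained library such as PyJWT, and treat `SECRET` and the claim names here as assumptions:

```python
import base64, hashlib, hmac, json, time

SECRET = b"dev-secret"  # assumption: HS256 shared key

def _b64url(data: bytes) -> bytes:
    return base64.urlsafe_b64encode(data).rstrip(b"=")

def sign_jwt(payload: dict) -> str:
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = _b64url(json.dumps(payload).encode())
    sig = _b64url(hmac.new(SECRET, header + b"." + body, hashlib.sha256).digest())
    return b".".join([header, body, sig]).decode()

def verify_jwt(token: str) -> dict:
    header, body, sig = token.encode().split(b".")
    expected = _b64url(hmac.new(SECRET, header + b"." + body, hashlib.sha256).digest())
    if not hmac.compare_digest(sig, expected):
        raise ValueError("bad signature")
    payload = json.loads(base64.urlsafe_b64decode(body + b"=" * (-len(body) % 4)))
    if payload.get("exp", float("inf")) < time.time():
        raise ValueError("expired")
    return payload

token = sign_jwt({"sub": "client-abc", "exp": time.time() + 60})
claims = verify_jwt(token)
print(claims["sub"])  # client-abc
```

Remember that a token in a query string can end up in server access logs; sending it as the first message after connect avoids that.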
Backpressure — if the inference loop is faster than the client can consume, buffer events server-side:
```python
# Bounded buffer between the inference loop and the send loop.
# Caution: asyncio.Queue blocks put() when full; to drop the oldest event
# instead, get_nowait() the head before put_nowait().
queue: asyncio.Queue = asyncio.Queue(maxsize=100)
```
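Since a bare `asyncio.Queue` blocks `put()` once full, dropping the oldest event takes a small wrapper, demonstrated here without a socket:

```python
import asyncio

def offer(q: asyncio.Queue, event: dict) -> None:
    # Evict the oldest event instead of blocking the inference loop
    if q.full():
        q.get_nowait()
    q.put_nowait(event)

async def demo() -> list:
    q: asyncio.Queue = asyncio.Queue(maxsize=3)
    for i in range(5):
        offer(q, {"n": i})
    return [q.get_nowait() for _ in range(q.qsize())]

drained = asyncio.run(demo())
print(drained)  # the 3 newest: [{'n': 2}, {'n': 3}, {'n': 4}]
```

Dropping stale ticks is usually the right trade for a dashboard: the client only cares about recent values.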
Deployment — WebSockets need sticky sessions behind a load balancer. With nginx:
```nginx
upstream api {
    ip_hash;  # sticky sessions — same client always hits same worker
    server 127.0.0.1:8000;
    server 127.0.0.1:8001;
}
```
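The upstream block alone isn't enough: nginx proxies the WebSocket handshake only when the `Upgrade` headers are forwarded. A sketch of the matching `location` block (the path and timeout are assumptions, adjust to your routes):

```nginx
location /ws/ {
    proxy_pass http://api;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
    proxy_read_timeout 3600s;  # keep long-lived sockets open
}
```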
What You Learned
- FastAPI's asyncio-native WebSocket handler is non-blocking by design — you can mix `receive_text()` and `send_text()` with `asyncio.sleep` without threads
- The `useWebSocket` hook's exponential backoff is critical for production — naive immediate reconnect can DDoS your own server after a deploy
- Trim the `ticks` buffer (`MAX_DATA_POINTS`) — a dashboard left open for hours will grow its state without bound otherwise
- For truly high-frequency streams (>100 events/sec), batch events server-side before sending to reduce React re-renders
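For that last point, a server-side batching sketch; the `batch` frame shape is an assumption, and the client would need a matching handler that applies each batch in one state update:

```python
import json

def batch_events(events: list, max_batch: int = 25) -> list:
    # Coalesce many ticks into fewer WS frames to cut client re-renders
    return [
        json.dumps({"event": "batch", "items": events[i:i + max_batch]})
        for i in range(0, len(events), max_batch)
    ]

frames = batch_events([{"n": i} for i in range(60)])
print(len(frames))  # 3 frames: 25 + 25 + 10 events
```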
Limitation: This implementation uses a single WebSocket per client. For multi-tenant dashboards with hundreds of concurrent users, consider moving to a pub/sub broker (Redis Streams or Kafka) and having FastAPI act as a WebSocket gateway rather than the data source.
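A minimal shape for that gateway direction: one manager fanning a single event stream out to per-client bounded queues (a sketch only; not wired to Redis or Kafka, and the class name is illustrative):

```python
import asyncio

class ConnectionManager:
    """Fan a single event stream out to every connected dashboard."""

    def __init__(self) -> None:
        self.queues: dict[str, asyncio.Queue] = {}

    def connect(self, client_id: str) -> asyncio.Queue:
        q: asyncio.Queue = asyncio.Queue(maxsize=100)
        self.queues[client_id] = q
        return q

    def disconnect(self, client_id: str) -> None:
        self.queues.pop(client_id, None)

    def broadcast(self, event: dict) -> None:
        for q in self.queues.values():
            if q.full():
                q.get_nowait()  # drop the oldest for slow consumers
            q.put_nowait(event)

mgr = ConnectionManager()
qa, qb = mgr.connect("a"), mgr.connect("b")
mgr.broadcast({"event": "inference_tick"})
print(qa.qsize(), qb.qsize())  # 1 1
```

Each WebSocket handler then just drains its own queue, while one task consumes the broker and calls `broadcast`.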
Tested on FastAPI 0.115, Uvicorn 0.32, React 19, Vite 6.2, Node 22 LTS — macOS Sequoia & Ubuntu 24.04