LangGraph Streaming: Real-Time Token Output to Frontend

Stream LangGraph agent tokens to your frontend in real time using Server-Sent Events and async generators. Works with React, Next.js, and FastAPI.

Problem: LangGraph Responses Appear All at Once

Your LangGraph agent finishes processing, then dumps the full response. Users stare at a blank screen for 5–15 seconds. You want token-by-token streaming like ChatGPT — each word appearing as the model generates it.

You'll learn:

  • How to stream tokens from a LangGraph graph using .astream_events()
  • How to expose a streaming endpoint with FastAPI and Server-Sent Events
  • How to consume the stream in a React frontend with fetch and ReadableStream

Time: 20 min | Difficulty: Intermediate


Why LangGraph Streaming Is Non-Trivial

LangGraph wraps LLM calls inside nodes. The graph orchestrates state transitions, so you can't just call .stream() on the model directly — you need to tap into the graph's event bus to get tokens from inside a running node.

LangGraph provides three streaming modes:

Mode             | What you get                              | Use when
astream()        | Full node output after each node finishes | Seeing intermediate state
astream_events() | Token-level events from inside nodes      | Real-time token streaming
astream_log()    | Full run log with patches                 | Debugging

For token-by-token output to a UI, astream_events() is the right tool.


Solution

Step 1: Set Up the LangGraph Graph

Start with a minimal single-node graph. The pattern works the same for multi-node agents.

# graph.py
from langgraph.graph import StateGraph, END
from langchain_openai import ChatOpenAI
from typing import TypedDict

class AgentState(TypedDict):
    messages: list
    response: str

llm = ChatOpenAI(model="gpt-4o", streaming=True)  # streaming=True is required

async def chat_node(state: AgentState) -> AgentState:
    # LangGraph calls this node; with streaming=True the model streams
    # internally, and astream_events() surfaces those tokens.
    # Use an async node with ainvoke so the call doesn't block the event loop.
    response = await llm.ainvoke(state["messages"])
    return {"response": response.content}

builder = StateGraph(AgentState)
builder.add_node("chat", chat_node)
builder.set_entry_point("chat")
builder.add_edge("chat", END)

graph = builder.compile()

streaming=True on the LLM is mandatory. Without it, astream_events() will still emit events, but all tokens arrive in a single chunk at the end — defeating the purpose.


Step 2: Stream Events from the Graph

astream_events() emits a stream of typed events. You only care about on_chat_model_stream — that's where individual tokens live.

# stream_runner.py
from graph import graph
from langchain_core.messages import HumanMessage

async def stream_tokens(user_input: str):
    inputs = {"messages": [HumanMessage(content=user_input)]}

    async for event in graph.astream_events(inputs, version="v2"):
        kind = event["event"]

        # Filter to token-level stream events only
        if kind == "on_chat_model_stream":
            chunk = event["data"]["chunk"]
            token = chunk.content  # single token or small piece
            if token:
                yield token

The version="v2" parameter is required for LangGraph 0.2+. Omitting it raises a deprecation warning and falls back to v1 event shapes, which differ.

Event fields that matter:

Field                      | Value                     | Meaning
event                      | on_chat_model_stream      | Token arrived
metadata["langgraph_node"] | Node name (e.g., "chat")  | Which node emitted it
data["chunk"].content      | "Hello"                   | The actual token text
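
Putting the table together: a v2 event is a plain dict. Below is a trimmed-down sketch with a stand-in chunk class — real events carry more keys (run_id, tags, parent_ids), and event["name"] holds the runnable's name (e.g. "ChatOpenAI"), not the node name.

```python
from dataclasses import dataclass

@dataclass
class FakeChunk:
    # Stand-in for langchain's AIMessageChunk
    content: str

# Abbreviated sketch of a v2 "on_chat_model_stream" event
event = {
    "event": "on_chat_model_stream",
    "name": "ChatOpenAI",                    # the runnable's name, not the node
    "metadata": {"langgraph_node": "chat"},  # LangGraph injects the node name here
    "data": {"chunk": FakeChunk(content="Hello")},
}

token = event["data"]["chunk"].content
print(token)  # → Hello
```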

Step 3: Expose a Streaming Endpoint with FastAPI

Server-Sent Events (SSE) are the simplest way to push tokens to a browser. No WebSocket setup needed.

# main.py
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from pydantic import BaseModel
from stream_runner import stream_tokens

app = FastAPI()

class ChatRequest(BaseModel):
    message: str

@app.post("/stream")
async def stream_chat(request: ChatRequest):
    async def event_generator():
        async for token in stream_tokens(request.message):
            # SSE format: each message must start with "data: " and end with "\n\n"
            yield f"data: {token}\n\n"
        # Signal to the client that the stream is done
        yield "data: [DONE]\n\n"

    return StreamingResponse(
        event_generator(),
        media_type="text/event-stream",
        headers={
            "Cache-Control": "no-cache",
            "X-Accel-Buffering": "no",  # Disables Nginx buffering — critical for SSE
        },
    )

The X-Accel-Buffering: no header is easy to forget. If you're behind Nginx (including most cloud platforms), it will buffer the entire response before forwarding — making streaming invisible to the client.
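
A related subtlety: SSE frames are line-oriented, so a token that itself contains a newline breaks the data: framing above. A minimal stdlib helper (hypothetical name sse_format) that splits multi-line tokens into multiple data: lines, which the SSE spec treats as one message:

```python
def sse_format(token: str) -> str:
    """Frame a token as one SSE message.

    A token containing newlines is split across multiple "data:" lines;
    the SSE spec defines these as a single logical message.
    """
    lines = token.split("\n")
    return "".join(f"data: {line}\n" for line in lines) + "\n"

print(repr(sse_format("Hello")))         # → 'data: Hello\n\n'
print(repr(sse_format("line1\nline2")))  # → 'data: line1\ndata: line2\n\n'
```

Note that a spec-compliant client such as EventSource rejoins those data: lines with a newline; the hand-rolled fetch parser in Step 4 would need the same rejoin logic to preserve line breaks in the output.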

Run the server:

uvicorn main:app --reload --port 8000

Step 4: Consume the Stream in React

Use the fetch API with a ReadableStream reader. No external library needed.

// components/StreamingChat.tsx
import { useState } from "react"

export default function StreamingChat() {
  const [output, setOutput] = useState("")
  const [loading, setLoading] = useState(false)

  async function sendMessage(userInput: string) {
    setOutput("")
    setLoading(true)

    const response = await fetch("http://localhost:8000/stream", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ message: userInput }),
    })

    const reader = response.body!.getReader()
    const decoder = new TextDecoder()

    while (true) {
      const { done, value } = await reader.read()
      if (done) break

      const chunk = decoder.decode(value, { stream: true })

      // Each SSE chunk may contain multiple lines; parse them
      for (const line of chunk.split("\n")) {
        if (line.startsWith("data: ")) {
          const token = line.slice(6)  // Remove "data: " prefix
          if (token === "[DONE]") {
            setLoading(false)
            return
          }
          // Append each token as it arrives
          setOutput((prev) => prev + token)
        }
      }
    }

    setLoading(false)
  }

  return (
    <div>
      <button onClick={() => sendMessage("Explain RAG in 3 sentences")} disabled={loading}>
        {loading ? "Streaming…" : "Ask"}
      </button>
      <pre style={{ whiteSpace: "pre-wrap" }}>{output}</pre>
    </div>
  )
}

The { stream: true } option in TextDecoder.decode() handles multi-byte characters that get split across chunks — important for non-ASCII output.
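
A related edge case is the SSE framing itself: a network chunk can end mid-line, splitting the "data: " prefix across two reads, and a line-by-line parser like the one above will then silently drop that token. Sketched here in Python for brevity, a buffer-based parser that only consumes complete lines; the same structure ports directly to the reader loop in Step 4:

```python
def parse_sse_chunks(chunks):
    """Yield complete SSE data payloads from an iterable of text chunks.

    Buffers partial lines so a "data: " prefix split across two network
    chunks is reassembled before parsing.
    """
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        # Only lines terminated by "\n" are complete; keep the tail buffered
        *complete, buffer = buffer.split("\n")
        for line in complete:
            if line.startswith("data: "):
                yield line[len("data: "):]

# A "data: " prefix split across chunk boundaries is handled correctly:
tokens = list(parse_sse_chunks(["data: Hel", "lo\n\nda", "ta: world\n\n"]))
print(tokens)  # → ['Hello', 'world']
```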


Step 5: Handle Multi-Node Graphs

If your graph has multiple LLM nodes (e.g., a planner and an executor), filter by node name so you only stream the final response node.

# stream_runner.py — multi-node version
async def stream_tokens(user_input: str, stream_node: str = "executor"):
    inputs = {"messages": [HumanMessage(content=user_input)]}

    async for event in graph.astream_events(inputs, version="v2"):
        if (
            event["event"] == "on_chat_model_stream"
            # LangGraph records the emitting node in event metadata
            and event["metadata"].get("langgraph_node") == stream_node
        ):
            token = event["data"]["chunk"].content
            if token:
                yield token

This prevents the frontend from receiving planning tokens mid-stream, which often look like JSON scaffolding and confuse users.


Verification

Start the FastAPI server, then test with curl before wiring up React:

curl -X POST http://localhost:8000/stream \
  -H "Content-Type: application/json" \
  -d '{"message": "Count to 5 slowly"}' \
  --no-buffer

You should see tokens print to the terminal one by one, not all at once, with the stream ending in data: [DONE].

If all tokens appear simultaneously, check:

  1. streaming=True is set on your ChatOpenAI instance
  2. X-Accel-Buffering: no is in the response headers
  3. You're not wrapping the LLM call in a synchronous function that blocks the event loop

What You Learned

  • astream_events() with version="v2" is the correct way to extract tokens from inside LangGraph nodes
  • streaming=True on the LLM is required — the graph won't stream without it
  • X-Accel-Buffering: no is the most common production gotcha for SSE behind a reverse proxy
  • Filter by event["metadata"]["langgraph_node"] in multi-node graphs to control which node's tokens reach the frontend

Limitation: SSE is unidirectional. If you need the client to send messages mid-stream (e.g., to cancel or steer the agent), switch to WebSockets using fastapi-websocket-rpc or a similar library.

Tested on LangGraph 0.2.x, LangChain 0.3.x, FastAPI 0.115, Python 3.12, React 19