Problem: LangGraph Responses Appear All at Once
Your LangGraph agent finishes processing, then dumps the full response. Users stare at a blank screen for 5–15 seconds. You want token-by-token streaming like ChatGPT — each word appearing as the model generates it.
You'll learn:
- How to stream tokens from a LangGraph graph using `.astream_events()`
- How to expose a streaming endpoint with FastAPI and Server-Sent Events
- How to consume the stream in a React frontend with `fetch` and `ReadableStream`
Time: 20 min | Difficulty: Intermediate
Why LangGraph Streaming Is Non-Trivial
LangGraph wraps LLM calls inside nodes. The graph orchestrates state transitions, so you can't just call .stream() on the model directly — you need to tap into the graph's event bus to get tokens from inside a running node.
LangGraph provides three streaming modes:
| Mode | What you get | Use when |
|---|---|---|
| `astream()` | Full node output after each node finishes | Seeing intermediate state |
| `astream_events()` | Token-level events from inside nodes | Real-time token streaming |
| `astream_log()` | Full run log with patches | Debugging |
For token-by-token output to a UI, `astream_events()` is the right tool.
Solution
Step 1: Set Up the LangGraph Graph
Start with a minimal single-node graph. The pattern works the same for multi-node agents.
```python
# graph.py
from langgraph.graph import StateGraph, END
from langchain_openai import ChatOpenAI
from typing import TypedDict

class AgentState(TypedDict):
    messages: list
    response: str

llm = ChatOpenAI(model="gpt-4o", streaming=True)  # streaming=True is required

async def chat_node(state: AgentState) -> AgentState:
    # LangGraph calls this node; the LLM streams internally.
    # Use the async ainvoke so the event loop is never blocked.
    response = await llm.ainvoke(state["messages"])
    return {"response": response.content}

builder = StateGraph(AgentState)
builder.add_node("chat", chat_node)
builder.set_entry_point("chat")
builder.add_edge("chat", END)

graph = builder.compile()
```
`streaming=True` on the LLM is mandatory. Without it, `astream_events()` will still emit events, but all tokens arrive in a single chunk at the end, defeating the purpose.
Step 2: Stream Events from the Graph
`astream_events()` emits a stream of typed events. You only care about `on_chat_model_stream`: that's where individual tokens live.
```python
# stream_runner.py
from graph import graph
from langchain_core.messages import HumanMessage

async def stream_tokens(user_input: str):
    inputs = {"messages": [HumanMessage(content=user_input)]}
    async for event in graph.astream_events(inputs, version="v2"):
        kind = event["event"]
        # Filter to token-level stream events only
        if kind == "on_chat_model_stream":
            chunk = event["data"]["chunk"]
            token = chunk.content  # single token or small piece
            if token:
                yield token
```
The `version="v2"` parameter is required for LangGraph 0.2+. Omitting it raises a deprecation warning and falls back to v1 event shapes, which differ.
Event fields that matter:
| Field | Value | Meaning |
|---|---|---|
| `event` | `on_chat_model_stream` | Token arrived |
| `name` | Node name (e.g., `"chat"`) | Which node emitted it |
| `data["chunk"].content` | `"Hello"` | The actual token text |
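The filtering logic itself is plain dict inspection, so it can be exercised without a live model. Below is a minimal sketch using hand-built event dicts; the `SimpleNamespace` chunks stand in for real `AIMessageChunk` objects, and the sample events are illustrative, not actual LangGraph output:

```python
from types import SimpleNamespace

def extract_tokens(events):
    """Yield token text from on_chat_model_stream events only."""
    for event in events:
        if event["event"] == "on_chat_model_stream":
            token = event["data"]["chunk"].content
            if token:
                yield token

# Hand-built events mimicking the v2 shapes in the table above:
events = [
    {"event": "on_chain_start", "name": "chat", "data": {}},
    {"event": "on_chat_model_stream", "name": "chat",
     "data": {"chunk": SimpleNamespace(content="Hel")}},
    {"event": "on_chat_model_stream", "name": "chat",
     "data": {"chunk": SimpleNamespace(content="lo")}},
    {"event": "on_chain_end", "name": "chat", "data": {}},
]

print("".join(extract_tokens(events)))  # → Hello
```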
Step 3: Expose a Streaming Endpoint with FastAPI
Server-Sent Events (SSE) are the simplest way to push tokens to a browser. No WebSocket setup needed.
```python
# main.py
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from pydantic import BaseModel

from stream_runner import stream_tokens

app = FastAPI()

class ChatRequest(BaseModel):
    message: str

@app.post("/stream")
async def stream_chat(request: ChatRequest):
    async def event_generator():
        async for token in stream_tokens(request.message):
            # SSE format: each message must start with "data: " and end with "\n\n"
            yield f"data: {token}\n\n"
        # Signal to the client that the stream is done
        yield "data: [DONE]\n\n"

    return StreamingResponse(
        event_generator(),
        media_type="text/event-stream",
        headers={
            "Cache-Control": "no-cache",
            "X-Accel-Buffering": "no",  # Disables Nginx buffering — critical for SSE
        },
    )
```
The `X-Accel-Buffering: no` header is easy to forget. If you're behind Nginx (as on most cloud platforms), it will buffer the entire response before forwarding, making streaming invisible to the client.
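One caveat: the simple f-string framing breaks if a token itself contains a newline, since the SSE protocol requires each line of a message to carry its own `data: ` prefix. Here is a small helper sketch; the `sse_format` name is my own, not part of FastAPI:

```python
def sse_format(payload: str) -> str:
    """Frame a payload as a single SSE message.

    Multi-line payloads need one "data: " prefix per line;
    otherwise the browser's SSE parser drops the bare lines.
    """
    lines = payload.split("\n")
    return "".join(f"data: {line}\n" for line in lines) + "\n"

print(repr(sse_format("Hello")))         # 'data: Hello\n\n'
print(repr(sse_format("line1\nline2")))  # 'data: line1\ndata: line2\n\n'
```

On the client side, an SSE-aware parser rejoins the `data:` lines with `\n`, so multi-line tokens survive the round trip.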
Run the server:
```bash
uvicorn main:app --reload --port 8000
```
Step 4: Consume the Stream in React
Use the `fetch` API with a `ReadableStream` reader. No external library needed.
```tsx
// components/StreamingChat.tsx
import { useState } from "react"

export default function StreamingChat() {
  const [output, setOutput] = useState("")
  const [loading, setLoading] = useState(false)

  async function sendMessage(userInput: string) {
    setOutput("")
    setLoading(true)

    const response = await fetch("http://localhost:8000/stream", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ message: userInput }),
    })

    const reader = response.body!.getReader()
    const decoder = new TextDecoder()

    while (true) {
      const { done, value } = await reader.read()
      if (done) break

      const chunk = decoder.decode(value, { stream: true })
      // Each SSE chunk may contain multiple lines; parse them
      for (const line of chunk.split("\n")) {
        if (line.startsWith("data: ")) {
          const token = line.slice(6) // Remove "data: " prefix
          if (token === "[DONE]") {
            setLoading(false)
            return
          }
          // Append each token as it arrives
          setOutput((prev) => prev + token)
        }
      }
    }
    setLoading(false)
  }

  return (
    <div>
      <button onClick={() => sendMessage("Explain RAG in 3 sentences")} disabled={loading}>
        {loading ? "Streaming…" : "Ask"}
      </button>
      <pre style={{ whiteSpace: "pre-wrap" }}>{output}</pre>
    </div>
  )
}
```
The `{ stream: true }` option in `TextDecoder.decode()` handles multi-byte characters that get split across chunks, which matters for non-ASCII output.
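The same pitfall exists for any byte-stream consumer. A Python sketch of the failure mode, splitting a two-byte UTF-8 character across chunks and recovering with the stdlib's incremental decoder (the analogue of `TextDecoder`'s streaming mode):

```python
import codecs

text = "héllo"  # "é" is two bytes in UTF-8: 0xC3 0xA9
data = text.encode("utf-8")
chunk1, chunk2 = data[:2], data[2:]  # split inside the "é"

# Naive per-chunk decode fails on the dangling lead byte:
try:
    chunk1.decode("utf-8")
except UnicodeDecodeError as e:
    print("naive decode failed:", e.reason)

# An incremental decoder buffers the partial character across calls,
# which is what { stream: true } does in the browser:
dec = codecs.getincrementaldecoder("utf-8")()
out = dec.decode(chunk1) + dec.decode(chunk2, final=True)
print(out)  # → héllo
```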
Step 5: Handle Multi-Node Graphs
If your graph has multiple LLM nodes (e.g., a planner and an executor), filter by node name so you only stream the final response node.
```python
# stream_runner.py — multi-node version
async def stream_tokens(user_input: str, stream_node: str = "executor"):
    inputs = {"messages": [HumanMessage(content=user_input)]}
    async for event in graph.astream_events(inputs, version="v2"):
        if (
            event["event"] == "on_chat_model_stream"
            and event["name"] == stream_node  # Only stream from the target node
        ):
            token = event["data"]["chunk"].content
            if token:
                yield token
```
This prevents the frontend from receiving planning tokens mid-stream, which often look like JSON scaffolding and confuse users.
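The node filter is easy to sanity-check offline with hand-built events. In this sketch, the `tok` helper and `SimpleNamespace` chunks are stand-ins for real LangGraph objects:

```python
from types import SimpleNamespace

def filter_node_tokens(events, stream_node="executor"):
    """Yield tokens only from the named node's chat-model stream."""
    for event in events:
        if (
            event["event"] == "on_chat_model_stream"
            and event["name"] == stream_node
        ):
            token = event["data"]["chunk"].content
            if token:
                yield token

def tok(node, text):
    # Build a fake stream event carrying one token from one node.
    return {"event": "on_chat_model_stream", "name": node,
            "data": {"chunk": SimpleNamespace(content=text)}}

events = [tok("planner", '{"step":'), tok("planner", ' 1}'),
          tok("executor", "The "), tok("executor", "answer.")]

print("".join(filter_node_tokens(events)))  # → The answer.
```

The planner's JSON scaffolding never reaches the caller; only the executor's prose does.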
Verification
Start the FastAPI server, then test with curl before wiring up React:
```bash
curl -X POST http://localhost:8000/stream \
  -H "Content-Type: application/json" \
  -d '{"message": "Count to 5 slowly"}' \
  --no-buffer
```
You should see tokens printing to the terminal one by one, not all at once. The stream ends with `data: [DONE]`.
If all tokens appear simultaneously, check:
- `streaming=True` is set on your `ChatOpenAI` instance
- `X-Accel-Buffering: no` is in the response headers
- You're not wrapping the LLM call in a synchronous function that blocks the event loop
What You Learned
- `astream_events()` with `version="v2"` is the correct way to extract tokens from inside LangGraph nodes
- `streaming=True` on the LLM is required; the graph won't stream without it
- `X-Accel-Buffering: no` is the most common production gotcha for SSE behind a reverse proxy
- Filter by `event["name"]` in multi-node graphs to control which node's tokens reach the frontend
Limitation: SSE is unidirectional. If you need the client to send messages mid-stream (e.g., to cancel or steer the agent), switch to WebSockets using fastapi-websocket-rpc or a similar library.
Tested on LangGraph 0.2.x, LangChain 0.3.x, FastAPI 0.115, Python 3.12, React 19