What Is a LangGraph State Machine and Why It Matters in 2026
Most LLM agent tutorials show a linear chain: input → LLM → output. Real agents don't work that way. They route to different tools based on intent, retry on failure, run steps in parallel, and loop until a condition is met.
LangGraph models this as a state machine: a directed graph where nodes are functions and edges are routing decisions. Every transition is explicit. Every branch is testable. That's what makes it production-worthy when CrewAI's implicit orchestration breaks down.
This guide covers conditional edges, parallel fan-out, dynamic routing at runtime, and the patterns that trip up intermediate LangGraph users.
How LangGraph State Machines Work
The mental model: your agent is a graph of functions sharing a typed state object.
User Input
│
▼
[classify_intent] ──"tool_call"──▶ [run_tool] ──▶ [format_response]
│ ▲
└──"direct_answer"──────────────────────────────┘
│
└──"clarify"──▶ [ask_followup] ──▶ (back to classify_intent)
Three concepts control everything:
State — a typed TypedDict (or Pydantic model) passed between every node. Nodes read from it and write back to it. Nothing is passed as function arguments between nodes directly.
Nodes — plain Python functions with signature (state: YourState) -> dict. The returned dict is merged into the state, not a replacement.
Edges — connections between nodes. A conditional edge runs a router function that returns the name of the next node as a string.
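The "merged into the state, not a replacement" rule is worth internalizing before anything else. Here is a toy model of one update cycle with no LangGraph dependency — apply_node and set_intent are illustrative names, not library APIs:

```python
from typing import Callable

State = dict  # stand-in for the TypedDict state

def apply_node(state: State, node: Callable[[State], dict]) -> State:
    # LangGraph-style update: the node returns a *partial* dict,
    # which is merged into the existing state, not substituted for it
    update = node(state)
    return {**state, **update}

def set_intent(state: State) -> dict:
    # A node only writes the keys it owns
    return {"intent": "tool_call"}

result = apply_node({"messages": ["hi"], "intent": None}, set_intent)
# "messages" survives the update because the return dict is merged
```

This is why nodes can stay small: each one returns only the fields it changed, and the rest of the state flows through untouched.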
Defining State Correctly
State design is where most bugs start. Get this right before writing any nodes.
from typing import Annotated, Literal
from typing_extensions import TypedDict
from langgraph.graph.message import add_messages
class AgentState(TypedDict):
# add_messages reducer appends instead of overwriting — critical for chat history
messages: Annotated[list, add_messages]
intent: Literal["tool_call", "direct_answer", "clarify"] | None
tool_result: str | None
retry_count: int
error: str | None
Two rules worth burning in:
- Use Annotated with a reducer for lists you append to. Without add_messages, each node that touches messages will overwrite the whole list.
- Keep branching signals in state explicitly. Storing intent as a field means your router function can be a pure, testable function — no LLM call inside the edge itself.
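To see why the first rule matters, here is a minimal model of how a reducer changes merge behavior — merge_field is an illustrative helper, not a LangGraph API:

```python
import operator

def merge_field(old, new, reducer=None):
    # Default channel semantics: last write wins.
    # With a reducer (as with Annotated[list, operator.add] or
    # add_messages), the old and new values are combined instead.
    return reducer(old, new) if reducer else new

# Plain list field: a second node's write clobbers the history
assert merge_field(["msg-1"], ["msg-2"]) == ["msg-2"]

# Reducer-annotated field: writes accumulate across nodes
assert merge_field(["msg-1"], ["msg-2"], operator.add) == ["msg-1", "msg-2"]
```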
Building the Graph: Nodes and Basic Edges
from langgraph.graph import StateGraph, END
def classify_intent(state: AgentState) -> dict:
    # Call your LLM or classifier here
    last_message = state["messages"][-1].content
    # Simple keyword routing for illustration — replace with an LLM call.
    # Don't reset retry_count here: this node is re-entered on retries,
    # and zeroing the counter would make the retry loop infinite.
    # Initialize retry_count to 0 in the state you pass to invoke().
    if "?" in last_message and len(last_message) < 20:
        return {"intent": "clarify"}
    elif any(w in last_message for w in ["search", "calculate", "fetch"]):
        return {"intent": "tool_call"}
    else:
        return {"intent": "direct_answer"}
def run_tool(state: AgentState) -> dict:
# Execute whichever tool the intent maps to
try:
result = execute_tool(state["messages"][-1].content)
return {"tool_result": result, "error": None}
except Exception as e:
return {"tool_result": None, "error": str(e)}
def format_response(state: AgentState) -> dict:
from langchain_core.messages import AIMessage
content = state["tool_result"] or "Here's what I found based on your question."
return {"messages": [AIMessage(content=content)]}
def ask_followup(state: AgentState) -> dict:
from langchain_core.messages import AIMessage
return {"messages": [AIMessage(content="Could you clarify what you mean?")]}
# Wire the graph
builder = StateGraph(AgentState)
builder.add_node("classify_intent", classify_intent)
builder.add_node("run_tool", run_tool)
builder.add_node("format_response", format_response)
builder.add_node("ask_followup", ask_followup)
builder.set_entry_point("classify_intent")
Conditional Edges: The Core Branching Primitive
A conditional edge runs a router function after a node completes. The router returns the name of the next node.
def route_after_classify(state: AgentState) -> str:
# Pure function — reads state, returns a node name string
intent = state["intent"]
if intent == "tool_call":
return "run_tool"
elif intent == "clarify":
return "ask_followup"
else:
return "format_response"
builder.add_conditional_edges(
    source="classify_intent",   # which node triggers the routing
    path=route_after_classify,  # router function
    path_map={                  # maps return values to node names
        "run_tool": "run_tool",
        "ask_followup": "ask_followup",
        "format_response": "format_response",
    }
)
# Terminal edges: both response nodes end this turn. The "(back to
# classify_intent)" loop in the diagram happens via the user's next message.
builder.add_edge("format_response", END)
builder.add_edge("ask_followup", END)
The path_map is optional if your router returns exact node name strings — but including it makes the graph visualizable and validates that all returned values map to real nodes at build time.
Retry Loops: Branching Back to a Previous Node
This is where LangGraph outshines linear chains. If run_tool fails, loop back rather than crashing.
def route_after_tool(state: AgentState) -> str:
if state["error"] and state["retry_count"] < 3:
return "retry" # maps back to classify_intent
elif state["error"]:
return "format_error" # give up after 3 attempts
else:
return "format_response"
def increment_retry(state: AgentState) -> dict:
# Intermediate node to increment counter before looping back
return {"retry_count": state["retry_count"] + 1, "error": None}
builder.add_node("increment_retry", increment_retry)
builder.add_conditional_edges(
source="run_tool",
path=route_after_tool,
path_map={
"retry": "increment_retry",
"format_error": "format_response",
"format_response": "format_response",
}
)
# Loop back: after incrementing, re-classify
builder.add_edge("increment_retry", "classify_intent")
The retry counter lives in state, which means it persists across iterations without any global variables or closures.
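Because the router is pure, the retry gate itself can be unit-tested with plain dicts — route_after_tool is restated here so the snippet runs standalone:

```python
def route_after_tool(state: dict) -> str:
    # Same logic as above: retry on error until the counter reaches 3
    if state["error"] and state["retry_count"] < 3:
        return "retry"
    elif state["error"]:
        return "format_error"
    else:
        return "format_response"

assert route_after_tool({"error": "timeout", "retry_count": 0}) == "retry"
assert route_after_tool({"error": "timeout", "retry_count": 3}) == "format_error"
assert route_after_tool({"error": None, "retry_count": 1}) == "format_response"
```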
Parallel Fan-Out: Running Branches Simultaneously
LangGraph supports parallel node execution when you need multiple tools to run at once and then merge results.
from langgraph.graph import StateGraph, END
from typing import Annotated
import operator
class ResearchState(TypedDict):
query: str
# operator.add reducer concatenates lists from parallel branches
web_results: Annotated[list, operator.add]
db_results: Annotated[list, operator.add]
final_answer: str | None
def web_search(state: ResearchState) -> dict:
results = run_web_search(state["query"])
return {"web_results": results}
def db_lookup(state: ResearchState) -> dict:
results = query_vector_db(state["query"])
return {"db_results": results}
def synthesize(state: ResearchState) -> dict:
all_context = state["web_results"] + state["db_results"]
answer = llm_synthesize(state["query"], all_context)
return {"final_answer": answer}
research_builder = StateGraph(ResearchState)
research_builder.add_node("web_search", web_search)
research_builder.add_node("db_lookup", db_lookup)
research_builder.add_node("synthesize", synthesize)
from langgraph.graph import START

# Fan-out: give both branches an edge from START. Nodes reachable in the
# same superstep run concurrently, then fan back in at synthesize.
research_builder.add_edge(START, "web_search")
research_builder.add_edge(START, "db_lookup")
research_builder.add_edge("web_search", "synthesize")
research_builder.add_edge("db_lookup", "synthesize")
research_builder.add_edge("synthesize", END)
Edges fanning out from the START node already run both branches concurrently in the same superstep. When the number of branches is only known at runtime — a map-reduce over a dynamic list of tasks — use the Send API (available since LangGraph 0.2):
from langgraph.types import Send
def fan_out_router(state: ResearchState) -> list[Send]:
# Return a list of Send objects — LangGraph runs them in parallel
return [
Send("web_search", state),
Send("db_lookup", state),
]
research_builder.add_node("fan_out", lambda s: {})  # passthrough node, writes nothing
research_builder.set_entry_point("fan_out")
research_builder.add_conditional_edges("fan_out", fan_out_router)
The Send API passes a copy of state to each branch. Results merge back via reducers before synthesize runs.
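That fan-out/fan-in behavior can be modeled without LangGraph to make the merge step explicit — a sketch where REDUCERS and merge are illustrative stand-ins, not library APIs:

```python
import operator
from concurrent.futures import ThreadPoolExecutor

# Per-field reducers, mirroring the Annotated[list, operator.add] hints
REDUCERS = {"web_results": operator.add, "db_results": operator.add}

def merge(state: dict, updates: list) -> dict:
    # Fan-in: fold every branch's partial update through its field's reducer
    for update in updates:
        for key, value in update.items():
            reducer = REDUCERS.get(key)
            state[key] = reducer(state[key], value) if reducer else value
    return state

def web_search(state):
    return {"web_results": [f"web:{state['query']}"]}

def db_lookup(state):
    return {"db_results": [f"db:{state['query']}"]}

state = {"query": "langgraph", "web_results": [], "db_results": []}
with ThreadPoolExecutor() as pool:
    # Fan-out: each branch reads the state and returns a partial update
    updates = list(pool.map(lambda branch: branch(state), [web_search, db_lookup]))
state = merge(state, updates)
# state["web_results"] == ["web:langgraph"], state["db_results"] == ["db:langgraph"]
```

The key point: parallel branches never write to the state directly; they return updates that are merged deterministically through reducers, which is what makes concurrent execution safe.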
Dynamic Routing: Deciding Branches at Runtime
Sometimes the number of branches isn't known at build time — for example, routing to one of N tools based on LLM output. Use a router that constructs the node name dynamically.
TOOL_REGISTRY = {
"calculator": run_calculator,
"weather": get_weather,
"calendar": check_calendar,
"email": send_email,
}
# Register all tools as nodes
for tool_name, tool_fn in TOOL_REGISTRY.items():
builder.add_node(tool_name, lambda s, fn=tool_fn: {"tool_result": fn(s)})
def route_to_tool(state: AgentState) -> str:
# LLM returns the tool name — validate before routing
tool_name = extract_tool_name(state["messages"][-1])
if tool_name in TOOL_REGISTRY:
return tool_name
return "format_response" # fallback for unknown tools
builder.add_conditional_edges(
source="classify_intent",
path=route_to_tool,
path_map={name: name for name in TOOL_REGISTRY} | {"format_response": "format_response"},
)
The path_map keys must cover every possible return value. If route_to_tool returns a string with no entry in path_map, LangGraph raises an error at runtime, not at build time. Add validation in the router.
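One defensive pattern is to wrap any LLM-driven router so out-of-map values degrade to a fallback — safe_router is a hypothetical helper, not part of LangGraph:

```python
def safe_router(router, valid_targets: set, fallback: str):
    # Clamp the router's output to known node names so an LLM
    # hallucinating a tool name can't crash the graph at runtime
    def wrapped(state: dict) -> str:
        target = router(state)
        return target if target in valid_targets else fallback
    return wrapped

def raw_router(state: dict) -> str:
    return state["requested_tool"]  # may be anything the LLM emitted

route = safe_router(raw_router, {"calculator", "weather"}, "format_response")

assert route({"requested_tool": "weather"}) == "weather"
assert route({"requested_tool": "launch_missiles"}) == "format_response"
```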
Subgraphs: Composing Complex Graphs from Smaller Ones
For agents with many responsibilities, compile each concern as its own graph and compose them.
# Build and compile the research subgraph
research_graph = research_builder.compile()
# Use it as a node in a parent graph
def research_node(state: AgentState) -> dict:
result = research_graph.invoke({"query": state["messages"][-1].content})
return {"tool_result": result["final_answer"]}
parent_builder = StateGraph(AgentState)
parent_builder.add_node("research", research_node)
Subgraphs have their own state type. The wrapping node translates between the parent and child state schemas. Keep that translation explicit — don't assume field names match.
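A sketch of keeping that translation explicit, with a function per direction. Field names follow the two schemas above; messages are plain strings here (rather than message objects) to keep the snippet self-contained:

```python
def to_child_state(parent: dict) -> dict:
    # AgentState -> ResearchState: only the query crosses the boundary
    return {"query": parent["messages"][-1], "web_results": [],
            "db_results": [], "final_answer": None}

def from_child_state(child: dict) -> dict:
    # ResearchState -> partial AgentState update
    return {"tool_result": child["final_answer"]}

parent = {"messages": ["compare vector DBs"], "tool_result": None}
child_in = to_child_state(parent)
child_out = {**child_in, "final_answer": "pgvector works at small scale."}
update = from_child_state(child_out)
```

Two small pure functions like these are trivially testable, and they make every schema mismatch a one-line fix instead of a debugging session.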
Production Considerations
Checkpointing for long-running graphs. Add a MemorySaver or PostgresSaver checkpointer so interrupted runs can resume:
from langgraph.checkpoint.memory import MemorySaver
graph = builder.compile(checkpointer=MemorySaver())
# Each invocation with a thread_id is resumable
graph.invoke(initial_state, config={"configurable": {"thread_id": "user-123"}})
Without a checkpointer, a graph crash loses all intermediate state. In production, use PostgresSaver from langgraph-checkpoint-postgres.
Cycle detection. Graphs with loops (like retry logic) will run forever if the exit condition is never met. Always pair a loop with a counter or a time limit in state.
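A counter is the simplest gate; a wall-clock deadline stored in state works the same way. In this sketch, started_at is an assumed extra state field, not part of the AgentState defined above:

```python
import time

MAX_LOOP_SECONDS = 30.0  # illustrative budget

def deadline_gate(state: dict) -> str:
    # Routers can enforce a time budget the same way they enforce a
    # retry counter: read state, return the next node's name as a string
    if time.monotonic() - state["started_at"] > MAX_LOOP_SECONDS:
        return "format_response"   # budget spent — exit the loop
    return "classify_intent"       # budget remains — keep looping

fresh = {"started_at": time.monotonic()}
stale = {"started_at": time.monotonic() - 60}
assert deadline_gate(fresh) == "classify_intent"
assert deadline_gate(stale) == "format_response"
```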
Node granularity. One LLM call per node is a good default. Nodes that do too much are hard to test and hard to resume from mid-failure. If a node does three things, split it into three nodes.
Testing routers in isolation. Router functions are pure functions — test them directly without spinning up the full graph:
def test_route_after_classify_tool_call():
state = AgentState(intent="tool_call", messages=[], tool_result=None, retry_count=0, error=None)
assert route_after_classify(state) == "run_tool"
Summary
- State is shared typed memory — use Annotated reducers for fields multiple nodes write to
- Conditional edges take a router function returning the next node name as a string
- Retry loops work by returning to an earlier node; always gate them with a counter in state
- Parallel branches use Send for concurrent execution and reducer annotations for safe merges
- Dynamic routing constructs the node name at runtime — validate the output before routing
- Subgraphs let you compose large agents from smaller, testable graphs
- Checkpointing is not optional for production — add it before you need it
Tested on LangGraph 0.3.x, Python 3.12, LangChain Core 0.3.x