Problem: LangGraph Agents Work Locally But Have No Production Path
You've built a LangGraph agent that runs fine with `python graph.py`. Now you need it running as a real API — with endpoints, health checks, and a container you can actually deploy.
LangServe gives you FastAPI routes from any LangChain/LangGraph runnable. Docker packages the whole thing into a deployable unit. Together, they're the fastest path from local graph to production endpoint.
You'll learn:
- How to wrap a LangGraph graph with LangServe
- How to write a production-grade Dockerfile for it
- How to configure environment variables, health checks, and streaming endpoints
Time: 25 min | Difficulty: Intermediate
Why This Happens
LangGraph graphs are Python objects — they have no HTTP interface by default. LangServe solves this by treating any Runnable as an API route. But without a proper container setup, you're still stuck running it manually on a server.
Symptoms of the missing-deployment problem:
- No way to call your agent from a frontend or another service
- Secrets hardcoded in the script
- Agent crashes silently with no health monitoring
Solution
Step 1: Install Dependencies
```bash
# Use uv for fast, reproducible installs
uv init langgraph-api && cd langgraph-api

# Core deps (quote the extra so the shell doesn't expand the brackets)
uv add langgraph "langserve[all]" fastapi uvicorn langchain-openai python-dotenv

# Verify LangServe is available (uv run uses the project's virtualenv)
uv run python -c "import langserve; print(langserve.__version__)"
```

Expected output: `0.3.x` or higher
Step 2: Build Your LangGraph Graph
Create `graph.py`. The example is a minimal ReAct-style loop (no tools are wired in yet, so the model answers in one pass and the graph ends):

```python
# graph.py
from langgraph.graph import StateGraph, END
from langchain_openai import ChatOpenAI
from langchain_core.messages import BaseMessage
from typing import TypedDict, Annotated, Sequence
import operator


class AgentState(TypedDict):
    messages: Annotated[Sequence[BaseMessage], operator.add]


def build_graph():
    llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

    def call_model(state: AgentState):
        response = llm.invoke(state["messages"])
        return {"messages": [response]}

    def should_continue(state: AgentState):
        last = state["messages"][-1]
        # END if no tool calls — keeps the graph from looping forever
        return "end" if not getattr(last, "tool_calls", None) else "continue"

    workflow = StateGraph(AgentState)
    workflow.add_node("agent", call_model)
    workflow.set_entry_point("agent")
    workflow.add_conditional_edges(
        "agent",
        should_continue,
        {"continue": "agent", "end": END},
    )
    # compile() returns a runnable CompiledStateGraph, not a StateGraph
    return workflow.compile()


graph = build_graph()
```
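The `Annotated[..., operator.add]` reducer is what makes each node's returned `messages` append to the state rather than overwrite it. Here is a conceptual sketch of that merge rule in plain Python — the `merge_state` helper is mine, not langgraph's (the real channel machinery is more involved), but it illustrates the semantics:

```python
import operator

def merge_state(state: dict, update: dict, reducers: dict) -> dict:
    # For each key a node returns: apply the reducer if one is declared,
    # otherwise last write wins (the default for un-annotated keys).
    merged = dict(state)
    for key, value in update.items():
        if key in reducers and key in merged:
            merged[key] = reducers[key](merged[key], value)  # e.g. list + list
        else:
            merged[key] = value
    return merged

reducers = {"messages": operator.add}
state = {"messages": ["human: hi"]}
state = merge_state(state, {"messages": ["ai: hello!"]}, reducers)
print(state["messages"])  # ['human: hi', 'ai: hello!']
```

This is why `call_model` returns `{"messages": [response]}` — a one-element list to be concatenated, not the full history.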
Step 3: Create the LangServe App
```python
# app.py
from dotenv import load_dotenv

load_dotenv()  # reads .env before graph.py builds the LLM — never hardcode keys

from fastapi import FastAPI
from langserve import add_routes

from graph import graph  # imported after load_dotenv() so OPENAI_API_KEY is set

app = FastAPI(
    title="LangGraph Agent API",
    description="Production LangGraph agent via LangServe",
    version="1.0.0",
)

# add_routes exposes:
#   /agent/invoke — single call, waits for full response
#   /agent/stream — SSE stream, tokens as they arrive
#   /agent/batch  — multiple inputs in one request
add_routes(
    app,
    graph,
    path="/agent",
    # The playground UI is useful locally; disable it in production
    # (see Production Considerations) to avoid exposing internals
    playground_type="default",
)


@app.get("/health")
def health():
    # Used by Docker and load balancers to confirm the service is alive
    return {"status": "ok"}
```
Run locally to confirm routes work:
```bash
uvicorn app:app --reload --port 8000
```
Open http://localhost:8000/docs — you should see /agent/invoke, /agent/stream, and /agent/batch.
Step 4: Write the Dockerfile
```dockerfile
# Dockerfile
# Pin to a specific Python version — "latest" breaks builds silently
FROM python:3.12-slim

WORKDIR /app

# Install uv inside the container for fast, deterministic installs
COPY --from=ghcr.io/astral-sh/uv:latest /uv /usr/local/bin/uv

# Copy dependency files first — Docker caches this layer.
# It re-runs only when pyproject.toml or uv.lock changes, not on every code edit.
COPY pyproject.toml uv.lock ./
RUN uv sync --frozen --no-cache

# Copy source after deps — keeps the slow layer cached during dev
COPY . .

# Run as non-root for security
RUN useradd --create-home appuser && chown -R appuser /app
USER appuser

EXPOSE 8000

# Use exec form so SIGTERM reaches uvicorn directly (graceful shutdown)
CMD ["uv", "run", "uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000", "--workers", "2"]
```
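One caveat with `COPY . .`: it copies everything in the build context, including `.env` if it exists, which would bake your secrets into the image. A `.dockerignore` next to the Dockerfile prevents that (the file name is standard; these entries are suggestions for this project layout):

```
# .dockerignore
.env
.venv
__pycache__/
*.pyc
.git
```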
Step 5: Configure docker-compose for Local Production Testing
```yaml
# docker-compose.yml
services:
  agent:
    build: .
    ports:
      - "8000:8000"
    env_file:
      - .env  # OPENAI_API_KEY and any other secrets live here
    healthcheck:
      # Marks the container unhealthy when /health stops responding
      # (orchestrators can act on that status). python:3.12-slim ships
      # without curl, so probe with the Python stdlib instead.
      test: ["CMD", "python", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')"]
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 10s
    restart: unless-stopped
```
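It helps to know how long a failure takes to surface: Docker probes every `interval` and only flips the container to unhealthy after `retries` consecutive failures. A quick sketch of the worst-case detection window — my back-of-the-envelope arithmetic, not a Docker API:

```python
def worst_case_detection_s(interval: int, timeout: int, retries: int) -> int:
    # Each failing probe may block for up to `timeout` seconds, and the
    # next probe starts `interval` seconds after the previous one completes,
    # so `retries` consecutive failures take roughly this long to surface.
    return retries * (interval + timeout)

# With the compose values above: 30s interval, 5s timeout, 3 retries
print(worst_case_detection_s(30, 5, 3))  # 105 seconds until "unhealthy"
```

If a ~2-minute blind spot is too long for your SLO, tighten `interval` — at the cost of more probe traffic.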
Create .env (never commit this file):
```bash
# .env
OPENAI_API_KEY=sk-...
LANGCHAIN_TRACING_V2=true   # optional: enable LangSmith tracing
LANGCHAIN_API_KEY=ls__...   # optional: LangSmith key
LANGCHAIN_PROJECT=langgraph-prod
```
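A missing `OPENAI_API_KEY` otherwise surfaces only when the first request hits the LLM, so it's worth failing fast at startup. A sketch of a validator you could call near the top of `app.py` — the `require_env` helper is mine, not part of LangServe:

```python
import os

def require_env(names, environ=os.environ):
    # Raise at startup if any required variable is missing or blank,
    # instead of erroring on the first LLM call.
    missing = [n for n in names if not environ.get(n)]
    if missing:
        raise RuntimeError(f"Missing required env vars: {', '.join(missing)}")

# Example: validate against a dict standing in for os.environ
require_env(["OPENAI_API_KEY"], {"OPENAI_API_KEY": "sk-test"})  # passes silently
```

With `restart: unless-stopped`, a crash-at-startup loop is also far easier to spot in `docker compose ps` than a container that looks healthy but 500s on every request.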
Step 6: Build and Run
```bash
# Build the image
docker compose build

# Start in detached mode
docker compose up -d

# Confirm it's healthy
docker compose ps
```

Expected output:

```
NAME            STATUS
agent-agent-1   Up 2 minutes (healthy)
```
Verification
Test the invoke endpoint:
```bash
curl -X POST http://localhost:8000/agent/invoke \
  -H "Content-Type: application/json" \
  -d '{
    "input": {
      "messages": [{"role": "human", "content": "What is 12 * 17?"}]
    }
  }'
```

You should see a JSON response with `output.messages` containing the agent's reply.
Test streaming:
```bash
curl -N http://localhost:8000/agent/stream \
  -H "Content-Type: application/json" \
  -d '{
    "input": {
      "messages": [{"role": "human", "content": "Summarize the Rust ownership model"}]
    }
  }'
```

You should see a sequence of `data:` SSE lines arriving token by token.
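If you consume `/agent/stream` from code rather than curl, the body is standard Server-Sent Events: each event is one or more `data:` lines, and events are separated by a blank line. A minimal stdlib parser for that framing (the sample payload below is invented for illustration; LangServe's real events carry JSON state chunks):

```python
def parse_sse(body: str):
    # Split the stream into events on blank lines, then collect the
    # payload from each event's "data:" lines.
    events = []
    for raw_event in body.split("\n\n"):
        data_lines = [
            line[len("data:"):].strip()
            for line in raw_event.splitlines()
            if line.startswith("data:")
        ]
        if data_lines:
            events.append("\n".join(data_lines))
    return events

sample = 'event: data\ndata: {"token": "Rust"}\n\nevent: data\ndata: {"token": " owns"}\n\n'
print(parse_sse(sample))  # ['{"token": "Rust"}', '{"token": " owns"}']
```

In practice you would `json.loads` each payload; a browser's `EventSource` or a library like `httpx-sse` does this framing for you.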
Check health:
```bash
curl http://localhost:8000/health
# {"status":"ok"}
```
Production Considerations
Workers: `--workers 2` works for most deployments. The classic (2 × CPU cores) + 1 rule applies to CPU-bound workloads; LLM calls are I/O-bound, so 2–4 workers per container is usually enough.
Secrets: Never bake OPENAI_API_KEY into the image. Use env_file, Kubernetes secrets, or a secrets manager like AWS Secrets Manager at runtime.
Timeouts: LLM calls can take 30–60 seconds. Set `--timeout-keep-alive 65` in uvicorn (just above a typical load balancer's 60-second idle timeout) so long responses aren't cut off mid-stream.
Disable the playground in production:
```python
# app.py — pass disabled_endpoints to the add_routes call
# to block the /agent/playground route
add_routes(app, graph, path="/agent", disabled_endpoints=["playground"])
```
What You Learned
- LangServe turns any LangGraph `Runnable` into `/invoke`, `/stream`, and `/batch` endpoints automatically
- Layering `COPY` in your Dockerfile (deps before source) keeps builds fast during iteration
- The `/health` endpoint is required — without it, Docker and Kubernetes can't detect a crashed container
- `--workers 2` is the safe default; tune up only after profiling actual request latency
Limitation: LangServe doesn't handle agent memory persistence across sessions out of the box. For stateful multi-turn agents, add a checkpointer (e.g., SqliteSaver for dev, PostgresSaver for production) to your graph before wrapping it.
Tested on LangGraph 0.2.x, LangServe 0.3.x, Python 3.12, Docker 27, Ubuntu 24.04