Deploy LangGraph with LangServe and Docker: Production Setup 2026

Deploy a LangGraph agent to production using LangServe and Docker. Covers API setup, containerization, health checks, and environment config.

Problem: LangGraph Agents Work Locally But Have No Production Path

You've built a LangGraph agent that runs fine with python graph.py. Now you need it running as a real API — with endpoints, health checks, and a container you can actually deploy.

LangServe gives you FastAPI routes from any LangChain/LangGraph runnable. Docker packages the whole thing into a deployable unit. Together, they're the fastest path from local graph to production endpoint.

You'll learn:

  • How to wrap a LangGraph graph with LangServe
  • How to write a production-grade Dockerfile for it
  • How to configure environment variables, health checks, and streaming endpoints

Time: 25 min | Difficulty: Intermediate


Why This Happens

LangGraph graphs are Python objects — they have no HTTP interface by default. LangServe solves this by treating any Runnable as an API route. But without a proper container setup, you're still stuck running it manually on a server.

Symptoms of the missing-deployment problem:

  • No way to call your agent from a frontend or another service
  • Secrets hardcoded in the script
  • Agent crashes silently with no health monitoring

Solution

Step 1: Install Dependencies

# Use uv for fast, reproducible installs
uv init langgraph-api && cd langgraph-api

# Core deps
uv add langgraph "langserve[all]" fastapi uvicorn langchain-openai python-dotenv

# Verify LangServe is available (run inside the project venv)
uv run python -c "import langserve; print(langserve.__version__)"

Expected output: 0.3.x or higher


Step 2: Build Your LangGraph Graph

Create graph.py — a minimal agent loop. No tools are wired in yet, but the conditional edge already checks for tool calls, so a ToolNode slots in later without restructuring:

# graph.py
from langgraph.graph import StateGraph, END
from langchain_openai import ChatOpenAI
from langchain_core.messages import BaseMessage
from typing import TypedDict, Annotated, Sequence
import operator


class AgentState(TypedDict):
    messages: Annotated[Sequence[BaseMessage], operator.add]


def build_graph():
    # Note: compile() returns a compiled, runnable graph, not a StateGraph
    llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

    def call_model(state: AgentState):
        response = llm.invoke(state["messages"])
        return {"messages": [response]}

    def should_continue(state: AgentState):
        last = state["messages"][-1]
        # END if no tool calls — keeps the graph from looping forever
        return "end" if not getattr(last, "tool_calls", None) else "continue"

    workflow = StateGraph(AgentState)
    workflow.add_node("agent", call_model)
    workflow.set_entry_point("agent")
    workflow.add_conditional_edges(
        "agent",
        should_continue,
        {"continue": "agent", "end": END},
    )

    return workflow.compile()


graph = build_graph()

Step 3: Create the LangServe App

# app.py
from fastapi import FastAPI
from langserve import add_routes
from graph import graph
from dotenv import load_dotenv

load_dotenv()  # reads .env at startup — never hardcode keys

app = FastAPI(
    title="LangGraph Agent API",
    description="Production LangGraph agent via LangServe",
    version="1.0.0",
)

# /agent/invoke  — single call, waits for full response
# /agent/stream  — SSE stream, tokens as they arrive
# /agent/batch   — multiple inputs in one request
add_routes(
    app,
    graph,
    path="/agent",
    # Playground stays enabled here for local testing; Production
    # Considerations below shows how to disable it before deploying
    playground_type="default",
)


@app.get("/health")
def health():
    # Used by Docker and load balancers to confirm the service is alive
    return {"status": "ok"}

Run locally to confirm routes work:

uv run uvicorn app:app --reload --port 8000

Open http://localhost:8000/docs — you should see /agent/invoke, /agent/stream, and /agent/batch.


Step 4: Write the Dockerfile

# Dockerfile
# Pin to a specific Python version — "latest" breaks builds silently
FROM python:3.12-slim

WORKDIR /app

# Install uv inside the container for fast, deterministic installs
COPY --from=ghcr.io/astral-sh/uv:latest /uv /usr/local/bin/uv

# Copy dependency files first — Docker caches this layer
# Re-runs only when pyproject.toml or uv.lock changes, not on every code edit
COPY pyproject.toml uv.lock ./
RUN uv sync --frozen --no-cache

# Copy source after deps — keeps the slow layer cached during dev
# (add .env to .dockerignore so secrets are never baked into the image)
COPY . .

# Run as non-root for security
RUN useradd --create-home appuser && chown -R appuser:appuser /app
USER appuser

EXPOSE 8000

# Use exec form so SIGTERM reaches uvicorn directly (graceful shutdown)
CMD ["uv", "run", "uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000", "--workers", "2"]

Step 5: Configure docker-compose for Local Production Testing

# docker-compose.yml
services:
  agent:
    build: .
    ports:
      - "8000:8000"
    env_file:
      - .env           # OPENAI_API_KEY and any other secrets live here
    healthcheck:
      # Marks the container unhealthy if /health stops responding; an
      # orchestrator (or autoheal sidecar) acts on that status.
      # python:3.12-slim ships without curl, so probe with the stdlib instead.
      test: ["CMD", "python", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')"]
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 10s
    restart: unless-stopped

Create .env (never commit this file):

# .env
OPENAI_API_KEY=sk-...
LANGCHAIN_TRACING_V2=true       # optional: enable LangSmith tracing
LANGCHAIN_API_KEY=ls__...       # optional: LangSmith key
LANGCHAIN_PROJECT=langgraph-prod

Step 6: Build and Run

# Build the image
docker compose build

# Start in detached mode
docker compose up -d

# Confirm it's healthy
docker compose ps

Expected output:

NAME            STATUS
agent-agent-1   Up 2 minutes (healthy)

Verification

Test the invoke endpoint:

curl -X POST http://localhost:8000/agent/invoke \
  -H "Content-Type: application/json" \
  -d '{
    "input": {
      "messages": [{"role": "human", "content": "What is 12 * 17?"}]
    }
  }'

You should see: A JSON response with output.messages containing the agent's reply.

Test streaming:

curl -N http://localhost:8000/agent/stream \
  -H "Content-Type: application/json" \
  -d '{
    "input": {
      "messages": [{"role": "human", "content": "Summarize the Rust ownership model"}]
    }
  }'

You should see: A sequence of data: SSE lines arriving token by token.

Check health:

curl http://localhost:8000/health
# {"status":"ok"}

Production Considerations

Workers: --workers 2 works for most deployments. Set to (2 × CPU cores) + 1 for CPU-bound workloads. LLM calls are I/O-bound, so 2–4 workers per container is usually enough.
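That rule of thumb can be sketched as a small helper (the worker_count name is illustrative, not part of uvicorn):

```python
# Illustrative helper: derive a worker count from the rule of thumb above.
import os

def worker_count(cpu_bound: bool = False) -> int:
    cores = os.cpu_count() or 1
    if cpu_bound:
        return 2 * cores + 1      # classic (2 x cores) + 1 formula
    return min(4, max(2, cores))  # I/O-bound LLM traffic: 2-4 is plenty

# e.g. on a 2-core container: worker_count() == 2, worker_count(True) == 5
```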

Secrets: Never bake OPENAI_API_KEY into the image. Use env_file, Kubernetes secrets, or a secrets manager like AWS Secrets Manager at runtime.
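One cheap safeguard is validating required env vars at startup, so a missing key fails the container immediately instead of surfacing as a 500 on the first request. A sketch for app.py (the REQUIRED_VARS and check_secrets names are assumptions, not LangServe API):

```python
# Illustrative startup guard: call check_secrets() right after load_dotenv().
import os

REQUIRED_VARS = ["OPENAI_API_KEY"]

def check_secrets() -> None:
    missing = [v for v in REQUIRED_VARS if not os.environ.get(v)]
    if missing:
        raise RuntimeError(f"Missing required env vars: {', '.join(missing)}")
```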

Timeouts: LLM calls can take 30–60 seconds. Set --timeout-keep-alive 65 in uvicorn to avoid load balancer timeouts cutting off long responses.

Disable the playground in production:

# app.py — add this to block the /agent/playground route
add_routes(app, graph, path="/agent", disabled_endpoints=["playground"])

What You Learned

  • LangServe turns any LangGraph Runnable into /invoke, /stream, and /batch endpoints automatically
  • Layering COPY in your Dockerfile (deps before source) keeps builds fast during iteration
  • The /health endpoint lets Docker healthchecks and Kubernetes probes catch a hung container, i.e. a process that is still up but no longer serving requests
  • --workers 2 is the safe default; tune up only after profiling actual request latency

Limitation: LangServe doesn't handle agent memory persistence across sessions out of the box. For stateful multi-turn agents, add a checkpointer (e.g., SqliteSaver for dev, PostgresSaver for production) to your graph before wrapping it.

Tested on LangGraph 0.2.x, LangServe 0.3.x, Python 3.12, Docker 27, Ubuntu 24.04