# Problem: Your LangGraph Agent Works Locally — Now What?
You've built a LangGraph agent that runs perfectly on your machine. The moment you try to deploy it, you're suddenly managing Redis for checkpointing, a task queue for async runs, WebSocket servers for streaming, and Kubernetes configs for scaling. That's a full platform team's worth of work.
LangGraph Cloud takes all of that off your plate. It's LangChain's managed runtime for LangGraph agents — handling persistence, streaming, and horizontal scaling so you ship the agent, not the infrastructure.
You'll learn:
- How to package and deploy a LangGraph agent to LangGraph Cloud
- How to trigger runs, stream tokens, and retrieve thread state via the REST API
- How persistence, interrupts, and human-in-the-loop work in a managed context
Time: 20 min | Difficulty: Intermediate
## Why Self-Hosting LangGraph Is Painful
LangGraph agents are stateful. Every interrupt, checkpoint, and multi-turn conversation requires a persistent store. Running this yourself means:
- Spinning up a Postgres- or Redis-backed `AsyncCheckpointSaver`
- Managing long-running async tasks without dropping connections
- Handling partial failures mid-graph without corrupting state
- Scaling horizontally without race conditions on shared state
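To make the contract concrete, here is a toy in-memory sketch of what a checkpointer must do: store every checkpoint per thread and return the latest on resume. The class and method names are illustrative only, not LangGraph's actual `BaseCheckpointSaver` interface. A real saver additionally has to make this durable, transactional, and safe under concurrent writers, which is exactly the work a managed runtime absorbs.

```python
from collections import defaultdict

# Toy illustration of the checkpointing contract a managed runtime handles
# for you. Illustrative names only, not the real BaseCheckpointSaver API.
class ToyCheckpointSaver:
    def __init__(self):
        # thread_id -> ordered list of (checkpoint_id, state snapshot)
        self._store = defaultdict(list)

    def put(self, thread_id, checkpoint_id, state):
        # A real saver must write this durably and atomically
        self._store[thread_id].append((checkpoint_id, dict(state)))

    def get_latest(self, thread_id):
        # Resuming a thread means loading its most recent checkpoint
        history = self._store[thread_id]
        return history[-1][1] if history else None
```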
LangGraph Cloud replaces this with a single `langgraph.json` config and a `langgraph deploy` command.
When LangGraph Cloud makes sense:
- You want production uptime without a DevOps hire
- Your agents use `interrupt`/human-in-the-loop and need durable state
- You're running LangSmith already (same ecosystem, same dashboard)
When to self-host instead:
- Data cannot leave your VPC (LangGraph Platform on-prem exists for this)
- You need custom hardware (GPU inference alongside the graph)
- Cost at scale exceeds managed pricing
## Solution
### Step 1: Structure Your Project for Cloud Deployment
LangGraph Cloud expects a specific layout. Start from this minimal structure:
```
my-agent/
├── langgraph.json      # Cloud manifest
├── pyproject.toml      # or requirements.txt
└── agent/
    ├── __init__.py
    └── graph.py        # Your compiled graph lives here
```
Your `graph.py` must export a `CompiledGraph` at module level — the Cloud runtime imports it directly.
```python
# agent/graph.py
from langgraph.graph import StateGraph, END
from typing import TypedDict, Annotated
import operator


class AgentState(TypedDict):
    messages: Annotated[list, operator.add]
    step_count: int


def call_model(state: AgentState):
    # Your LLM call here
    return {"messages": [...], "step_count": state["step_count"] + 1}


def should_continue(state: AgentState):
    if state["step_count"] >= 5:
        return END
    return "call_model"


# Build the graph
builder = StateGraph(AgentState)
builder.add_node("call_model", call_model)
builder.set_entry_point("call_model")
builder.add_conditional_edges("call_model", should_continue)

# Compile — checkpointer is injected by Cloud at runtime, omit it here
graph = builder.compile()
```
**Important:** Do not pass a `checkpointer` when compiling for Cloud. LangGraph Cloud injects its own managed checkpointer. Hardcoding one will conflict with the runtime.
### Step 2: Write `langgraph.json`
This manifest tells the Cloud runtime where your graph is, which Python version to use, and which environment variables to expect.
```json
{
  "dependencies": ["."],
  "graphs": {
    "my_agent": "./agent/graph.py:graph"
  },
  "env": ".env"
}
```
Field breakdown:
- `dependencies` — paths to install (`.` installs your local package)
- `graphs` — map of `graph_id` → `module_path:variable_name`
- `env` — path to a `.env` file for secrets (not committed to git)
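The `module_path:variable_name` convention is easy to sanity-check yourself before deploying. A minimal sketch (the `split_graph_ref` helper is hypothetical, not part of the CLI):

```python
import json

def split_graph_ref(ref: str) -> tuple[str, str]:
    # Split "./agent/graph.py:graph" into (module_path, variable_name).
    # rpartition on the last colon keeps any earlier colons in the path intact.
    path, _, var = ref.rpartition(":")
    return path, var

manifest = json.loads("""
{
  "dependencies": ["."],
  "graphs": {"my_agent": "./agent/graph.py:graph"},
  "env": ".env"
}
""")

for graph_id, ref in manifest["graphs"].items():
    path, var = split_graph_ref(ref)
    print(f"{graph_id}: file={path} variable={var}")
```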
For multiple agents in one repo, add more entries to `graphs`:
```json
{
  "graphs": {
    "research_agent": "./agents/research.py:graph",
    "summarizer": "./agents/summarizer.py:graph"
  }
}
```
### Step 3: Install the CLI and Authenticate
```bash
pip install langgraph-cli

# Authenticate with your LangSmith API key
export LANGSMITH_API_KEY=lsv2_...

# Verify the CLI sees your project
langgraph --version
```
Expected: `langgraph-cli 0.1.x`

The CLI uses your `LANGSMITH_API_KEY` for both auth and deployment. LangGraph Cloud is gated behind a LangSmith Plus or Enterprise plan.
### Step 4: Test Locally with `langgraph dev`
Before deploying, run the Cloud server locally. This uses the same runtime as production — you catch environment issues before they hit the remote.
```bash
# Spins up the LangGraph API server at localhost:8123
langgraph dev
```
Expected output:
```
Ready!
- API: http://localhost:8123
- Docs: http://localhost:8123/docs
- LangGraph Studio: https://smith.langchain.com/studio/?baseUrl=http://localhost:8123
```
Open the Studio URL to visually step through your graph. This is the fastest way to debug state transitions before deploying.
If it fails:
- `ModuleNotFoundError` → check `dependencies` in `langgraph.json` — the package isn't installed in your environment
- `Graph not found` → verify the `module_path:variable` in `graphs` matches your actual file path and exported variable name
### Step 5: Deploy to LangGraph Cloud
```bash
langgraph deploy
```
The CLI packages your code, pushes it to LangGraph Cloud, and returns a deployment URL.
```
Deploying my_agent...
Build: success
Deployment URL: https://my-agent-abc123.us.langgraph.app
```
Each deploy creates a new immutable revision. The previous revision stays live until the new one passes health checks — zero-downtime by default.
To deploy to a named environment (staging vs production):
```bash
langgraph deploy --name production
```
### Step 6: Trigger Runs via the REST API
LangGraph Cloud exposes a REST API for all graph operations. Use the Python SDK or raw HTTP.
```python
# (snippet assumes an async context — the SDK client is async)
from langgraph_sdk import get_client

client = get_client(url="https://my-agent-abc123.us.langgraph.app")

# Create a thread (persistent conversation session)
thread = await client.threads.create()
thread_id = thread["thread_id"]

# Trigger a run on the thread
run = await client.runs.create(
    thread_id=thread_id,
    assistant_id="my_agent",
    input={"messages": [{"role": "user", "content": "Research LangGraph Cloud pricing"}]},
)
print(run["run_id"])
```
The `thread_id` is the key to stateful conversations. Every run on the same thread resumes from the last checkpoint — no extra code needed.
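In practice this usually means keeping a small mapping from your own user or session IDs to thread IDs, so a returning user lands back on their thread. A sketch: `create_thread` is an injected stand-in for `client.threads.create`, so the pattern stays self-contained.

```python
# One-thread-per-user-session pattern: reuse a thread_id for a returning
# user so every run resumes their conversation from the last checkpoint.
class ThreadRegistry:
    def __init__(self, create_thread):
        # create_thread: async callable returning a new thread_id
        self._create_thread = create_thread
        self._by_user = {}

    async def thread_for(self, user_id):
        # Lazily create one thread per user, then always reuse it
        if user_id not in self._by_user:
            self._by_user[user_id] = await self._create_thread()
        return self._by_user[user_id]
```

In production, persist this mapping in your own database; an in-process dict loses threads on restart.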
### Step 7: Stream Tokens in Real Time
For user-facing apps, you need streaming — waiting for a full agent run to complete before showing output kills UX.
```python
# Stream events as they happen
async for chunk in client.runs.stream(
    thread_id=thread_id,
    assistant_id="my_agent",
    input={"messages": [{"role": "user", "content": "Summarize the findings"}]},
    stream_mode="messages",  # "messages" | "values" | "updates" | "events"
):
    if chunk.event == "messages/partial":
        # Individual token chunks
        print(chunk.data["content"], end="", flush=True)
    elif chunk.event == "messages/complete":
        print()  # newline after full message
```
Stream mode options:
- `messages` — token-by-token LLM output (best for chat UIs)
- `values` — full state after each node completes
- `updates` — only the state delta from each node
- `events` — raw LangChain callback events (most verbose, useful for debugging)
### Step 8: Handle Human-in-the-Loop Interrupts
If your graph uses `interrupt()`, Cloud pauses the run and durably persists state while it waits for your input.
```python
# In your graph node:
from langgraph.types import interrupt

def review_node(state: AgentState):
    # Execution pauses here — state is saved to the Cloud checkpointer
    approval = interrupt({"question": "Approve this action?", "action": state["pending_action"]})
    if approval == "yes":
        return {"approved": True}
    return {"approved": False}
```
From your application, resume the interrupted run:
```python
# Check run status
run_status = await client.runs.get(thread_id, run_id)
# run_status["status"] == "interrupted"

# Resume with human input — the resume value is handed back to interrupt()
# as its return value inside the paused node
await client.runs.wait(
    thread_id,
    assistant_id="my_agent",
    command={"resume": "yes"},
)
```
The run picks up exactly where it left off — same graph node, same state. No re-running prior steps.
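The status check above usually lives in a small polling loop that waits for the run to either finish or hit an interrupt. A sketch, with `get_run` as an injected stand-in for `client.runs.get`:

```python
import asyncio

# Poll a run until it leaves the in-flight states, then return its final
# status ("interrupted", "success", "error", ...). The status strings here
# mirror the ones used above; treat the exact set as an assumption.
async def wait_for_pause_or_done(get_run, thread_id, run_id, poll_s=1.0):
    while True:
        run = await get_run(thread_id, run_id)
        if run["status"] not in ("pending", "running"):
            return run["status"]
        await asyncio.sleep(poll_s)
```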
## Verification
```bash
# Check your deployment is healthy
curl https://my-agent-abc123.us.langgraph.app/ok
```
You should see: `{"ok": true}`
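In deploy scripts, the same check commonly becomes a readiness poll that waits for the new revision to come up. A sketch: `fetch` stands in for any HTTP GET helper that returns the parsed JSON body (built on urllib, httpx, etc.).

```python
import time

# Poll the /ok endpoint until it reports healthy or the deadline passes.
# fetch(path) -> parsed JSON dict; clock/sleep are injectable for testing.
def wait_healthy(fetch, timeout_s=30.0, poll_s=2.0,
                 clock=time.monotonic, sleep=time.sleep):
    deadline = clock() + timeout_s
    while clock() < deadline:
        try:
            if fetch("/ok").get("ok"):
                return True
        except OSError:
            pass  # endpoint not up yet, keep polling
        sleep(poll_s)
    return False
```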
Run an end-to-end smoke test via the SDK:
```python
import asyncio
from langgraph_sdk import get_client

async def smoke_test():
    client = get_client(url="https://my-agent-abc123.us.langgraph.app")
    thread = await client.threads.create()
    result = None
    async for chunk in client.runs.stream(
        thread_id=thread["thread_id"],
        assistant_id="my_agent",
        input={"messages": [{"role": "user", "content": "ping"}]},
        stream_mode="values",
    ):
        result = chunk
    assert result is not None, "No output received"
    print("Deployment healthy ✓")

asyncio.run(smoke_test())
```
You should see `Deployment healthy ✓` within 5 seconds.
## Production Considerations
**Persistence and thread limits:** Each thread stores full checkpoint history. For long-running agents with many steps, state blobs grow. Use `client.threads.delete(thread_id)` to clean up finished sessions — there is no automatic TTL by default.
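A periodic sweep is one way to approximate a TTL yourself. A sketch with `list_threads` and `delete_thread` as injected stand-ins for the SDK's thread listing and `client.threads.delete`, and `updated_at` assumed (for illustration) to be an epoch timestamp:

```python
import time

# Delete threads whose last update is older than ttl_s seconds.
# Injected callables keep the sketch self-contained and testable.
def sweep_idle_threads(list_threads, delete_thread, ttl_s, now=time.time):
    deleted = []
    for t in list_threads():
        if now() - t["updated_at"] > ttl_s:
            delete_thread(t["thread_id"])
            deleted.append(t["thread_id"])
    return deleted
```

Run it from a cron job or scheduled task rather than inline with request handling.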
**Concurrency:** A single thread can only have one active run at a time — subsequent runs on the same thread queue. For burst workloads, create separate threads per task rather than funneling everything through one thread.
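The thread-per-task pattern pairs naturally with a concurrency cap, so a burst doesn't open unbounded simultaneous runs. A sketch: `create_thread` and `run_on_thread` are stand-ins for the SDK calls.

```python
import asyncio

# Fan tasks out across fresh threads, capped by a semaphore so at most
# `limit` runs are in flight at once. Results come back in task order.
async def fan_out(tasks, create_thread, run_on_thread, limit=5):
    sem = asyncio.Semaphore(limit)

    async def one(task):
        async with sem:
            thread_id = await create_thread()
            return await run_on_thread(thread_id, task)

    return await asyncio.gather(*(one(t) for t in tasks))
```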
**Secrets management:** Environment variables in `.env` are encrypted at rest. For production, set env vars per deployment in the LangSmith dashboard rather than committing `.env` files — they are injected at runtime.
**Cost control:** Billing is per compute-second of graph execution. Long agentic loops accumulate fast. Cap runaway agents with `recursion_limit`:
```python
# Cap at 25 node executions to prevent infinite loops
config = {"recursion_limit": 25}

run = await client.runs.create(
    thread_id=thread_id,
    assistant_id="my_agent",
    input={...},
    config=config,
)
```
## What You Learned
- LangGraph Cloud removes checkpointing, streaming, and scaling complexity — the tradeoff is per-second billing and vendor coupling
- `langgraph.json` is the single manifest that maps compiled graphs to the Cloud runtime; omit the checkpointer at compile time
- Thread IDs are the primitive for stateful conversations — one thread per user session, one run per turn
- `interrupt()` plus resuming the paused run is the cleanest human-in-the-loop pattern without custom webhook infrastructure
When NOT to use LangGraph Cloud: If your agent is stateless and completes in under 2 seconds, a plain FastAPI endpoint is simpler and cheaper. LangGraph Cloud pays off when you need durable multi-turn state, reliable interrupt/resume, or want to skip building a deployment platform entirely.
Tested on LangGraph 0.2.x, langgraph-cli 0.1.x, langgraph-sdk 0.1.x, Python 3.12, Ubuntu 24.04