# Problem: Your LangGraph Agent Works Locally — Now What?
You've built a LangGraph agent that runs perfectly on your machine. The moment you try to deploy it, you're suddenly managing Redis for checkpointing, a task queue for async runs, WebSocket servers for streaming, and Kubernetes configs for scaling. That's a full platform team's worth of work.
LangGraph Cloud takes all of that off your plate. It's LangChain's managed runtime for LangGraph agents — handling persistence, streaming, and horizontal scaling so you ship the agent, not the infrastructure.
You'll learn:
- How to package and deploy a LangGraph agent to LangGraph Cloud
- How to trigger runs, stream tokens, and retrieve thread state via the REST API
- How persistence, interrupts, and human-in-the-loop work in a managed context
Time: 20 min | Difficulty: Intermediate
## Why Self-Hosting LangGraph Is Painful
LangGraph agents are stateful. Every interrupt, checkpoint, and multi-turn conversation requires a persistent store. Running this yourself means:
- Spinning up a Postgres- or Redis-backed `AsyncCheckpointSaver`
- Managing long-running async tasks without dropping connections
- Handling partial failures mid-graph without corrupting state
- Scaling horizontally without race conditions on shared state
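To make the contract concrete, here is a toy in-memory sketch of what a checkpointer must do: store every checkpoint per thread and return the latest on resume. The class and method names are illustrative only, not LangGraph's actual `BaseCheckpointSaver` interface. A real saver additionally has to make this durable, transactional, and safe under concurrent writers, which is exactly the work a managed runtime absorbs.

```python
from collections import defaultdict

# Toy illustration of the checkpointing contract a managed runtime handles
# for you. Illustrative names only, not the real BaseCheckpointSaver API.
class ToyCheckpointSaver:
    def __init__(self):
        # thread_id -> ordered list of (checkpoint_id, state snapshot)
        self._store = defaultdict(list)

    def put(self, thread_id, checkpoint_id, state):
        # A real saver must write this durably and atomically
        self._store[thread_id].append((checkpoint_id, dict(state)))

    def get_latest(self, thread_id):
        # Resuming a thread means loading its most recent checkpoint
        history = self._store[thread_id]
        return history[-1][1] if history else None
```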
LangGraph Cloud replaces this with a single `langgraph.json` config and a `langgraph deploy` command.
When LangGraph Cloud makes sense:
- You want production uptime without a DevOps hire
- Your agents use `interrupt`/human-in-the-loop and need durable state
- You're running LangSmith already (same ecosystem, same dashboard)
When to self-host instead:
- Data cannot leave your VPC (LangGraph Platform on-prem exists for this)
- You need custom hardware (GPU inference alongside the graph)
- Cost at scale exceeds managed pricing
## Solution
### Step 1: Structure Your Project for Cloud Deployment
LangGraph Cloud expects a specific layout. Start from this minimal structure:
```
my-agent/
├── langgraph.json      # Cloud manifest
├── pyproject.toml      # or requirements.txt
└── agent/
    ├── __init__.py
    └── graph.py        # Your compiled graph lives here
```
Your `graph.py` must export a `CompiledGraph` at module level — the Cloud runtime imports it directly.
```python
# agent/graph.py
from langgraph.graph import StateGraph, END
from typing import TypedDict, Annotated
import operator


class AgentState(TypedDict):
    messages: Annotated[list, operator.add]
    step_count: int


def call_model(state: AgentState):
    # Your LLM call here
    return {"messages": [...], "step_count": state["step_count"] + 1}


def should_continue(state: AgentState):
    if state["step_count"] >= 5:
        return END
    return "call_model"


# Build the graph
builder = StateGraph(AgentState)
builder.add_node("call_model", call_model)
builder.set_entry_point("call_model")
builder.add_conditional_edges("call_model", should_continue)

# Compile — checkpointer is injected by Cloud at runtime, omit it here
graph = builder.compile()
```
**Important:** Do not pass a `checkpointer` when compiling for Cloud. LangGraph Cloud injects its own managed checkpointer. Hardcoding one will conflict with the runtime.
### Step 2: Write `langgraph.json`
This manifest tells the Cloud runtime where your graph is, which Python version to use, and which environment variables to expect.
```json
{
  "dependencies": ["."],
  "graphs": {
    "my_agent": "./agent/graph.py:graph"
  },
  "env": ".env"
}
```
Field breakdown:
- `dependencies` — paths to install (`.` installs your local package)
- `graphs` — map of `graph_id` → `module_path:variable_name`
- `env` — path to a `.env` file for secrets (not committed to git)
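The `module_path:variable_name` convention is easy to sanity-check yourself before deploying. A minimal sketch (the `split_graph_ref` helper is hypothetical, not part of the CLI):

```python
import json

def split_graph_ref(ref: str) -> tuple[str, str]:
    # Split "./agent/graph.py:graph" into (module_path, variable_name).
    # rpartition on the last colon keeps any earlier colons in the path intact.
    path, _, var = ref.rpartition(":")
    return path, var

manifest = json.loads("""
{
  "dependencies": ["."],
  "graphs": {"my_agent": "./agent/graph.py:graph"},
  "env": ".env"
}
""")

for graph_id, ref in manifest["graphs"].items():
    path, var = split_graph_ref(ref)
    print(f"{graph_id}: file={path} variable={var}")
```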
For multiple agents in one repo, add more entries to `graphs`:
```json
{
  "graphs": {
    "research_agent": "./agents/research.py:graph",
    "summarizer": "./agents/summarizer.py:graph"
  }
}
```
### Step 3: Install the CLI and Authenticate
```bash
pip install langgraph-cli

# Authenticate with your LangSmith API key
export LANGSMITH_API_KEY=lsv2_...

# Verify the CLI sees your project
langgraph --version
```
Expected: `langgraph-cli 0.1.x`

The CLI uses your `LANGSMITH_API_KEY` for both auth and deployment. LangGraph Cloud is gated behind a LangSmith Plus or Enterprise plan.
### Step 4: Test Locally with `langgraph dev`
Before deploying, run the Cloud server locally. This uses the same runtime as production — you catch environment issues before they hit the remote.
```bash
# Spins up the LangGraph API server at localhost:8123
langgraph dev
```
Expected output:
```
Ready!
- API: http://localhost:8123
- Docs: http://localhost:8123/docs
- LangGraph Studio: https://smith.langchain.com/studio/?baseUrl=http://localhost:8123
```
Open the Studio URL to visually step through your graph. This is the fastest way to debug state transitions before deploying.
If it fails:
- `ModuleNotFoundError` → check `dependencies` in `langgraph.json` — the package isn't installed in your environment
- `Graph not found` → verify the `module_path:variable` in `graphs` matches your actual file path and exported variable name
### Step 5: Deploy to LangGraph Cloud
```bash
langgraph deploy
```
The CLI packages your code, pushes it to LangGraph Cloud, and returns a deployment URL.
```
Deploying my_agent...
Build: success
Deployment URL: https://my-agent-abc123.us.langgraph.app
```
Each deploy creates a new immutable revision. The previous revision stays live until the new one passes health checks — zero-downtime by default.
To deploy to a named environment (staging vs production):
```bash
langgraph deploy --name production
```
### Step 6: Trigger Runs via the REST API
LangGraph Cloud exposes a REST API for all graph operations. Use the Python SDK or raw HTTP.
```python
# (snippet assumes an async context — the SDK client is async)
from langgraph_sdk import get_client

client = get_client(url="https://my-agent-abc123.us.langgraph.app")

# Create a thread (persistent conversation session)
thread = await client.threads.create()
thread_id = thread["thread_id"]

# Trigger a run on the thread
run = await client.runs.create(
    thread_id=thread_id,
    assistant_id="my_agent",
    input={"messages": [{"role": "user", "content": "Research LangGraph Cloud pricing"}]},
)
print(run["run_id"])
```
The `thread_id` is the key to stateful conversations. Every run on the same thread resumes from the last checkpoint — no extra code needed.
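In practice this usually means keeping a small mapping from your own user or session IDs to thread IDs, so a returning user lands back on their thread. A sketch: `create_thread` is an injected stand-in for `client.threads.create`, so the pattern stays self-contained.

```python
# One-thread-per-user-session pattern: reuse a thread_id for a returning
# user so every run resumes their conversation from the last checkpoint.
class ThreadRegistry:
    def __init__(self, create_thread):
        # create_thread: async callable returning a new thread_id
        self._create_thread = create_thread
        self._by_user = {}

    async def thread_for(self, user_id):
        # Lazily create one thread per user, then always reuse it
        if user_id not in self._by_user:
            self._by_user[user_id] = await self._create_thread()
        return self._by_user[user_id]
```

In production, persist this mapping in your own database; an in-process dict loses threads on restart.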
### Step 7: Stream Tokens in Real Time
For user-facing apps, you need streaming — waiting for a full agent run to complete before showing output kills UX.
```python
# Stream events as they happen
async for chunk in client.runs.stream(
    thread_id=thread_id,
    assistant_id="my_agent",
    input={"messages": [{"role": "user", "content": "Summarize the findings"}]},
    stream_mode="messages",  # "messages" | "values" | "updates" | "events"
):
    if chunk.event == "messages/partial":
        # Individual token chunks
        print(chunk.data["content"], end="", flush=True)
    elif chunk.event == "messages/complete":
        print()  # newline after full message
```
Stream mode options:
- `messages` — token-by-token LLM output (best for chat UIs)
- `values` — full state after each node completes
- `updates` — only the state delta from each node
- `events` — raw LangChain callback events (most verbose, useful for debugging)
### Step 8: Handle Human-in-the-Loop Interrupts
If your graph uses `interrupt()`, Cloud pauses the run and durably persists state while it waits for your input.
```python
# In your graph node:
from langgraph.types import interrupt

def review_node(state: AgentState):
    # Execution pauses here — state is saved to the Cloud checkpointer
    approval = interrupt({"question": "Approve this action?", "action": state["pending_action"]})
    if approval == "yes":
        return {"approved": True}
    return {"approved": False}
```
From your application, resume the interrupted run:
```python
# Check run status
run_status = await client.runs.get(thread_id, run_id)
# run_status["status"] == "interrupted"

# Resume with human input — the resume value is handed back to interrupt()
# as its return value inside the paused node
await client.runs.wait(
    thread_id,
    assistant_id="my_agent",
    command={"resume": "yes"},
)
```
The run picks up exactly where it left off — same graph node, same state. No re-running prior steps.
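The status check above usually lives in a small polling loop that waits for the run to either finish or hit an interrupt. A sketch, with `get_run` as an injected stand-in for `client.runs.get`:

```python
import asyncio

# Poll a run until it leaves the in-flight states, then return its final
# status ("interrupted", "success", "error", ...). The status strings here
# mirror the ones used above; treat the exact set as an assumption.
async def wait_for_pause_or_done(get_run, thread_id, run_id, poll_s=1.0):
    while True:
        run = await get_run(thread_id, run_id)
        if run["status"] not in ("pending", "running"):
            return run["status"]
        await asyncio.sleep(poll_s)
```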
## Verification
```bash
# Check your deployment is healthy
curl https://my-agent-abc123.us.langgraph.app/ok
```
You should see: `{"ok": true}`
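In deploy scripts, the same check commonly becomes a readiness poll that waits for the new revision to come up. A sketch: `fetch` stands in for any HTTP GET helper that returns the parsed JSON body (built on urllib, httpx, etc.).

```python
import time

# Poll the /ok endpoint until it reports healthy or the deadline passes.
# fetch(path) -> parsed JSON dict; clock/sleep are injectable for testing.
def wait_healthy(fetch, timeout_s=30.0, poll_s=2.0,
                 clock=time.monotonic, sleep=time.sleep):
    deadline = clock() + timeout_s
    while clock() < deadline:
        try:
            if fetch("/ok").get("ok"):
                return True
        except OSError:
            pass  # endpoint not up yet, keep polling
        sleep(poll_s)
    return False
```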
Run an end-to-end smoke test via the SDK:
```python
import asyncio
from langgraph_sdk import get_client

async def smoke_test():
    client = get_client(url="https://my-agent-abc123.us.langgraph.app")
    thread = await client.threads.create()
    result = None
    async for chunk in client.runs.stream(
        thread_id=thread["thread_id"],
        assistant_id="my_agent",
        input={"messages": [{"role": "user", "content": "ping"}]},
        stream_mode="values",
    ):
        result = chunk
    assert result is not None, "No output received"
    print("Deployment healthy ✓")

asyncio.run(smoke_test())
```
You should see `Deployment healthy ✓` within 5 seconds.
## Production Considerations
**Persistence and thread limits:** Each thread stores full checkpoint history. For long-running agents with many steps, state blobs grow. Use `client.threads.delete(thread_id)` to clean up finished sessions — there is no automatic TTL by default.
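A periodic sweep is one way to approximate a TTL yourself. A sketch with `list_threads` and `delete_thread` as injected stand-ins for the SDK's thread listing and `client.threads.delete`, and `updated_at` assumed (for illustration) to be an epoch timestamp:

```python
import time

# Delete threads whose last update is older than ttl_s seconds.
# Injected callables keep the sketch self-contained and testable.
def sweep_idle_threads(list_threads, delete_thread, ttl_s, now=time.time):
    deleted = []
    for t in list_threads():
        if now() - t["updated_at"] > ttl_s:
            delete_thread(t["thread_id"])
            deleted.append(t["thread_id"])
    return deleted
```

Run it from a cron job or scheduled task rather than inline with request handling.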
**Concurrency:** A single thread can only have one active run at a time — subsequent runs on the same thread queue. For burst workloads, create separate threads per task rather than funneling everything through one thread.
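The thread-per-task pattern pairs naturally with a concurrency cap, so a burst doesn't open unbounded simultaneous runs. A sketch: `create_thread` and `run_on_thread` are stand-ins for the SDK calls.

```python
import asyncio

# Fan tasks out across fresh threads, capped by a semaphore so at most
# `limit` runs are in flight at once. Results come back in task order.
async def fan_out(tasks, create_thread, run_on_thread, limit=5):
    sem = asyncio.Semaphore(limit)

    async def one(task):
        async with sem:
            thread_id = await create_thread()
            return await run_on_thread(thread_id, task)

    return await asyncio.gather(*(one(t) for t in tasks))
```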
**Secrets management:** Environment variables in `.env` are encrypted at rest. For production, set env vars per deployment in the LangSmith dashboard rather than committing `.env` files — they are injected at runtime.
**Cost control:** Billing is per compute-second of graph execution. Long agentic loops accumulate fast. Cap runaway agents with `recursion_limit`:
```python
# Cap at 25 node executions to prevent infinite loops
config = {"recursion_limit": 25}

run = await client.runs.create(
    thread_id=thread_id,
    assistant_id="my_agent",
    input={...},
    config=config,
)
```
## What You Learned
- LangGraph Cloud removes checkpointing, streaming, and scaling complexity — the tradeoff is per-second billing and vendor coupling
- `langgraph.json` is the single manifest that maps compiled graphs to the Cloud runtime; omit the checkpointer at compile time
- Thread IDs are the primitive for stateful conversations — one thread per user session, one run per turn
- `interrupt()` plus resuming the paused run is the cleanest human-in-the-loop pattern without custom webhook infrastructure
When NOT to use LangGraph Cloud: If your agent is stateless and completes in under 2 seconds, a plain FastAPI endpoint is simpler and cheaper. LangGraph Cloud pays off when you need durable multi-turn state, reliable interrupt/resume, or want to skip building a deployment platform entirely.
Tested on LangGraph 0.2.x, langgraph-cli 0.1.x, langgraph-sdk 0.1.x, Python 3.12, Ubuntu 24.04