LangSmith Setup Guide: Observability for LangChain Apps

Set up LangSmith tracing in your LangChain app in minutes. Debug prompts, track token costs, and monitor agent runs in production.

Problem: You Can't Debug What You Can't See

Your LangChain app returns wrong answers, burns through tokens, or hangs — and you have no idea why. print() statements don't cut it when chains nest three levels deep or agents loop unexpectedly.

LangSmith gives you a full trace of every LLM call, tool invocation, and chain step. You see exactly what prompt was sent, what the model returned, how long it took, and what it cost.

You'll learn:

  • How to connect LangSmith to any LangChain app in under 5 minutes
  • How to read traces and pinpoint failures in multi-step chains
  • How to tag runs for production monitoring and cost tracking

Time: 20 min | Difficulty: Intermediate


Why This Matters

Without tracing, debugging a LangChain agent means guessing. A single user query can trigger 10+ LLM calls, and the failure might be in step 7 — a retrieval miss, a malformed tool call, or a prompt that exceeded the context window.

LangSmith captures the entire execution graph. You get:

  • Full prompt/response pairs at every step
  • Latency and token counts per call
  • Feedback and evaluation scores attached to runs
  • Datasets built from real production traces

It's built by the LangChain team, so the integration requires no code changes — set three environment variables and tracing starts automatically.


Prerequisites

  • Python 3.11+
  • langchain and langchain-openai (or any LLM provider) installed
  • A LangSmith account — free tier available at smith.langchain.com

Solution

Step 1: Create a LangSmith API Key

  1. Go to smith.langchain.com and sign in
  2. Click your avatar → Settings → API Keys
  3. Click Create API Key, name it (e.g., dev-local), and copy it

Keep this key — you won't see it again.


Step 2: Install the LangSmith Package

# LangSmith SDK is separate from langchain core
pip install langsmith langchain langchain-openai

Verify the install:

python -c "import langsmith; print(langsmith.__version__)"

Expected output: 0.2.x or higher


Step 3: Set Environment Variables

LangSmith reads these three variables automatically — no code changes needed to start tracing.

# Add to your .env file or shell profile
export LANGCHAIN_TRACING_V2="true"
export LANGCHAIN_API_KEY="ls__your_key_here"
export LANGCHAIN_PROJECT="my-app-dev"   # groups runs in the UI; create it on the fly

For a project using python-dotenv:

# .env
LANGCHAIN_TRACING_V2=true
LANGCHAIN_API_KEY=ls__your_key_here
LANGCHAIN_PROJECT=my-app-dev
OPENAI_API_KEY=sk-your_openai_key

# main.py — load before any langchain imports
from dotenv import load_dotenv
load_dotenv()

Step 4: Run Any LangChain Code

No code changes. Just run your existing app:

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

llm = ChatOpenAI(model="gpt-4o-mini")

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a concise technical assistant."),
    ("human", "{question}")
])

chain = prompt | llm

# This call is automatically traced
response = chain.invoke({"question": "What is GGUF quantization?"})
print(response.content)

Open smith.langchain.com, navigate to your project my-app-dev, and you'll see the trace appear within seconds.


Step 5: Read Your First Trace

In the LangSmith UI, click the run to expand it. You'll see a tree like this:

RunnableSequence                    450ms  312 tokens
├── ChatPromptTemplate               1ms
└── ChatOpenAI                      449ms  312 tokens
    ├── Input:  [system + human messages]
    └── Output: "GGUF (GPT-Generated Unified Format)…"

Key fields to check:

  • Latency — where is time being spent?
  • Total tokens — input + output, maps to cost
  • Error — red nodes indicate where a chain failed
  • Metadata — any custom tags you attach (covered next)
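Total tokens translate to dollars linearly, so a small helper makes per-run cost visible at a glance. A minimal sketch — the per-million-token prices below are placeholders, not current gpt-4o-mini pricing; substitute your provider's actual rates:

```python
# Assumed per-1M-token prices (placeholders — check your provider's price sheet)
PRICE_IN_PER_M = 0.15   # USD per 1M input tokens
PRICE_OUT_PER_M = 0.60  # USD per 1M output tokens

def estimate_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate the USD cost of one run from its trace token counts."""
    return (prompt_tokens * PRICE_IN_PER_M
            + completion_tokens * PRICE_OUT_PER_M) / 1_000_000

# A 250-input / 62-output run at the assumed rates
print(f"${estimate_cost(250, 62):.6f}")  # → $0.000075
```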

Step 6: Add Custom Metadata and Tags

Tags let you filter runs by feature, user, or environment in the LangSmith dashboard.

from langchain_core.runnables import RunnableConfig

config = RunnableConfig(
    tags=["production", "rag-pipeline"],
    metadata={
        "user_id": "user_42",
        "feature": "document-qa",
        "model_version": "v2.1"
    }
)

response = chain.invoke({"question": "What is RAG?"}, config=config)

You can now filter the LangSmith UI by tag: production or query metadata.feature = document-qa to isolate a specific flow.
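Because RunnableConfig is a TypedDict, a plain dict with "tags" and "metadata" keys is accepted anywhere a config is expected. A small factory keeps tagging consistent across call sites — a sketch, with illustrative helper and field names:

```python
def make_trace_config(user_id: str, feature: str, env: str = "production") -> dict:
    # RunnableConfig is a TypedDict, so chain.invoke(..., config=...) accepts
    # a plain dict with these keys; no langchain import is needed to build it.
    return {
        "tags": [env, feature],
        "metadata": {"user_id": user_id, "feature": feature},
    }

config = make_trace_config("user_42", "document-qa")
# chain.invoke({"question": "What is RAG?"}, config=config)
```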


Step 7: Trace LangGraph Agents

LangGraph agents trace automatically too. Each node and edge appears as a step in the trace tree.

from langgraph.graph import StateGraph, END
from typing import TypedDict

class AgentState(TypedDict):
    messages: list
    next_step: str

def call_model(state: AgentState):
    # LangSmith captures this LLM call as a child span
    llm = ChatOpenAI(model="gpt-4o-mini")
    response = llm.invoke(state["messages"])
    return {"messages": state["messages"] + [response]}

graph = StateGraph(AgentState)
graph.add_node("agent", call_model)
graph.set_entry_point("agent")
graph.add_edge("agent", END)

app = graph.compile()

# Full graph execution is traced end-to-end
result = app.invoke({"messages": [("human", "Summarize RAG in one sentence")], "next_step": ""})

In LangSmith you'll see the graph execution as nested spans: StateGraph → agent → ChatOpenAI.


Step 8: Log Feedback on Runs

Attach user feedback (thumbs up/down, scores) to runs for evaluation:

from langsmith import Client
from langchain_core.tracers.context import collect_runs

client = Client()

# Capture the run ID programmatically — collect_runs records every
# run traced inside the context manager (assumes `chain` from Step 4)
with collect_runs() as cb:
    response = chain.invoke({"question": "What is RAG?"})
    run_id = cb.traced_runs[0].id

client.create_feedback(
    run_id=run_id,
    key="user_rating",
    score=1.0,          # 0.0 to 1.0
    comment="Correct and concise"
)

Feedback aggregates in the LangSmith dashboard under Feedback — useful for tracking answer quality over time.


Verification

Run this script end-to-end:

import os
from dotenv import load_dotenv
load_dotenv()

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

assert os.getenv("LANGCHAIN_TRACING_V2") == "true", "Tracing not enabled"
assert os.getenv("LANGCHAIN_API_KEY"), "API key missing"

llm = ChatOpenAI(model="gpt-4o-mini")
prompt = ChatPromptTemplate.from_messages([("human", "{input}")])
chain = prompt | llm

result = chain.invoke({"input": "Reply with: LANGSMITH OK"})
print(result.content)

You should see: LANGSMITH OK in your terminal, and a new run in your LangSmith project within 5 seconds.

If the run doesn't appear:

  • No runs in UI → Check LANGCHAIN_TRACING_V2=true (must be the string "true", not True)
  • AuthenticationError → API key is wrong or expired — regenerate it
  • Runs appear in wrong project → LANGCHAIN_PROJECT was set after imports; move load_dotenv() to the top of the file

Production Setup: Environment Separation

Use separate LangSmith projects per environment to avoid mixing dev noise with production signals:

# .env.development
LANGCHAIN_PROJECT=my-app-dev

# .env.production
LANGCHAIN_PROJECT=my-app-prod
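If you select the env file at startup rather than at deploy time, a tiny loader keyed off an APP_ENV variable works. A stdlib sketch — APP_ENV and the file-naming scheme are assumptions, and python-dotenv's load_dotenv(path) handles quoting and edge cases this toy parser skips:

```python
import os

def load_env_file(path: str) -> None:
    # Minimal .env parser for illustration: KEY=VALUE lines only,
    # comments skipped, no quoting or expansion; existing env vars win.
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                os.environ.setdefault(key.strip(), value.strip())

# e.g. APP_ENV=production selects .env.production
env_name = os.environ.get("APP_ENV", "development")
# load_env_file(f".env.{env_name}")
```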

In production, also set a sampling rate to reduce volume on high-traffic endpoints. LangSmith reads the rate from an environment variable, so no code changes are needed:

# Trace roughly 20% of requests (a float between 0 and 1)
export LANGSMITH_TRACING_SAMPLING_RATE="0.2"

For per-function control over what gets traced, apply the @traceable decorator to individual functions:

from langsmith import traceable

@traceable(name="document-retriever", tags=["rag"])
def retrieve_documents(query: str) -> list[str]:
    # Only this function is traced, not the entire request
    # (vector_store is assumed to be defined elsewhere in your app)
    return vector_store.similarity_search(query, k=5)

This is useful when you want to trace a specific slow function without tracing every LLM call in the request.


What You Learned

  • LangSmith tracing activates with three environment variables — no code changes to existing chains
  • Traces show the full execution tree: prompts, responses, latency, and token counts at every step
  • Tags and metadata let you segment runs by user, feature, or environment in the dashboard
  • LangGraph agents trace automatically — each node becomes a child span
  • Use @traceable in production to trace specific functions without full request overhead

Limitation: LangSmith's free tier retains traces for 7 days. For longer retention and team access, the Developer plan starts at $39/mo. Self-hosted LangSmith (Enterprise) is available but requires a separate license.

Tested on LangSmith 0.2.x, LangChain 0.3.x, Python 3.12, macOS and Ubuntu 24.04