LangSmith Tracing: Debug LLM Chains in Production

Set up LangSmith tracing to debug slow, hallucinating, or failing LLM chains. Pinpoint latency, token costs, and errors in minutes.

Problem: Your LLM Chain Fails in Production and You Can't See Why

Your RAG pipeline returns garbage answers. A LangChain agent loops forever. Token costs spiked 3x overnight. Without tracing, you're print()-debugging blind.

LangSmith gives you full visibility into every LLM call, tool invocation, and chain step — with latency, token counts, and inputs/outputs at each node.

You'll learn:

  • How to wire LangSmith tracing into any LangChain or custom LLM app in under 5 minutes
  • How to read a trace to find the exact step causing hallucinations or latency spikes
  • How to tag runs by environment and user so production noise doesn't pollute your evals

Time: 20 min | Difficulty: Intermediate


Why LLM Chains Are Hard to Debug Without Tracing

A typical RAG chain has 5–8 steps: query rewriting, embedding, retrieval, reranking, prompt construction, LLM call, output parsing. When the final answer is wrong, the failure could be at any of them.

Common failure modes this solves:

  • Retrieval returned the wrong chunks — the LLM had no chance of answering correctly
  • Prompt grew too long — context window overflow silently truncates your docs
  • Tool call looped — agent called the same tool 12 times before timing out
  • Latency spike — one slow embedding call is hiding inside a 9-second response

Without a trace, you see the final output. With LangSmith, you see every intermediate step.


Solution

Step 1: Create a LangSmith Account and Get Your API Key

Go to smith.langchain.com and sign up. The free Developer plan includes 5,000 traces/month — enough to instrument a side project or staging environment.

From the dashboard:

  1. Click your avatar → Settings
  2. API Keys → Create API Key
  3. Copy the key — it starts with lsv2_

Step 2: Install the SDK

# LangSmith SDK + LangChain core
pip install langsmith langchain-core langchain-openai

Verify the install:

python -c "import langsmith; print(langsmith.__version__)"
# Expected: 0.2.x or higher

Step 3: Set Environment Variables

LangSmith tracing activates through environment variables — no code changes needed for LangChain apps.

export LANGCHAIN_TRACING_V2=true
export LANGCHAIN_API_KEY="lsv2_your_key_here"
export LANGCHAIN_PROJECT="my-rag-app"   # traces group under this project name
export OPENAI_API_KEY="sk-..."

For production, add these to your .env file or secrets manager:

# .env (never commit this)
LANGCHAIN_TRACING_V2=true
LANGCHAIN_API_KEY=lsv2_your_key_here
LANGCHAIN_PROJECT=production
OPENAI_API_KEY=sk-...

Load with python-dotenv or your deployment platform's secrets injection.


Step 4: Run Your First Traced Chain

Any existing LangChain code now traces automatically. Here's a minimal example to confirm it works:

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

# No tracing-specific code needed — env vars handle it
llm = ChatOpenAI(model="gpt-4o-mini")

prompt = ChatPromptTemplate.from_messages([
    ("system", "Answer concisely."),
    ("user", "{question}")
])

chain = prompt | llm

response = chain.invoke({"question": "What is LangSmith used for?"})
print(response.content)

Run it, then open smith.langchain.com → your project. You'll see the trace within 2–3 seconds.


Step 5: Read a Trace

Click any trace in the dashboard. Here's what each section shows:

▼ RunnableSequence          [total: 1.2s | tokens: 312 | cost: $0.0002]
  ▼ ChatPromptTemplate      [0.001s]
      inputs:  { question: "What is LangSmith used for?" }
      outputs: { messages: [...] }
  ▼ ChatOpenAI              [1.19s]
      inputs:  { messages: [...] }
      outputs: { content: "LangSmith is an observability platform..." }
      model:   gpt-4o-mini
      tokens:  prompt=45 | completion=267 | total=312

What to look for when debugging:

  • High latency in a retrieval step → your vector DB query is slow or over-fetching
  • LLM input tokens near the model limit → your prompt construction is bloated
  • Unexpected None in an intermediate output → a parser or tool returned nothing and the chain silently continued
  • Repeated identical tool calls → your agent's stopping condition is broken
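Once you pull runs via the SDK, checks like these can be automated. A sketch over plain dicts — the field names (`prompt_tokens`, `tool_calls`) are assumptions here, so map them to whatever your runs actually expose:

```python
def flag_suspect_runs(runs, context_limit=128_000, near=0.9):
    # Flag runs whose prompt is close to the context window, or whose
    # agent repeated the same tool call back to back.
    suspects = []
    for run in runs:
        if run.get("prompt_tokens", 0) >= near * context_limit:
            suspects.append((run["id"], "prompt near context limit"))
        calls = run.get("tool_calls", [])
        if any(a == b for a, b in zip(calls, calls[1:])):
            suspects.append((run["id"], "repeated identical tool call"))
    return suspects
```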

Step 6: Tag Runs by Environment and User

Raw production traces are noisy. Tags let you filter and compare runs.

from langsmith import traceable

# Decorator-based tracing for custom functions (non-LangChain code)
@traceable(
    name="rag-pipeline",
    tags=["production", "v2.1"],
    metadata={"user_id": "usr_abc123", "session_id": "sess_xyz"}
)
def run_rag_pipeline(query: str) -> str:
    # your retrieval + generation logic here
    answer = "..."  # placeholder for the generated answer
    return answer

For LangChain chains, pass metadata at invoke time:

response = chain.invoke(
    {"question": user_query},
    config={
        "metadata": {"user_id": user_id, "env": "production"},
        "tags": ["v2.1", "production"]
    }
)

Now in LangSmith you can filter: Tags = production, v2.1 → isolate exactly what real users are hitting.
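You can run the same filter from the SDK. LangSmith's run-filter grammar supports `has(tags, "...")` clauses combined with `and(...)`; the helper below just builds the expression string (a convenience function of my own, not part of the SDK):

```python
def tag_filter(tags: list[str]) -> str:
    # Build a LangSmith filter expression requiring every listed tag.
    clauses = [f'has(tags, "{t}")' for t in tags]
    return clauses[0] if len(clauses) == 1 else f"and({', '.join(clauses)})"

# Usage (needs LANGCHAIN_API_KEY and network access):
# from langsmith import Client
# runs = Client().list_runs(project_name="production",
#                           filter=tag_filter(["production", "v2.1"]))
```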


Step 7: Set Up a Feedback Loop (Optional but High-Value)

Log thumbs up/down from users directly to the trace:

from langsmith import Client

client = Client()

# After getting a run_id from your trace (pass it through your app)
client.create_feedback(
    run_id=run_id,           # UUID from the trace
    key="user-rating",
    score=1.0,               # 1.0 = thumbs up, 0.0 = thumbs down
    comment="Answer was accurate and concise"
)

This connects user-reported quality to the exact trace — you can filter LangSmith for all thumbs-down runs and see the chain inputs that caused them.


Verification

Run your traced chain, then confirm everything is flowing:

# Quick sanity check — lists recent runs in your project
python - <<'EOF'
from langsmith import Client
client = Client()
runs = list(client.list_runs(project_name="my-rag-app", limit=5))
for run in runs:
    latency = (run.end_time - run.start_time) if run.end_time else "in flight"
    print(f"{run.name} | {run.status} | latency: {latency}")
EOF

You should see: 5 recent runs with name, success or error status, and latency.

If runs don't appear in the dashboard:

  • Confirm LANGCHAIN_TRACING_V2=true is set in the same shell/process — not just exported in a different terminal
  • Check the project name — a typo creates a new project instead of appending to the existing one
  • Firewall blocking outbound traffic to api.smith.langchain.com → allow outbound HTTPS (port 443) to that domain

Production Patterns

Separate Projects per Environment

# staging
LANGCHAIN_PROJECT=my-app-staging

# production
LANGCHAIN_PROJECT=my-app-production

This keeps dev noise out of your production dashboards and lets you compare latency and error rates between environments.
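One common way to wire this up is to derive the project name from a single deployment variable at startup (`APP_ENV` is an assumed name here — use whatever your platform sets):

```python
import os

# Pick the LangSmith project from the deployment environment so staging
# traffic never lands in the production dashboard.
env = os.environ.get("APP_ENV", "staging")
os.environ.setdefault("LANGCHAIN_PROJECT", f"my-app-{env}")

print(os.environ["LANGCHAIN_PROJECT"])
```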

Sample High-Traffic Endpoints

At scale, tracing every call gets expensive. Wrap tracing in a sampler:

import random

from langchain_core.runnables import RunnableConfig
from langchain_core.tracers import LangChainTracer

def get_config(user_id: str) -> RunnableConfig:
    # Trace ~10% of production traffic. Leave LANGCHAIN_TRACING_V2 unset,
    # otherwise the env var traces every call regardless of this sampler.
    should_trace = random.random() < 0.10
    return RunnableConfig(
        tags=["production"],
        metadata={"user_id": user_id, "sampled": should_trace},
        callbacks=[LangChainTracer()] if should_trace else [],
    )

For full error capture with sampling, use @traceable only around the outer function and let errors propagate — LangSmith captures the full stack trace on failure regardless of sampling.

Cost Alerts

In LangSmith dashboard: Projects → your project → Settings → Alerts. Set a daily token spend threshold. You'll get an email before a runaway loop drains your OpenAI budget.


What You Learned

  • LANGCHAIN_TRACING_V2=true + LANGCHAIN_API_KEY is all you need to start tracing LangChain apps
  • The trace tree shows latency, tokens, and inputs/outputs at every step — no print debugging needed
  • @traceable extends tracing to non-LangChain code (custom retrievers, preprocessing, etc.)
  • Tags + metadata make production traces filterable and comparable against staging
  • Feedback scores connect user-reported quality to the exact trace that produced it

When not to use LangSmith tracing: For very high-volume, latency-sensitive endpoints (>1000 req/min), evaluate Langfuse or Helicone as self-hosted alternatives — LangSmith's managed API adds ~20–50ms per trace submission.

Tested on LangSmith SDK 0.2.x, LangChain Core 0.3.x, Python 3.12, gpt-4o-mini