# Problem: Your LLM Chain Fails in Production and You Can't See Why
Your RAG pipeline returns garbage answers. A LangChain agent loops forever. Token costs spiked 3x overnight. Without tracing, you're print()-debugging blind.
LangSmith gives you full visibility into every LLM call, tool invocation, and chain step — with latency, token counts, and inputs/outputs at each node.
You'll learn:
- How to wire LangSmith tracing into any LangChain or custom LLM app in under 5 minutes
- How to read a trace to find the exact step causing hallucinations or latency spikes
- How to tag runs by environment and user so production noise doesn't pollute your evals
Time: 20 min | Difficulty: Intermediate
## Why LLM Chains Are Hard to Debug Without Tracing
A typical RAG chain has 5–8 steps: query rewriting, embedding, retrieval, reranking, prompt construction, LLM call, output parsing. When the final answer is wrong, the failure could be at any of them.
Common failure modes this solves:
- Retrieval returned the wrong chunks — the LLM had no chance of answering correctly
- Prompt grew too long — context window overflow silently truncates your docs
- Tool call looped — agent called the same tool 12 times before timing out
- Latency spike — one slow embedding call is hiding inside a 9-second response
Without a trace, you see the final output. With LangSmith, you see every intermediate step.
## Solution

### Step 1: Create a LangSmith Account and Get Your API Key
Go to smith.langchain.com and sign up. The free Developer plan includes 5,000 traces/month — enough to instrument a side project or staging environment.
From the dashboard:
- Click your avatar → Settings
- API Keys → Create API Key
- Copy the key — it starts with `lsv2_`
### Step 2: Install the SDK

```bash
# LangSmith SDK + LangChain core
pip install langsmith langchain-core langchain-openai
```
Verify the install:
```bash
python -c "import langsmith; print(langsmith.__version__)"
# Expected: 0.2.x or higher
```
### Step 3: Set Environment Variables

LangSmith tracing activates through environment variables — no code changes needed for LangChain apps.

```bash
export LANGCHAIN_TRACING_V2=true
export LANGCHAIN_API_KEY="lsv2_your_key_here"
export LANGCHAIN_PROJECT="my-rag-app"  # traces group under this project name
export OPENAI_API_KEY="sk-..."
```
For production, add these to your .env file or secrets manager:
```bash
# .env (never commit this)
LANGCHAIN_TRACING_V2=true
LANGCHAIN_API_KEY=lsv2_your_key_here
LANGCHAIN_PROJECT=production
OPENAI_API_KEY=sk-...
```
Load with python-dotenv or your deployment platform's secrets injection.
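If you set the variables in Python instead of the shell, do it before constructing any LangChain objects. A minimal stdlib sketch (the values mirror the example above; python-dotenv's `load_dotenv()` achieves the same thing from a `.env` file, and real keys belong in a secrets manager):

```python
import os

# Set tracing variables in-process, before any LangChain object is created.
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_PROJECT"] = "my-rag-app"
# LANGCHAIN_API_KEY and OPENAI_API_KEY should come from your secrets
# manager, never be hard-coded.

print(os.environ["LANGCHAIN_PROJECT"])
```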
### Step 4: Run Your First Traced Chain

Any existing LangChain code now traces automatically. Here's a minimal example to confirm it works:

```python
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

# No tracing-specific code needed — env vars handle it
llm = ChatOpenAI(model="gpt-4o-mini")
prompt = ChatPromptTemplate.from_messages([
    ("system", "Answer concisely."),
    ("user", "{question}"),
])
chain = prompt | llm

response = chain.invoke({"question": "What is LangSmith used for?"})
print(response.content)
```
Run it, then open smith.langchain.com → your project. You'll see the trace within 2–3 seconds.
### Step 5: Read a Trace
Click any trace in the dashboard. Here's what each section shows:
```text
▼ RunnableSequence [total: 1.2s | tokens: 312 | cost: $0.0002]
  ▼ ChatPromptTemplate [0.001s]
      inputs:  { question: "What is LangSmith used for?" }
      outputs: { messages: [...] }
  ▼ ChatOpenAI [1.19s]
      inputs:  { messages: [...] }
      outputs: { content: "LangSmith is an observability platform..." }
      model: gpt-4o-mini
      tokens: prompt=45 | completion=267 | total=312
```
What to look for when debugging:
- High latency in a retrieval step → your vector DB query is slow or over-fetching
- LLM input tokens near the model limit → your prompt construction is bloated
- Unexpected `None` in an intermediate output → a parser or tool returned nothing and the chain silently continued
- Repeated identical tool calls → your agent's stopping condition is broken
### Step 6: Tag Runs by Environment and User
Raw production traces are noisy. Tags let you filter and compare runs.
```python
from langsmith import traceable

# Decorator-based tracing for custom functions (non-LangChain code)
@traceable(
    name="rag-pipeline",
    tags=["production", "v2.1"],
    metadata={"user_id": "usr_abc123", "session_id": "sess_xyz"},
)
def run_rag_pipeline(query: str) -> str:
    answer = ...  # your retrieval + generation logic here
    return answer
```
For LangChain chains, pass metadata at invoke time:
```python
response = chain.invoke(
    {"question": user_query},
    config={
        "metadata": {"user_id": user_id, "env": "production"},
        "tags": ["v2.1", "production"],
    },
)
```
Now in LangSmith you can filter: Tags = production, v2.1 → isolate exactly what real users are hitting.
### Step 7: Set Up a Feedback Loop (Optional but High-Value)
Log thumbs up/down from users directly to the trace:
```python
from langsmith import Client

client = Client()

# After getting a run_id from your trace (pass it through your app)
client.create_feedback(
    run_id=run_id,  # UUID from the trace
    key="user-rating",
    score=1.0,  # 1.0 = thumbs up, 0.0 = thumbs down
    comment="Answer was accurate and concise",
)
```
This connects user-reported quality to the exact trace — you can filter LangSmith for all thumbs-down runs and see the chain inputs that caused them.
## Verification
Run your traced chain, then confirm everything is flowing:
```bash
# Quick sanity check — lists recent runs in your project
python - <<'EOF'
from langsmith import Client

client = Client()
runs = list(client.list_runs(project_name="my-rag-app", limit=5))
for run in runs:
    print(f"{run.name} | {run.status} | latency: {run.end_time - run.start_time}")
EOF
```
You should see: 5 recent runs with name, success or error status, and latency.
If runs don't appear in the dashboard:
- Confirm `LANGCHAIN_TRACING_V2=true` is set in the same shell/process — not just exported in a different terminal
- Check the project name — a typo creates a new project instead of appending to the existing one
- Firewall blocking outbound traffic → allow HTTPS (port 443) to `api.smith.langchain.com`
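A quick stdlib check for the first item, since the most common cause of missing traces is variables that never reached the current process:

```python
import os

def check_tracing_env() -> list[str]:
    # Return the required tracing variables missing from THIS process.
    required = ("LANGCHAIN_TRACING_V2", "LANGCHAIN_API_KEY", "LANGCHAIN_PROJECT")
    return [var for var in required if not os.getenv(var)]

print("missing:", check_tracing_env() or "none")
```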
## Production Patterns

### Separate Projects per Environment
```bash
# staging
LANGCHAIN_PROJECT=my-app-staging

# production
LANGCHAIN_PROJECT=my-app-production
```
This keeps dev noise out of your production dashboards and lets you compare latency and error rates between environments.
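A small sketch of deriving the project name from your deployment environment at startup, so the right project is picked automatically (`APP_ENV` is an assumed variable name; use whatever your platform provides):

```python
import os

# Map the deployment environment to a LangSmith project so staging and
# production traces never share a dashboard.
app_env = os.getenv("APP_ENV", "staging")
os.environ["LANGCHAIN_PROJECT"] = f"my-app-{app_env}"

print(os.environ["LANGCHAIN_PROJECT"])
```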
### Sample High-Traffic Endpoints
At scale, tracing every call gets expensive. Wrap tracing in a sampler:
```python
import random

from langsmith import tracing_context

def invoke_sampled(chain, inputs, user_id: str, rate: float = 0.10):
    # Trace ~10% of production traffic. tracing_context(enabled=...) overrides
    # the LANGCHAIN_TRACING_V2 env var for everything inside the block.
    should_trace = random.random() < rate
    with tracing_context(enabled=should_trace):
        return chain.invoke(
            inputs,
            config={
                "tags": ["production"],
                "metadata": {"user_id": user_id, "sampled": should_trace},
            },
        )
```
For full error capture with sampling, use `@traceable` only around the outer function and let errors propagate — LangSmith captures the full stack trace on failure regardless of sampling.
### Cost Alerts
In LangSmith dashboard: Projects → your project → Settings → Alerts. Set a daily token spend threshold. You'll get an email before a runaway loop drains your OpenAI budget.
## What You Learned
- `LANGCHAIN_TRACING_V2=true` + `LANGCHAIN_API_KEY` is all you need to start tracing LangChain apps
- The trace tree shows latency, tokens, and inputs/outputs at every step — no print debugging needed
- `@traceable` extends tracing to non-LangChain code (custom retrievers, preprocessing, etc.)
- Tags + metadata make production traces filterable and comparable against staging
- Feedback scores connect user-reported quality to the exact trace that produced it
When not to use LangSmith tracing: For very high-volume, latency-sensitive endpoints (>1000 req/min), evaluate Langfuse or Helicone as self-hosted alternatives — LangSmith's managed API adds ~20–50ms per trace submission.
Tested on LangSmith SDK 0.2.x, LangChain Core 0.3.x, Python 3.12, gpt-4o-mini