Problem: Your LLM App Is in Production and You're Flying Blind
You deployed your LLM app. Latency spikes. Costs creep up. A prompt change silently breaks outputs. You find out from a user complaint — not an alert.
LangSmith has production monitoring built in, but the defaults won't save you. You need custom dashboards tracking the metrics that matter and alerts that fire before users notice.
You'll learn:
- How to configure LangSmith dashboards for latency, cost, error rate, and token usage
- How to set up threshold alerts that notify via Slack or webhook
- How to tag traces for filtering so dashboards actually reflect your production workloads
Time: 25 min | Difficulty: Intermediate
Why Default LangSmith Tracing Isn't Enough
LangSmith traces every LLM call automatically once you add the SDK. That's useful for debugging. But raw traces don't tell you:
- Whether p95 latency crossed your SLA threshold at 2am
- Which chain is responsible for 40% of your monthly token spend
- When error rate on a specific prompt template jumped from 1% to 12%
Dashboards and alerts convert raw traces into signals. Here's how to build them properly.
Solution
Step 1: Instrument Your Code with Metadata Tags
Before dashboards mean anything, your traces need consistent metadata. Add tags and metadata to every run so you can filter by feature, model, environment, and user segment.
```python
from langsmith import traceable

# ✅ Tag traces so dashboard filters work in production
@traceable(
    name="answer-question",
    tags=["feature:qa", "env:production"],
    metadata={"model": "gpt-4o", "version": "v2.1", "user_tier": "pro"},
)
def answer_question(question: str) -> str:
    # your chain logic here
    return response
```
For LangChain chains, pass metadata at invocation time:
```python
chain.invoke(
    {"question": user_question},
    config={
        "tags": ["feature:qa", "env:production"],
        "metadata": {"user_id": user_id, "session_id": session_id},
    },
)
```
Rule: Every production invocation needs, at minimum, an env:production tag and a feature: tag. Without this, your dashboards mix dev noise into production metrics.
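One way to enforce that rule without relying on discipline at every call site is to centralize the config in a tiny helper. A sketch: production_config is not a LangSmith or LangChain API, just a convention your codebase would own.

```python
# Hypothetical helper — not part of the LangSmith/LangChain SDKs.
# Builds the config dict so every invocation carries the mandatory tags.
def production_config(feature: str, **metadata) -> dict:
    """Return a LangChain-style config with env/feature tags attached."""
    return {
        "tags": [f"feature:{feature}", "env:production"],
        "metadata": metadata,
    }
```

Then `chain.invoke({"question": q}, config=production_config("qa", user_id=user_id))` keeps every call site consistent, and a new feature only needs a new feature name.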
Step 2: Create a Production Dashboard
Go to LangSmith → Projects → [your project] → Dashboards → New Dashboard.
Build four panels — these cover 90% of what production LLM apps need:
Panel 1 — Latency Percentiles
- Chart type: Line chart
- Metric: run_latency_ms
- Aggregation: p50, p95, p99
- Filter: tags contains env:production
- Group by: name (chain name)
- Time range: Rolling 24h
Panel 2 — Token Cost Over Time
- Chart type: Area chart
- Metric: total_cost
- Aggregation: Sum
- Filter: tags contains env:production
- Group by: metadata.model
- Time range: Rolling 7 days
Panel 3 — Error Rate
- Chart type: Bar chart
- Metric: error_rate
- Aggregation: Count (errors) / Count (total)
- Filter: tags contains env:production
- Group by: name
- Time range: Rolling 1h
Panel 4 — Token Usage by Feature
- Chart type: Stacked bar
- Metric: total_tokens
- Aggregation: Sum
- Filter: tags contains env:production
- Group by: metadata.feature (from your tags)
- Time range: Rolling 24h
Save the dashboard as "Production Overview" and pin it to the project.
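If you want to sanity-check a panel against raw data, the same numbers can be computed locally from the SDK. A sketch: the percentile helper is plain nearest-rank math, and the fetch function (not executed here) uses client.list_runs as shown later in the Verification step; the project name is a placeholder.

```python
import math


def percentile(values: list[float], p: float) -> float:
    """Nearest-rank percentile (p in (0, 100]) of a non-empty list."""
    s = sorted(values)
    k = max(1, math.ceil(p / 100 * len(s)))
    return s[k - 1]


def production_latency_percentiles(project_name: str) -> dict:
    """Pull production-tagged runs and recompute Panel 1's numbers.

    Makes a network call — requires LANGSMITH_API_KEY in the environment.
    """
    from langsmith import Client

    client = Client()
    runs = client.list_runs(
        project_name=project_name,
        filter='has(tags, "env:production")',
    )
    latencies_ms = [
        (r.end_time - r.start_time).total_seconds() * 1000
        for r in runs
        if r.end_time and r.start_time
    ]
    return {f"p{p}": percentile(latencies_ms, p) for p in (50, 95, 99)}
```

If the dashboard's p95 and this local computation disagree wildly, check that your filter tag matches exactly: a stray env:prod vs env:production typo silently empties the panel.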
Step 3: Configure Threshold Alerts
Navigate to LangSmith → Projects → [your project] → Alerts → New Alert Rule.
Set up these four rules. Each targets a different failure mode.
Alert 1 — Latency SLA Breach
Name: P95 Latency > 5s
Metric: run_latency_ms (p95)
Threshold: > 5000
Window: 5 minutes
Filter: tags contains env:production
Severity: Critical
Alert 2 — Error Rate Spike
Name: Error Rate > 5%
Metric: error_rate
Threshold: > 0.05
Window: 10 minutes
Filter: tags contains env:production
Severity: High
Alert 3 — Cost Anomaly
Name: Hourly Cost > $10
Metric: total_cost (sum)
Threshold: > 10.00
Window: 60 minutes
Filter: tags contains env:production
Severity: Medium
Alert 4 — Token Budget
Name: Avg Tokens per Run > 4000
Metric: total_tokens (mean)
Threshold: > 4000
Window: 15 minutes
Filter: tags contains env:production AND name = "answer-question"
Severity: Low
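One way to keep these thresholds honest over time is to mirror them as data in your repo, so a threshold change shows up in code review rather than as a silent UI edit. A sketch: the field names are our own convention, not a LangSmith API schema.

```python
# The four rules above as plain data, kept in version control.
# Field names are descriptive conventions, not a LangSmith schema.
ALERT_RULES = [
    {"name": "P95 Latency > 5s", "metric": "run_latency_ms", "agg": "p95",
     "threshold": 5000, "window_min": 5, "severity": "critical"},
    {"name": "Error Rate > 5%", "metric": "error_rate", "agg": None,
     "threshold": 0.05, "window_min": 10, "severity": "high"},
    {"name": "Hourly Cost > $10", "metric": "total_cost", "agg": "sum",
     "threshold": 10.00, "window_min": 60, "severity": "medium"},
    {"name": "Avg Tokens per Run > 4000", "metric": "total_tokens", "agg": "mean",
     "threshold": 4000, "window_min": 15, "severity": "low"},
]
```

Reviewing a pull request that changes one of these dicts is far easier than reconstructing who lowered a threshold in the UI and why.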
Step 4: Route Alerts to Slack
LangSmith alert rules support webhook delivery. Create a Slack incoming webhook, then attach it to each alert rule.
```python
# LangSmith doesn't have a Python SDK for alert config yet — use the REST API
import os

import httpx

LANGSMITH_API_KEY = os.environ["LANGSMITH_API_KEY"]  # ls__... — never hardcode keys
PROJECT_ID = "your-project-id"

webhook_payload = {
    "name": "slack-production-alerts",
    "type": "webhook",
    "config": {
        "url": "https://hooks.slack.com/services/T00/B00/your-webhook-token",
        "headers": {"Content-Type": "application/json"},
        # LangSmith sends { "alert_name", "metric_value", "threshold", "run_url" }
    },
}

response = httpx.post(
    f"https://api.smith.langchain.com/api/v1/projects/{PROJECT_ID}/alert-channels",
    headers={"x-api-key": LANGSMITH_API_KEY},
    json=webhook_payload,
)
print(response.json())  # {"id": "channel-uuid", "name": "slack-production-alerts"}
```
Once the channel is created, assign it to each alert rule in the UI under Alert Rules → Edit → Notification Channels.
Expected Slack message format:
🚨 [Critical] P95 Latency > 5s
Project: my-llm-app
Value: 7,340ms (threshold: 5,000ms)
Window: 2026-03-09 14:05 → 14:10 UTC
View runs: https://smith.langchain.com/...
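If you want full control over that formatting, point the LangSmith channel at a small relay of your own instead of Slack directly. A sketch of the relay's formatting step: the input fields (alert_name, metric_value, threshold, run_url) are the ones assumed in the snippet above, and Slack incoming webhooks accept a simple {"text": ...} payload.

```python
def format_alert_for_slack(payload: dict) -> dict:
    """Turn a LangSmith alert webhook body into a Slack message payload.

    Assumes the fields noted in the webhook comment above; adjust to
    whatever your webhook actually delivers.
    """
    text = (
        f"🚨 {payload['alert_name']}\n"
        f"Value: {payload['metric_value']:,} (threshold: {payload['threshold']:,})\n"
        f"View runs: {payload['run_url']}"
    )
    return {"text": text}
```

POST the returned dict to your Slack incoming-webhook URL (e.g. with httpx) and you can add severity emojis, routing to different channels, or on-call mentions without touching the LangSmith side.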
Step 5: Add a Feedback Signal to Traces
Alerts on latency and cost catch infrastructure problems. They won't catch silent quality regressions — when outputs become subtly worse.
Wire up user feedback so you can track thumbs-up rate in dashboards:
```python
from langsmith import Client

client = Client()

def record_user_feedback(run_id: str, score: int, comment: str = ""):
    # score: 1 = positive, 0 = negative
    client.create_feedback(
        run_id=run_id,
        key="user_thumbs",
        score=score,
        comment=comment,
        source_info={"source": "webapp", "version": "v2.1"},
    )
```
Call this from your API endpoint when the user rates a response:
```python
from fastapi import FastAPI

app = FastAPI()  # or your existing app instance

@app.post("/feedback")
async def submit_feedback(run_id: str, rating: int):
    record_user_feedback(run_id=run_id, score=rating)
    return {"status": "ok"}
```
Add a Panel 5 — Feedback Score to your dashboard:
- Metric: feedback.user_thumbs (mean)
- Chart type: Line chart
- Goal line: 0.80 (80% positive = healthy)
- Time range: Rolling 7 days
Add an alert:
Name: Feedback Score < 70%
Metric: feedback.user_thumbs (mean)
Threshold: < 0.70
Window: 60 minutes (minimum sample: 20 runs)
Severity: High
Verification
Trigger a test run and confirm traces appear with your tags:
```python
import time

from langsmith import Client

client = Client()

# Run your tagged function
result = answer_question("What is RAG?")

# Wait for trace to index
time.sleep(3)

# Verify tags are present on the latest run
runs = list(client.list_runs(
    project_name="your-project",
    filter='has(tags, "env:production")',
    limit=1,
))
assert runs, "No production-tagged runs found — check your @traceable decorator"
print(f"Latest run: {runs[0].id} | Latency: {runs[0].end_time - runs[0].start_time}")
```
You should see the run ID printed with a latency value under 10 seconds for a basic QA call.
Check the dashboard: navigate to Production Overview and confirm the panels populate within 2 minutes of your test run.
To fire a test alert, temporarily lower a threshold to a value your app will exceed, trigger a run, and restore the threshold once the Slack message arrives.
What You Learned
- Traces without metadata tags are nearly useless for production dashboards — add env: and feature: tags at the call site
- The four essential production metrics are latency percentiles, token cost, error rate, and tokens-per-run
- LangSmith alert rules use a metric + threshold + window model — shorter windows (5–15 min) for latency, longer windows (60 min) for cost
- User feedback scores are the only leading indicator for quality regressions that don't show up in infrastructure metrics
Limitation: LangSmith alert evaluation runs on 1-minute polling intervals. For sub-minute latency SLAs, you'll need an external APM tool (Datadog, Grafana) consuming LangSmith's webhook stream.
Tested on LangSmith SDK 0.3.x, Python 3.12, LangChain 0.3.x