LangSmith Production Monitoring: Alerts and Dashboards 2026

Set up LangSmith alerts and dashboards for LLM apps in production. Monitor latency, cost, errors, and token usage with real working config.

Problem: Your LLM App Is in Production and You're Flying Blind

You deployed your LLM app. Latency spikes. Costs creep up. A prompt change silently breaks outputs. You find out from a user complaint — not an alert.

LangSmith has production monitoring built in, but the defaults won't save you. You need custom dashboards tracking the metrics that matter and alerts that fire before users notice.

You'll learn:

  • How to configure LangSmith dashboards for latency, cost, error rate, and token usage
  • How to set up threshold alerts that notify via Slack or webhook
  • How to tag traces for filtering so dashboards actually reflect your production workloads

Time: 25 min | Difficulty: Intermediate


Why Default LangSmith Tracing Isn't Enough

LangSmith traces every LLM call automatically once you add the SDK. That's useful for debugging. But raw traces don't tell you:

  • Whether p95 latency crossed your SLA threshold at 2am
  • Which chain is responsible for 40% of your monthly token spend
  • When error rate on a specific prompt template jumped from 1% to 12%

Dashboards and alerts convert raw traces into signals. Here's how to build them properly.


Solution

Step 1: Instrument Your Code with Metadata Tags

Before dashboards mean anything, your traces need consistent metadata. Add tags and metadata to every run so you can filter by feature, model, environment, and user segment.

from langsmith import traceable

# ✅ Tag traces so dashboard filters work in production
@traceable(
    name="answer-question",
    tags=["feature:qa", "env:production"],
    metadata={"model": "gpt-4o", "version": "v2.1", "user_tier": "pro"}
)
def answer_question(question: str) -> str:
    response = run_chain(question)  # your chain logic here (run_chain is a placeholder)
    return response

For LangChain chains, pass metadata at invocation time:

chain.invoke(
    {"question": user_question},
    config={
        "tags": ["feature:qa", "env:production"],
        "metadata": {"user_id": user_id, "session_id": session_id}
    }
)

Rule: Every production invocation needs at minimum env:production and a feature: tag. Without this, your dashboards mix dev noise into production metrics.
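
To enforce that rule mechanically, a small helper can validate tags at the call site before any invocation goes out. This is a sketch of our own — require_tags is not part of the LangSmith SDK:

```python
# Hypothetical helper (not in the LangSmith SDK): enforce the tagging rule
# so untagged runs never pollute production dashboards.
REQUIRED_PREFIXES = ("env:", "feature:")

def require_tags(tags: list[str]) -> list[str]:
    """Raise if any required tag prefix is missing; return tags unchanged."""
    for prefix in REQUIRED_PREFIXES:
        if not any(t.startswith(prefix) for t in tags):
            raise ValueError(f"missing required tag with prefix '{prefix}'")
    return tags

# Usage at the call site:
# chain.invoke(inputs, config={"tags": require_tags(["feature:qa", "env:production"])})
```

Calling it inside the config dict means a misconfigured deploy fails loudly at request time instead of silently producing unfilterable traces.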


Step 2: Create a Production Dashboard

Go to LangSmith → Projects → [your project] → Dashboards → New Dashboard.

Build four panels — these cover 90% of what production LLM apps need:

Panel 1 — Latency Percentiles

  • Chart type: Line chart
  • Metric: run_latency_ms
  • Aggregation: p50, p95, p99
  • Filter: tags contains env:production
  • Group by: name (chain name)
  • Time range: Rolling 24h

Panel 2 — Token Cost Over Time

  • Chart type: Area chart
  • Metric: total_cost
  • Aggregation: Sum
  • Filter: tags contains env:production
  • Group by: metadata.model
  • Time range: Rolling 7 days

Panel 3 — Error Rate

  • Chart type: Bar chart
  • Metric: error_rate
  • Aggregation: Count (errors) / Count (total)
  • Filter: tags contains env:production
  • Group by: name
  • Time range: Rolling 1h

Panel 4 — Token Usage by Feature

  • Chart type: Stacked bar
  • Metric: total_tokens
  • Aggregation: Sum
  • Filter: tags contains env:production
  • Group by: metadata.feature — note: if feature only exists as a tag (feature:qa), also store it in metadata so grouping works
  • Time range: Rolling 24h

Save the dashboard as "Production Overview" and pin it to the project.
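
If you want to sanity-check Panel 1 outside the UI, the percentile math is easy to reproduce. This is a plain nearest-rank sketch; the commented-out list_runs call shows where real latencies would come from (that part assumes the LangSmith SDK and your project name):

```python
from math import ceil

def percentile(sorted_ms: list[float], p: float) -> float:
    """Nearest-rank percentile over a pre-sorted list of latencies in ms."""
    if not sorted_ms:
        raise ValueError("no samples")
    rank = max(ceil(p / 100 * len(sorted_ms)), 1)
    return sorted_ms[rank - 1]

def latency_summary(latencies_ms: list[float]) -> dict[str, float]:
    """Compute the same p50/p95/p99 the dashboard panel shows."""
    s = sorted(latencies_ms)
    return {f"p{p}": percentile(s, p) for p in (50, 95, 99)}

# In production you'd feed this from traces, e.g.:
# from langsmith import Client
# runs = Client().list_runs(project_name="your-project",
#                           filter='has(tags, "env:production")')
# latencies = [(r.end_time - r.start_time).total_seconds() * 1000 for r in runs]

print(latency_summary([120, 340, 95, 410, 5600, 230, 180]))
```

Note how one slow outlier (5600ms) dominates p95 and p99 while leaving p50 untouched — exactly why the panel tracks all three.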


Step 3: Configure Threshold Alerts

Navigate to LangSmith → Projects → [your project] → Alerts → New Alert Rule.

Set up these four rules. Each targets a different failure mode.

Alert 1 — Latency SLA Breach

Name:        P95 Latency > 5s
Metric:      run_latency_ms (p95)
Threshold:   > 5000
Window:      5 minutes
Filter:      tags contains env:production
Severity:    Critical

Alert 2 — Error Rate Spike

Name:        Error Rate > 5%
Metric:      error_rate
Threshold:   > 0.05
Window:      10 minutes
Filter:      tags contains env:production
Severity:    High

Alert 3 — Cost Anomaly

Name:        Hourly Cost > $10
Metric:      total_cost (sum)
Threshold:   > 10.00
Window:      60 minutes
Filter:      tags contains env:production
Severity:    Medium

Alert 4 — Token Budget

Name:        Avg Tokens per Run > 4000
Metric:      total_tokens (mean)
Threshold:   > 4000
Window:      15 minutes
Filter:      tags contains env:production AND name = "answer-question"
Severity:    Low
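
All four rules share the same metric + threshold + window model. The sketch below illustrates those semantics in plain Python; it is our own illustration, not LangSmith's evaluation engine:

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class AlertRule:
    name: str
    threshold: float
    comparison: str  # ">" or "<"

def evaluate(rule: AlertRule, window_values: list[float]) -> bool:
    """Return True when the window's aggregated value breaches the threshold."""
    if not window_values:
        return False  # empty window: nothing to alert on
    value = mean(window_values)
    return value > rule.threshold if rule.comparison == ">" else value < rule.threshold

# Alert 4 — Avg Tokens per Run > 4000, over one 15-minute window of runs:
rule = AlertRule(name="Avg Tokens per Run > 4000", threshold=4000, comparison=">")
print(evaluate(rule, [3800, 4500, 4400]))  # mean 4233.3 → True, breach
```

The window size is the sensitivity knob: a 5-minute window on latency catches a spike within minutes, while a 60-minute window on cost smooths out normal burstiness.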

Step 4: Route Alerts to Slack

LangSmith alert rules support webhook delivery. Create a Slack incoming webhook, then attach it to each alert rule.

# LangSmith doesn't have a Python SDK for alert config yet — use the REST API
import httpx

LANGSMITH_API_KEY = "ls__your_key_here"
PROJECT_ID = "your-project-id"

webhook_payload = {
    "name": "slack-production-alerts",
    "type": "webhook",
    "config": {
        "url": "https://hooks.slack.com/services/T00/B00/your-webhook-token",
        "headers": {"Content-Type": "application/json"},
        # LangSmith sends { "alert_name", "metric_value", "threshold", "run_url" }
    }
}

response = httpx.post(
    f"https://api.smith.langchain.com/api/v1/projects/{PROJECT_ID}/alert-channels",
    headers={"x-api-key": LANGSMITH_API_KEY},
    json=webhook_payload,
)
print(response.json())  # {"id": "channel-uuid", "name": "slack-production-alerts"}

Once the channel is created, assign it to each alert rule in the UI under Alert Rules → Edit → Notification Channels.

Expected Slack message format:

🚨 [Critical] P95 Latency > 5s
Project: my-llm-app
Value: 7,340ms (threshold: 5,000ms)
Window: 2026-03-09 14:05 → 14:10 UTC
View runs: https://smith.langchain.com/...
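
Rendering that message is straightforward. This sketch formats the webhook payload fields from Step 4 into the text above; the field names follow the payload comment in the code block and are an assumption about what LangSmith sends, so verify against a real delivery:

```python
def format_slack_alert(payload: dict) -> str:
    """Render an alert webhook payload (assumed field names) as Slack text."""
    return (
        f"🚨 [{payload.get('severity', 'Alert')}] {payload['alert_name']}\n"
        f"Value: {payload['metric_value']:,} (threshold: {payload['threshold']:,})\n"
        f"View runs: {payload['run_url']}"
    )

msg = format_slack_alert({
    "severity": "Critical",
    "alert_name": "P95 Latency > 5s",
    "metric_value": 7340,
    "threshold": 5000,
    "run_url": "https://smith.langchain.com/...",
})
print(msg)
```

If you route through a relay service instead of posting the raw payload to Slack, this is the shape of the translation layer you'd write.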

Step 5: Add a Feedback Signal to Traces

Alerts on latency and cost catch infrastructure problems. They won't catch silent quality regressions — when outputs become subtly worse.

Wire up user feedback so you can track thumbs-up rate in dashboards:

from langsmith import Client

client = Client()

def record_user_feedback(run_id: str, score: int, comment: str = ""):
    # score: 1 = positive, 0 = negative
    client.create_feedback(
        run_id=run_id,
        key="user_thumbs",
        score=score,
        comment=comment,
        source_info={"source": "webapp", "version": "v2.1"}
    )

Call this from your API endpoint when the user rates a response:

from fastapi import FastAPI

app = FastAPI()

@app.post("/feedback")
async def submit_feedback(run_id: str, rating: int):
    record_user_feedback(run_id=run_id, score=rating)
    return {"status": "ok"}

Add a Panel 5 — Feedback Score to your dashboard:

  • Metric: feedback.user_thumbs (mean)
  • Chart type: Line chart
  • Goal line: 0.80 (80% positive = healthy)
  • Time range: Rolling 7 days

Add an alert:

Name:        Feedback Score < 70%
Metric:      feedback.user_thumbs (mean)
Threshold:   < 0.70
Window:      60 minutes (minimum sample: 20 runs)
Severity:    High
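
The "minimum sample: 20 runs" guard matters: with only a handful of ratings, one unhappy user can drag the mean below 70% and page someone at 3am. A sketch of that gating logic (our own illustration, not LangSmith's):

```python
def feedback_alert(scores: list[int], threshold: float = 0.70,
                   min_samples: int = 20) -> bool:
    """Fire only when the window has enough ratings AND the mean drops below threshold."""
    if len(scores) < min_samples:
        return False  # too few ratings in the window — don't alert on noise
    return sum(scores) / len(scores) < threshold

# 25 ratings in the window, 15 positive → 60% positive, below the 70% floor:
print(feedback_alert([1] * 15 + [0] * 10))  # True — alert fires
```

Two negative ratings out of three, by contrast, never fire: the window is below the minimum sample size.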

Verification

Trigger a test run and confirm traces appear with your tags:

from langsmith import Client
import time

client = Client()

# Run your tagged function
result = answer_question("What is RAG?")

# Wait for trace to index
time.sleep(3)

# Verify tags are present on the latest run
runs = list(client.list_runs(
    project_name="your-project",
    filter='has(tags, "env:production")',
    limit=1
))

assert runs, "No production-tagged runs found — check your @traceable decorator"
print(f"Latest run: {runs[0].id} | Latency: {runs[0].end_time - runs[0].start_time}")

You should see: The run ID printed with a latency value under 10 seconds for a basic QA call.

Check the dashboard: navigate to Production Overview and confirm the panels populate within 2 minutes of your test run.

To fire a test alert, temporarily lower a threshold to a value your app will exceed, trigger a run, and restore the threshold once the Slack message arrives.


What You Learned

  • Traces without metadata tags are nearly useless for production dashboards — add env: and feature: tags at the call site
  • The four essential production metrics are latency percentiles, token cost, error rate, and tokens-per-run
  • LangSmith alert rules use a metric + threshold + window model — shorter windows (5–15 min) for latency, longer windows (60 min) for cost
  • User feedback scores are your earliest signal for quality regressions that don't show up in infrastructure metrics

Limitation: LangSmith alert evaluation runs on 1-minute polling intervals. For sub-minute latency SLAs, you'll need an external APM tool (Datadog, Grafana) consuming LangSmith's webhook stream.

Tested on LangSmith SDK 0.3.x, Python 3.12, LangChain 0.3.x