Langfuse vs LangSmith: TL;DR
| | Langfuse | LangSmith |
|---|---|---|
| Open-source | ✅ MIT license | ❌ Proprietary |
| Self-host | ✅ Docker / Kubernetes | ❌ Enterprise only |
| Cloud free tier | 50k observations/mo | 5k traces/mo |
| Cloud paid | From $59/mo | From $39/mo |
| LangChain native | Partial (callback) | ✅ Deep integration |
| Evals | ✅ Built-in + custom | ✅ Built-in + custom |
| Prompt management | ✅ Yes | ✅ Yes |
| SDKs | Python, JS/TS, REST | Python, JS/TS |
| Best for | Teams needing data control or multi-framework tracing | Teams deep in the LangChain/LangGraph ecosystem |
Choose Langfuse if: you need full data ownership, self-hosting, or you're using non-LangChain frameworks (OpenAI SDK, LlamaIndex, custom pipelines).
Choose LangSmith if: your stack is built on LangChain or LangGraph and you want zero-config observability out of the box.
What We're Comparing
LLM applications fail in ways that are invisible without tracing. A chain returns the wrong answer, an agent loops, a prompt regression slips into production — you can't debug any of this with print() statements. Langfuse and LangSmith are the two dominant tools for instrumenting, evaluating, and monitoring LLM apps in 2026. They overlap significantly but make different architectural bets.
Langfuse Overview
Langfuse is an open-source LLM engineering platform built around a traces → spans → scores data model. You instrument your code with the SDK, every LLM call and chain step becomes a span inside a trace, and you score outputs manually or with automated evaluators.
The key differentiator: Langfuse is MIT-licensed. You can self-host the entire stack (Postgres + ClickHouse + the web app) on your own infrastructure and own every byte of your data. The cloud version is fully managed but the codebase is identical.
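The traces → spans → scores hierarchy is easiest to see as plain data. This is a conceptual sketch only, not the SDK's actual classes; the field names are illustrative:

```python
from dataclasses import dataclass, field

@dataclass
class Score:
    name: str      # e.g. "relevance"
    value: float   # numeric score attached to a trace or span

@dataclass
class Span:
    name: str      # one LLM call or chain step
    input: str
    output: str

@dataclass
class Trace:
    name: str      # one end-to-end request
    spans: list = field(default_factory=list)
    scores: list = field(default_factory=list)

# One user request -> one trace containing two instrumented steps
trace = Trace(name="rag-query")
trace.spans.append(Span(name="retrieve", input="q", output="3 docs"))
trace.spans.append(Span(name="generate", input="q + docs", output="answer"))
trace.scores.append(Score(name="relevance", value=0.9))
```

Every SDK integration ultimately produces this shape: nested spans inside a trace, with scores attached at either level.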
Pros:
- Full self-host with Docker Compose in under 10 minutes
- Framework-agnostic — works with OpenAI SDK, Anthropic SDK, LlamaIndex, LangChain, custom code
- Strong dataset and eval workflow built directly into the UI
- Active open-source community (20k+ GitHub stars as of early 2026)
Cons:
- Less native integration with LangChain than LangSmith — requires manual callback setup
- UI is functional but less polished than LangSmith's
- ClickHouse dependency adds infra complexity for self-hosters
LangSmith Overview
LangSmith is LangChain's proprietary observability product. It was purpose-built to trace LangChain and LangGraph runs, and that tight coupling shows: set two environment variables and every chain, agent, and tool call is traced automatically with no other code changes.
Beyond tracing, LangSmith has matured into a full evaluation platform with datasets, human annotation queues, and an online evaluation runner that scores production traces as they arrive.
Pros:
- Zero-config tracing for any LangChain or LangGraph application
- @traceable decorator instruments arbitrary Python functions in one line
- Polished UI with annotation queues well-suited for team labeling workflows
- Built-in integration with LangChain Hub for prompt versioning
Cons:
- Proprietary — no self-host option outside of expensive enterprise contracts
- All your trace data lives on LangChain Inc. servers
- Free tier is limited to 5k traces/month, restrictive for development
- Tightly coupled to the LangChain ecosystem; less useful if you're migrating away
Head-to-Head
Self-Hosting and Data Control
This is where the tools diverge most sharply. Langfuse self-hosting is genuinely accessible:
```shell
# Clone and start Langfuse locally
git clone https://github.com/langfuse/langfuse.git
cd langfuse
docker compose up -d
```
The stack is up at http://localhost:3000 in about 2 minutes. Production deployment on Kubernetes uses the official Helm chart and requires Postgres + ClickHouse.
LangSmith offers no self-hosted option outside enterprise licensing (custom pricing only). If your data governance policy requires on-premise storage — HIPAA, SOC 2, internal policy — Langfuse is the only practical choice of the two.
Instrumentation
LangSmith with LangChain requires just two environment variables:
```shell
export LANGCHAIN_TRACING_V2=true
export LANGCHAIN_API_KEY=your_key
```
Every subsequent LangChain or LangGraph call is traced. No other changes.
Langfuse with LangChain requires the callback handler:
```python
from langfuse.callback import CallbackHandler

handler = CallbackHandler(
    public_key="pk-...",
    secret_key="sk-...",
    host="https://cloud.langfuse.com"  # or your self-hosted URL
)

# Pass to any LangChain runnable
chain.invoke({"input": "hello"}, config={"callbacks": [handler]})
```
For non-LangChain code, Langfuse's @observe decorator is cleaner than anything LangSmith offers for custom instrumentation:
```python
from langfuse.decorators import observe, langfuse_context

@observe()
def call_llm(prompt: str) -> str:
    response = openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}]
    )
    # Score this generation inline
    langfuse_context.score_current_observation(
        name="relevance",
        value=0.9
    )
    return response.choices[0].message.content
```
LangSmith's @traceable is equivalent for Python but doesn't have the same inline scoring ergonomics.
Evaluation Workflows
Both tools support the same core eval pattern: create a dataset of input/output pairs, run an evaluator (LLM-as-judge or custom function), track scores over time.
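The custom-evaluator half of that pattern is just a function from output and reference to a score. Here is a minimal token-overlap evaluator as an illustration; both platforms accept arbitrary Python callables shaped roughly like this, though each wraps them in its own evaluator interface:

```python
def correctness_evaluator(output: str, reference: str) -> float:
    """Crude token-overlap score between model output and reference answer."""
    out_tokens = set(output.lower().split())
    ref_tokens = set(reference.lower().split())
    if not ref_tokens:
        return 0.0
    return len(out_tokens & ref_tokens) / len(ref_tokens)

# A dataset row is just an input plus a reference answer
dataset = [
    {"input": "Capital of France?", "reference": "Paris is the capital of France"},
]

# Run the model (stubbed here) over the dataset and collect scores
scores = [
    correctness_evaluator("The capital is Paris", row["reference"])
    for row in dataset
]
```

In practice you would swap the overlap heuristic for an LLM-as-judge call, but the dataset-in, scores-out contract stays the same.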
LangSmith's eval runner integrates tightly with LangChain's evaluate() function:
```python
from langsmith.evaluation import evaluate

results = evaluate(
    lambda inputs: chain.invoke(inputs),
    data="my-dataset",
    evaluators=[correctness_evaluator],
    experiment_prefix="gpt-4o-baseline"
)
```
Langfuse's eval approach is more UI-driven — you define evaluators in the dashboard and they run against traces automatically. It also supports Ragas and other open-source eval libraries via the SDK:
```python
from langfuse import Langfuse

lf = Langfuse()
lf.score(
    trace_id=trace_id,
    name="ragas_faithfulness",
    value=faithfulness_score
)
```
Neither tool has a decisive edge here. LangSmith's programmatic runner is better for CI/CD eval pipelines. Langfuse's dashboard-driven approach is more accessible for non-engineers reviewing outputs.
Pricing at Scale
At 500k traces/month (a moderate production workload):
- Langfuse Cloud: ~$119/mo (entry paid tier plus usage-based overage at this volume)
- LangSmith Cloud: ~$150–200/mo depending on seat count
- Langfuse self-hosted: Infrastructure cost only — typically $30–80/mo on AWS/GCP for a small deployment
For startups or cost-sensitive teams, self-hosted Langfuse is the cheapest option that doesn't compromise on features.
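The break-even math is straightforward. The figures below are the rough estimates from this section (with LangSmith modeled as per-seat pricing at its advertised $39/seat entry point), not vendor quotes:

```python
def monthly_costs(seats: int = 4) -> dict:
    """Rough monthly cost at ~500k traces/mo.

    All numbers are illustrative midpoints from this comparison,
    not official vendor pricing.
    """
    return {
        "langfuse_cloud": 119,          # managed, usage-based tier
        "langsmith_cloud": 39 * seats,  # per-seat pricing, ~$39/seat
        "langfuse_self_hosted": 55,     # midpoint of the $30-80 infra estimate
    }

costs = monthly_costs(seats=4)
cheapest = min(costs, key=costs.get)
```

Note the self-hosted figure excludes the engineering time to operate Postgres and ClickHouse, which is the real hidden cost for small teams.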
Which Should You Use?
Pick Langfuse when:
- You need self-hosting for compliance, data residency, or cost reasons
- Your stack mixes frameworks (OpenAI + Anthropic + custom retrieval)
- You're building a product and want observability that isn't tied to a vendor
- You want to contribute to or extend an open-source tool
Pick LangSmith when:
- Your entire stack is LangChain or LangGraph and you want zero-config tracing
- You need team annotation workflows with a polished UI today
- You're already paying for LangChain's ecosystem and want tight integration
- Setup time is the priority over long-term data ownership
Use both when: you're evaluating which to standardize on — both have free tiers sufficient for a two-week pilot on real traffic.
FAQ
Q: Can Langfuse trace LangGraph agents?
A: Yes. Use the CallbackHandler exactly as you would for LangChain — LangGraph inherits the same callback system. Each node in the graph appears as a separate span in the trace.
Q: Does LangSmith work with non-LangChain code?
A: Yes, via the @traceable decorator and the RunTree API. It works fine, but the DX advantage over Langfuse disappears — at that point you're comparing two SDKs on roughly equal footing.
Q: Can I migrate from one to the other?
A: There's no automated migration path for historical trace data. Both tools export datasets (input/output pairs) as JSON or CSV, so you can port your eval datasets. Re-instrumenting your code takes 1–2 hours for a typical app.
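Porting an exported dataset is mostly a field-mapping exercise. A sketch, assuming a generic JSON export of input/output pairs — the field names here are hypothetical, so check the actual export schema of whichever tool you're leaving:

```python
import json

def port_dataset(export_json: str) -> list:
    """Map an exported dataset to generic {input, expected} pairs.

    Assumes each record carries "input" plus either "expected_output"
    or "output"; real export schemas differ per tool and version.
    """
    records = json.loads(export_json)
    return [
        {
            "input": r.get("input"),
            "expected": r.get("expected_output", r.get("output")),
        }
        for r in records
    ]

exported = '[{"input": "Capital of France?", "expected_output": "Paris"}]'
dataset = port_dataset(exported)
```

Once normalized to this shape, the pairs can be re-uploaded to either tool's dataset API.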
Q: Is Langfuse actually maintained? Open-source tools often go stale.
A: As of early 2026, Langfuse is one of the most actively maintained LLM dev tools — multiple commits per week, 20k+ GitHub stars, and a Series A-backed company behind it. Check the Langfuse GitHub for current activity before deciding.