Langfuse vs LangSmith: TL;DR
| | Langfuse | LangSmith |
|---|---|---|
| Open-source | ✅ MIT license | ❌ Proprietary |
| Self-host | ✅ Docker / Kubernetes | ❌ Enterprise only |
| Cloud free tier | 50k observations/mo | 5k traces/mo |
| Cloud paid | From $59/mo | From $39/mo |
| LangChain native | Partial (callback) | ✅ Deep integration |
| Evals | ✅ Built-in + custom | ✅ Built-in + custom |
| Prompt management | ✅ Yes | ✅ Yes |
| SDKs | Python, JS/TS, REST | Python, JS/TS |
| Best for | Teams needing data control or multi-framework tracing | Teams deep in the LangChain/LangGraph ecosystem |
Choose Langfuse if: you need full data ownership, self-hosting, or you're using non-LangChain frameworks (OpenAI SDK, LlamaIndex, custom pipelines).
Choose LangSmith if: your stack is built on LangChain or LangGraph and you want zero-config observability out of the box.
What We're Comparing
LLM applications fail in ways that are invisible without tracing. A chain returns the wrong answer, an agent loops, a prompt regression slips into production — you can't debug any of this with print() statements. Langfuse and LangSmith are the two dominant tools for instrumenting, evaluating, and monitoring LLM apps in 2026. They overlap significantly but make different architectural bets.
Langfuse Overview
Langfuse is an open-source LLM engineering platform built around a traces → spans → scores data model. You instrument your code with the SDK, every LLM call and chain step becomes a span inside a trace, and you score outputs manually or with automated evaluators.
The key differentiator: Langfuse is MIT-licensed. You can self-host the entire stack (Postgres + ClickHouse + the web app) on your own infrastructure and own every byte of your data. The cloud version is fully managed but the codebase is identical.
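The traces → spans → scores hierarchy is easiest to see as plain data. This is a conceptual sketch only, not the SDK's actual classes; the field names are illustrative:

```python
from dataclasses import dataclass, field

@dataclass
class Score:
    name: str      # e.g. "relevance"
    value: float   # numeric score attached to a trace or span

@dataclass
class Span:
    name: str      # one LLM call or chain step
    input: str
    output: str

@dataclass
class Trace:
    name: str      # one end-to-end request
    spans: list = field(default_factory=list)
    scores: list = field(default_factory=list)

# One user request -> one trace containing two instrumented steps
trace = Trace(name="rag-query")
trace.spans.append(Span(name="retrieve", input="q", output="3 docs"))
trace.spans.append(Span(name="generate", input="q + docs", output="answer"))
trace.scores.append(Score(name="relevance", value=0.9))
```

Every SDK integration ultimately produces this shape: nested spans inside a trace, with scores attached at either level.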
Pros:
- Full self-host with Docker Compose in under 10 minutes
- Framework-agnostic — works with OpenAI SDK, Anthropic SDK, LlamaIndex, LangChain, custom code
- Strong dataset and eval workflow built directly into the UI
- Active open-source community (20k+ GitHub stars as of early 2026)
Cons:
- Less native integration with LangChain than LangSmith — requires manual callback setup
- UI is functional but less polished than LangSmith's
- ClickHouse dependency adds infra complexity for self-hosters
LangSmith Overview
LangSmith is LangChain's proprietary observability product. It was purpose-built to trace LangChain and LangGraph runs, and that tight coupling shows: set two environment variables and every chain, agent, and tool call is traced automatically with no other code changes.
Beyond tracing, LangSmith has matured into a full evaluation platform with datasets, human annotation queues, and an online evaluation runner that scores production traces as they arrive.
Pros:
- Zero-config tracing for any LangChain or LangGraph application
- @traceable decorator instruments arbitrary Python functions in one line
- Polished UI with annotation queues well-suited for team labeling workflows
- Built-in integration with LangChain Hub for prompt versioning
Cons:
- Proprietary — no self-host option outside of expensive enterprise contracts
- All your trace data lives on LangChain Inc. servers
- Free tier is limited to 5k traces/month, restrictive for development
- Tightly coupled to the LangChain ecosystem; less useful if you're migrating away
Head-to-Head
Self-Hosting and Data Control
This is where the tools diverge most sharply. Langfuse self-hosting is genuinely accessible:
```shell
# Clone and start Langfuse locally
git clone https://github.com/langfuse/langfuse.git
cd langfuse
docker compose up -d
```
The stack is up at http://localhost:3000 in about 2 minutes. Production deployment on Kubernetes uses the official Helm chart and requires Postgres + ClickHouse.
LangSmith offers no self-hosted option outside enterprise licensing (custom pricing only). If your data governance policy requires on-premise storage — HIPAA, SOC 2, internal policy — Langfuse is the only practical choice of the two.
Instrumentation
LangSmith with LangChain requires just two environment variables:
```shell
export LANGCHAIN_TRACING_V2=true
export LANGCHAIN_API_KEY=your_key
```
Every subsequent LangChain or LangGraph call is traced. No other changes.
Langfuse with LangChain requires the callback handler:
```python
from langfuse.callback import CallbackHandler

handler = CallbackHandler(
    public_key="pk-...",
    secret_key="sk-...",
    host="https://cloud.langfuse.com"  # or your self-hosted URL
)

# Pass to any LangChain runnable
chain.invoke({"input": "hello"}, config={"callbacks": [handler]})
```
For non-LangChain code, Langfuse's @observe decorator is cleaner than anything LangSmith offers for custom instrumentation:
```python
from langfuse.decorators import observe, langfuse_context

@observe()
def call_llm(prompt: str) -> str:
    response = openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}]
    )
    # Score this generation inline
    langfuse_context.score_current_observation(
        name="relevance",
        value=0.9
    )
    return response.choices[0].message.content
```
LangSmith's @traceable is equivalent for Python but doesn't have the same inline scoring ergonomics.
Evaluation Workflows
Both tools support the same core eval pattern: create a dataset of input/output pairs, run an evaluator (LLM-as-judge or custom function), track scores over time.
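The custom-evaluator half of that pattern is just a function from output and reference to a score. Here is a minimal token-overlap evaluator as an illustration; both platforms accept arbitrary Python callables shaped roughly like this, though each wraps them in its own evaluator interface:

```python
def correctness_evaluator(output: str, reference: str) -> float:
    """Crude token-overlap score between model output and reference answer."""
    out_tokens = set(output.lower().split())
    ref_tokens = set(reference.lower().split())
    if not ref_tokens:
        return 0.0
    return len(out_tokens & ref_tokens) / len(ref_tokens)

# A dataset row is just an input plus a reference answer
dataset = [
    {"input": "Capital of France?", "reference": "Paris is the capital of France"},
]

# Run the model (stubbed here) over the dataset and collect scores
scores = [
    correctness_evaluator("The capital is Paris", row["reference"])
    for row in dataset
]
```

In practice you would swap the overlap heuristic for an LLM-as-judge call, but the dataset-in, scores-out contract stays the same.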
LangSmith's eval runner integrates tightly with LangChain's evaluate() function:
```python
from langsmith.evaluation import evaluate

results = evaluate(
    lambda inputs: chain.invoke(inputs),
    data="my-dataset",
    evaluators=[correctness_evaluator],
    experiment_prefix="gpt-4o-baseline"
)
```
Langfuse's eval approach is more UI-driven — you define evaluators in the dashboard and they run against traces automatically. It also supports Ragas and other open-source eval libraries via the SDK:
```python
from langfuse import Langfuse

lf = Langfuse()
lf.score(
    trace_id=trace_id,
    name="ragas_faithfulness",
    value=faithfulness_score
)
```
Neither tool has a decisive edge here. LangSmith's programmatic runner is better for CI/CD eval pipelines. Langfuse's dashboard-driven approach is more accessible for non-engineers reviewing outputs.
Pricing at Scale
At 500k traces/month (a moderate production workload):
- Langfuse Cloud: ~$119/mo (entry paid tier plus usage-based overage at this volume)
- LangSmith Cloud: ~$150–200/mo depending on seat count
- Langfuse self-hosted: Infrastructure cost only — typically $30–80/mo on AWS/GCP for a small deployment
For startups or cost-sensitive teams, self-hosted Langfuse is the cheapest option that doesn't compromise on features.
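The break-even math is straightforward. The figures below are the rough estimates from this section (with LangSmith modeled as per-seat pricing at its advertised $39/seat entry point), not vendor quotes:

```python
def monthly_costs(seats: int = 4) -> dict:
    """Rough monthly cost at ~500k traces/mo.

    All numbers are illustrative midpoints from this comparison,
    not official vendor pricing.
    """
    return {
        "langfuse_cloud": 119,          # managed, usage-based tier
        "langsmith_cloud": 39 * seats,  # per-seat pricing, ~$39/seat
        "langfuse_self_hosted": 55,     # midpoint of the $30-80 infra estimate
    }

costs = monthly_costs(seats=4)
cheapest = min(costs, key=costs.get)
```

Note the self-hosted figure excludes the engineering time to operate Postgres and ClickHouse, which is the real hidden cost for small teams.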
Which Should You Use?
Pick Langfuse when:
- You need self-hosting for compliance, data residency, or cost reasons
- Your stack mixes frameworks (OpenAI + Anthropic + custom retrieval)
- You're building a product and want observability that isn't tied to a vendor
- You want to contribute to or extend an open-source tool
Pick LangSmith when:
- Your entire stack is LangChain or LangGraph and you want zero-config tracing
- You need team annotation workflows with a polished UI today
- You're already paying for LangChain's ecosystem and want tight integration
- Setup time is the priority over long-term data ownership
Use both when: you're evaluating which to standardize on — both have free tiers sufficient for a two-week pilot on real traffic.
FAQ
Q: Can Langfuse trace LangGraph agents?
A: Yes. Use the CallbackHandler exactly as you would for LangChain — LangGraph inherits the same callback system. Each node in the graph appears as a separate span in the trace.
Q: Does LangSmith work with non-LangChain code?
A: Yes, via the @traceable decorator and the RunTree API. It works fine, but the DX advantage over Langfuse disappears — at that point you're comparing two SDKs on roughly equal footing.
Q: Can I migrate from one to the other?
A: There's no automated migration path for historical trace data. Both tools export datasets (input/output pairs) as JSON or CSV, so you can port your eval datasets. Re-instrumenting your code takes 1–2 hours for a typical app.
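Porting an exported dataset is mostly a field-mapping exercise. A sketch, assuming a generic JSON export of input/output pairs — the field names here are hypothetical, so check the actual export schema of whichever tool you're leaving:

```python
import json

def port_dataset(export_json: str) -> list:
    """Map an exported dataset to generic {input, expected} pairs.

    Assumes each record carries "input" plus either "expected_output"
    or "output"; real export schemas differ per tool and version.
    """
    records = json.loads(export_json)
    return [
        {
            "input": r.get("input"),
            "expected": r.get("expected_output", r.get("output")),
        }
        for r in records
    ]

exported = '[{"input": "Capital of France?", "expected_output": "Paris"}]'
dataset = port_dataset(exported)
```

Once normalized to this shape, the pairs can be re-uploaded to either tool's dataset API.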
Q: Is Langfuse actually maintained? Open-source tools often go stale.
A: As of early 2026, Langfuse is one of the most actively maintained LLM dev tools — multiple commits per week, 20k+ GitHub stars, and a Series A-backed company behind it. Check the Langfuse GitHub for current activity before deciding.