
comparison

Why "Claude-stack-only" is a feature. Metric AI vs Datadog/Honeycomb/Grafana, vs Langfuse/Helicone, vs Anthropic's own dashboards.

tl;dr

Metric AI is the OpenTelemetry backend purpose-built for the Claude stack — Claude Code, Claude Agent SDK, and Claude Cowork. We accept generic OTLP, but every dashboard, every aggregate, every renderer is shaped exactly like the data Anthropic’s tools emit. That’s the wedge.

If you want one backend for your Postgres, your Kubernetes nodes, and your Claude usage, you want Datadog or Grafana. If you want the Claude-stack story rendered correctly — subagent trees, prompt-id correlation, Cowork approval flows — you want us.

why “Claude-stack-only” is a feature, not a limitation

Generic OTel backends optimise for breadth. They render every span the same way: a name, a duration, a parent id, an attribute bag. That’s the right shape for a microservice mesh. It’s the wrong shape for an agent loop.

A Claude Code session is not a flat list of spans. It’s a tree:

  • one claude_code.interaction per user turn
  • N claude_code.llm_request children, with model + token + cost attributes
  • M claude_code.tool children, some tool.blocked_on_user, some tool.execution
  • correlated by prompt.id across the whole tree, including Agent SDK subprocesses (TRACEPARENT propagated automatically)
  • and — if you use Cowork — interleaved approval, file-access, skills/plugins events
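
The tree shape listed above can be sketched in a few lines of Python. This is a minimal illustration, not Metric AI's implementation: the Span record and its field names (span_id, parent_id, prompt_id) are hypothetical stand-ins for whatever your exporter delivers, and only the span names are taken from the list.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    # Hypothetical flat span record; field names are assumptions for this sketch.
    span_id: str
    parent_id: Optional[str]
    name: str
    prompt_id: str
    children: list["Span"] = field(default_factory=list)


def build_trees(spans: list[Span]) -> dict[str, list[Span]]:
    """Link children to parents and return prompt_id -> root spans."""
    by_id = {s.span_id: s for s in spans}
    roots: dict[str, list[Span]] = defaultdict(list)
    for s in spans:
        parent = by_id.get(s.parent_id) if s.parent_id else None
        if parent is not None:
            parent.children.append(s)
        else:
            # No known parent: treat the span as a root for its prompt id.
            roots[s.prompt_id].append(s)
    return dict(roots)


def render(span: Span, depth: int = 0) -> list[str]:
    """Return an indented outline of one tree, one line per span."""
    lines = ["  " * depth + span.name]
    for child in span.children:
        lines.extend(render(child, depth + 1))
    return lines


spans = [
    Span("a", None, "claude_code.interaction", "p1"),
    Span("b", "a", "claude_code.llm_request", "p1"),
    Span("c", "a", "claude_code.tool", "p1"),
    Span("d", "c", "claude_code.tool.execution", "p1"),
]
trees = build_trees(spans)
outline = render(trees["p1"][0])
print("\n".join(outline))
```

A generic backend stops at the flat list; the point of the sketch is that the parent links and the shared prompt id are enough to recover the per-turn tree.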

Modeling this generically means every panel reduces to “list of spans with these attributes.” Modeling it specifically means we can show “for prompt id X, here is the tree, here are the tool decisions, here is the cost broken out by model, and here is the approval flow that gated it.”
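
The per-prompt cost breakdown described above amounts to a small rollup. A sketch, assuming each LLM request record carries prompt_id, model, and cost_usd keys; these key names and the sample values are hypothetical, and the attributes in your telemetry may be named differently.

```python
from collections import defaultdict


def cost_by_model(requests: list[dict], prompt_id: str) -> dict[str, float]:
    """Sum cost per model for the LLM requests that belong to one prompt id."""
    totals: dict[str, float] = defaultdict(float)
    for r in requests:
        if r["prompt_id"] == prompt_id:
            totals[r["model"]] += r["cost_usd"]
    return dict(totals)


# Made-up example data; model names and costs are placeholders.
requests = [
    {"prompt_id": "p1", "model": "model-large", "cost_usd": 0.25},
    {"prompt_id": "p1", "model": "model-small", "cost_usd": 0.125},
    {"prompt_id": "p1", "model": "model-large", "cost_usd": 0.5},
    {"prompt_id": "p2", "model": "model-large", "cost_usd": 1.0},
]
breakdown = cost_by_model(requests, "p1")
print(breakdown)
```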

That’s what we do. Generic backends won’t, and shouldn’t — it’s not their job.

vs generic OTel backends

Vendor | Strengths | Why it’s the wrong shape for the Claude stack
Datadog | Mature, complete signal coverage, alerting, RUM | Pricing scales with custom-metric cardinality. A 200-dev shop on Claude Code easily exceeds the budget. No Claude-aware dashboards. Cowork approval events arrive as opaque log lines.
Honeycomb | Best-in-class trace exploration, BubbleUp | Same cardinality cost shape. Wide-event model is great for backend services, awkward for prompt.id-scoped agent trees.
Grafana Cloud | Ships an official Claude Code integration with basic dashboards. Best of the generics. | Still pays per active series. No Cowork approval rendering. EU residency requires the higher tier.
New Relic / Elastic / SigNoz / Sealos / Quesma | Solid OTel support, self-host options | All published Claude Code dashboards in early 2026. None specialise in cost attribution per dev/repo. Agent SDK traces fragment across subprocess boundaries.

The common pattern: ingest fine, render generically, bill by volume or cardinality. None of them know what a prompt.id is supposed to mean across Claude Code + Agent SDK + Cowork.

vs LLM-native tools

Vendor | Strengths | Why it’s the wrong shape for the Claude stack
Langfuse | Open-source, popular for prompt experiments, has a Claude Agent SDK guide | Treats traces generically: a single LLM call per row. No subagent tree. Cowork’s tool/file/approval events are out of scope.
Helicone | Cheap proxy-based model, clean UI | Sits in the API path, not the OTLP path. Doesn’t see claude_code.tool decisions, hooks, or Cowork’s approval flow.
LangSmith / Phoenix / Traceloop / Braintrust / Weave | Each has a niche (evals, dataset curation, framework tracing) | All are LLM-API-call-shaped. Agent loops are second-class at best. Cowork is unsupported.

These tools are great if your unit of analysis is “one LLM call.” The Claude stack’s unit of analysis is a prompt.id — one user turn that fans out into dozens of tool calls, sub-agents, hooks, and (with Cowork) human approvals.

vs Anthropic’s own dashboards

Anthropic ships first-class OTel instrumentation across Claude Code, Agent SDK, and Cowork. They explicitly do not ship a hosted backend — the docs say “bring your own”. Metric AI is that backend. If Anthropic ever flips on a hosted dashboard toggle for the SMB segment, our wedge becomes price + EU residency + cross-tool correlation (Claude Code plus Cowork in the same view).

You probably already run Datadog or Grafana for your services. Keep doing that. Point Claude Code, Agent SDK, and Cowork at us via standard OTLP env vars. The two backends don’t compete — they cover different parts of your telemetry.
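
Pointing a tool at a second backend is a matter of environment variables. The sketch below builds them in Python for a child process. The OTEL_* names are standard OpenTelemetry exporter settings, and CLAUDE_CODE_ENABLE_TELEMETRY is the switch as we understand it from the Claude Code monitoring docs; verify both against the current documentation. The endpoint URL and token are placeholders, not real Metric AI values.

```python
import os


def otlp_env(endpoint: str, token: str) -> dict[str, str]:
    """Build environment variables that point an OTLP exporter at a backend."""
    return {
        "CLAUDE_CODE_ENABLE_TELEMETRY": "1",
        "OTEL_METRICS_EXPORTER": "otlp",
        "OTEL_LOGS_EXPORTER": "otlp",
        "OTEL_EXPORTER_OTLP_PROTOCOL": "http/protobuf",
        "OTEL_EXPORTER_OTLP_ENDPOINT": endpoint,
        "OTEL_EXPORTER_OTLP_HEADERS": f"Authorization=Bearer {token}",
    }


# Placeholder endpoint and token; substitute the values for your account.
env = {**os.environ, **otlp_env("https://otlp.example.invalid", "YOUR_TOKEN")}
# Pass `env` to subprocess.run([...], env=env) when launching the tool.
```

Because the variables are scoped to the child process, your existing services keep exporting to whatever backend they already use.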

[your microservices, databases, K8s] ────────► Datadog / Honeycomb / Grafana
[Claude Code + Agent SDK + Cowork]   ────────► Metric AI

Both use OTLP. Both honour W3C trace context. Different unit of analysis, different bill, different rendering.
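
For readers who have not looked inside W3C trace context: the propagated value is a four-field string. A minimal parser for the version-00 layout is sketched below; the function and type names are illustrative, and the sample value is the example from the W3C specification.

```python
import re
from typing import NamedTuple, Optional

# version - trace id (32 hex) - parent id (16 hex) - flags (2 hex), lowercase hex only
_TRACEPARENT = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


class TraceParent(NamedTuple):
    version: str
    trace_id: str
    parent_id: str
    sampled: bool


def parse_traceparent(value: str) -> Optional[TraceParent]:
    """Parse a W3C traceparent value; return None if it is malformed."""
    m = _TRACEPARENT.match(value.strip())
    if m is None:
        return None
    version, trace_id, parent_id, flags = m.groups()
    # Version "ff" and all-zero ids are invalid per the specification.
    if version == "ff" or set(trace_id) == {"0"} or set(parent_id) == {"0"}:
        return None
    return TraceParent(version, trace_id, parent_id, bool(int(flags, 16) & 0x01))


tp = parse_traceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
```

The trace_id is what lets spans from a parent process and its subprocesses land in the same trace, whichever backend receives them.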

price comparison

For a 100-developer shop running Claude Code at moderate volume:

Option | Pricing basis | Approx. monthly cost
Generic OTel (Datadog) | full retention, all spans | $4–10k
Generic OTel (Grafana Cloud) | basic Claude Code integration | $1.5–3k
LLM-native (Langfuse cloud) | per-event billing | $500–1.5k (no subagent trees, no Cowork)
Metric AI Metrics tier | $3 / dev / mo | $300
Metric AI Traces tier | $8 / dev / mo | $800

Numbers are illustrative — every shop’s volume is different — but the ratio holds.
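
The Metric AI rows are plain per-seat arithmetic, which can be checked in a few lines. A sketch using only the figures quoted in this section; the function name is illustrative.

```python
def monthly_cost(devs: int, price_per_dev: float) -> float:
    """Per-seat pricing: number of developers times the monthly price per developer."""
    return devs * price_per_dev


DEVS = 100
metrics_tier = monthly_cost(DEVS, 3)  # $3 / dev / mo
traces_tier = monthly_cost(DEVS, 8)   # $8 / dev / mo

# Ratio of the quoted Datadog range ($4k to $10k) to the Traces tier.
datadog_low, datadog_high = 4_000, 10_000
ratios = (datadog_low / traces_tier, datadog_high / traces_tier)
print(metrics_tier, traces_tier, ratios)
```

Swap DEVS for your own headcount to rescale the per-seat rows; the generic-backend ranges depend on volume and do not scale this simply.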

what we won’t beat

If you need Datadog’s RUM, log correlation, infra metrics, and APM in one pane of glass: keep Datadog. We are not trying to replace it. We are the focused observability layer for the Claude stack — not the SRE platform.