
architecture

Cloudflare-only data path. Ingest worker, queue, consumer, cron, dashboard.

[claude code on dev machine]
        │  OTLP/HTTP JSON + Authorization: Bearer
        ▼
otlp.metric-ai.nativekloud.com  ── Worker: ingest (Hono)
        │  validate token (SHA-256 → KV) → push raw to Queue
        ▼
[CF Queue: metric-ai-otlp-ingest]   (DLQ: -dlq, max_retries=3)
        │
        ▼
Worker: queue consumer
  ├─ decode OTLP/JSON traces → thin span rows → D1.spans
  ├─ decode OTLP/JSON metrics → 1-min pre-agg → WAE.metric_ai_metrics
  └─ decode OTLP/JSON logs → event rows → D1.events

[CF Cron: every 15 min]
  └─ WAE SQL (today + yesterday UTC) → upsert D1.rollups_daily

app.metric-ai.nativekloud.com  ── Worker: /api/* + Wrangler [assets] SPA
  ├─ Cf-Access JWT verified against ACCESS_AUD on every /api/* request
  ├─ /api/me, /api/rollups, /api/spans/tree
  └─ /api/cost-trend, /api/cache-hit, /api/tool-decisions,
     /api/active-users, /api/top-prompts, /api/subagent-stats

ingest worker

A single Hono Worker behind otlp.metric-ai.nativekloud.com. It does the cheapest possible job: validate the bearer token against a KV-stored SHA-256 hash, then push the raw payload to a Cloudflare Queue. No decoding, no DB writes, no analytics calls. This keeps the hot path under 10 ms and lets the queue absorb burst traffic during long agent sessions.
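A minimal sketch of that hot path, written as plain functions rather than a Hono route so it stays self-contained. It assumes the KV key is the hex SHA-256 of the bearer token and the value is the org id. TokenStore and IngestQueue are hypothetical stand-ins for the TOKENS and INGEST_QUEUE bindings, and the status codes and queued message shape are illustrative.

```typescript
// Hypothetical stand-ins for the KV and Queue bindings (only the calls used here).
interface TokenStore { get(key: string): Promise<string | null>; }
interface IngestQueue { send(body: unknown): Promise<void>; }

// Pull the token out of an "Authorization: Bearer <token>" header.
export function extractBearer(header: string | null | undefined): string | null {
  const match = /^Bearer\s+(\S+)$/i.exec((header ?? "").trim());
  return match ? match[1] : null;
}

// Hex-encoded SHA-256 via Web Crypto, which Workers expose as a global.
export async function sha256Hex(input: string): Promise<string> {
  const digest = await crypto.subtle.digest("SHA-256", new TextEncoder().encode(input));
  return Array.from(new Uint8Array(digest), (b) => b.toString(16).padStart(2, "0")).join("");
}

// Validate the token, enqueue the raw body untouched, and return an HTTP status.
export async function ingest(
  authHeader: string | null,
  signalPath: string, // e.g. "/v1/traces"; the path naming is an assumption
  rawBody: string,
  tokens: TokenStore,
  queue: IngestQueue,
): Promise<number> {
  const token = extractBearer(authHeader);
  if (token === null) return 401;
  const orgId = await tokens.get(await sha256Hex(token));
  if (orgId === null) return 401;
  await queue.send({ orgId, signalPath, rawBody }); // no decoding on the hot path
  return 202;
}
```

Storing only the hash means a leaked KV dump does not expose usable tokens, and the single KV read plus one queue send is all the work done per request.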

queue consumer

A second Worker bound to the queue. For each batch it decodes OTLP/JSON, splits by signal type, and fans out:

  • Traces become thin rows in D1.spans — trace id, span id, parent, name, duration, plus a handful of indexed attributes (prompt.id, user, repo, model, tool, cost). Everything else is discarded.
  • Metrics become Workers Analytics Engine writes, pre-aggregated to 1-minute buckets keyed by (org, user, repo, model, tool, status).
  • Logs become event rows in D1.events. Bodies are not stored.
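The split by signal type and the span thinning can be sketched as follows. The top-level keys (resourceSpans, resourceMetrics, resourceLogs) and span field names follow the OTLP/JSON encoding; the ThinSpan row shape and the exact attribute keys in KEPT are assumptions based on the list above, not the actual schema.

```typescript
// Subset of the OTLP/JSON span shape used here.
interface OtlpAttr {
  key: string;
  value: { stringValue?: string; intValue?: string | number; doubleValue?: number };
}
interface OtlpSpan {
  traceId: string; spanId: string; parentSpanId?: string; name: string;
  startTimeUnixNano: string; endTimeUnixNano: string; attributes?: OtlpAttr[];
}

// Attributes kept as columns; the exact keys are assumed.
const KEPT = ["prompt.id", "user", "repo", "model", "tool", "cost"];

export type Signal = "traces" | "metrics" | "logs" | "unknown";

// An OTLP/JSON export request carries one of these top-level arrays.
export function signalOf(payload: Record<string, unknown>): Signal {
  if ("resourceSpans" in payload) return "traces";
  if ("resourceMetrics" in payload) return "metrics";
  if ("resourceLogs" in payload) return "logs";
  return "unknown";
}

export interface ThinSpan {
  traceId: string; spanId: string; parentSpanId: string | null;
  name: string; durationMs: number; attrs: Record<string, string | number>;
}

export function toThinSpan(span: OtlpSpan): ThinSpan {
  const attrs: Record<string, string | number> = {};
  for (const { key, value } of span.attributes ?? []) {
    if (!KEPT.includes(key)) continue; // everything else is dropped
    const v = value.stringValue ?? value.doubleValue ?? value.intValue;
    if (v !== undefined) attrs[key] = v;
  }
  // Nanosecond timestamps arrive as decimal strings; BigInt avoids precision loss.
  const durationNs = BigInt(span.endTimeUnixNano) - BigInt(span.startTimeUnixNano);
  return {
    traceId: span.traceId,
    spanId: span.spanId,
    parentSpanId: span.parentSpanId || null, // root spans have no parent
    name: span.name,
    durationMs: Number(durationNs) / 1e6,
    attrs,
  };
}
```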

A dead-letter queue (the -dlq queue) catches messages that still fail after three retries, typically malformed payloads.
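One way the consumer could cooperate with that retry budget is to acknowledge or retry each message individually, so a single bad payload does not force the whole batch to be redelivered. In this sketch QueueMessage and QueueBatch are hypothetical minimal views of the batch the Queues runtime hands to a consumer (only body, ack and retry are used), and processOne stands for the decode-and-write step.

```typescript
// Hypothetical minimal view of a queue batch: only the members used below.
interface QueueMessage<T> { body: T; ack(): void; retry(): void; }
interface QueueBatch<T> { messages: readonly QueueMessage<T>[]; }

// Process each message independently and report how many were acked or retried.
export async function handleBatch<T>(
  batch: QueueBatch<T>,
  processOne: (body: T) => Promise<void>,
): Promise<{ acked: number; retried: number }> {
  let acked = 0;
  let retried = 0;
  for (const msg of batch.messages) {
    try {
      await processOne(msg.body);
      msg.ack(); // done, never redelivered
      acked++;
    } catch {
      msg.retry(); // redelivered until the retry limit, then routed to the DLQ
      retried++;
    }
  }
  return { acked, retried };
}
```

With max_retries=3 on the queue, a message that throws on every attempt ends up in the dead-letter queue without blocking the messages around it.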

cron rollup worker

Every 15 minutes a Cron Worker runs WAE SQL queries against today + yesterday UTC and upserts the result into D1.rollups_daily. The dashboard reads from this table — never from WAE directly — so panel queries stay sub-100ms.
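A sketch of the two pieces the rollup needs: the UTC window and an idempotent write. The WAE column layout (blob1 = org, blob2 = user, double1 = cost) and the rollups_daily columns are assumptions for illustration, and the upsert presumes a unique index on (day, org_id, user_id).

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

// Window from yesterday 00:00 UTC up to (not including) tomorrow 00:00 UTC.
export function rollupWindow(now: Date): { from: Date; to: Date } {
  const todayStart = Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate());
  return { from: new Date(todayStart - DAY_MS), to: new Date(todayStart + DAY_MS) };
}

// "YYYY-MM-DD HH:MM:SS" in UTC, the form passed to toDateTime() below.
const sqlTime = (d: Date): string => d.toISOString().slice(0, 19).replace("T", " ");

// Assumed layout: blob1 = org, blob2 = user, double1 = cost.
export function buildWaeQuery(now: Date): string {
  const { from, to } = rollupWindow(now);
  return [
    "SELECT toStartOfInterval(timestamp, INTERVAL '1' DAY) AS day,",
    "       blob1 AS org_id, blob2 AS user_id,",
    "       SUM(_sample_interval * double1) AS cost",
    "FROM metric_ai_metrics",
    `WHERE timestamp >= toDateTime('${sqlTime(from)}')`,
    `  AND timestamp < toDateTime('${sqlTime(to)}')`,
    "GROUP BY day, org_id, user_id",
  ].join("\n");
}

// Idempotent write: re-running the cron overwrites the same (day, org_id, user_id) row.
export const UPSERT_SQL = `
INSERT INTO rollups_daily (day, org_id, user_id, cost) VALUES (?1, ?2, ?3, ?4)
ON CONFLICT (day, org_id, user_id) DO UPDATE SET cost = excluded.cost`;
```

Recomputing yesterday as well as today means data that arrives after midnight UTC still lands in the correct daily row, and the upsert makes each 15-minute run safe to repeat.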

dashboard worker

app.metric-ai.nativekloud.com serves a React SPA from a Wrangler [assets] binding plus an /api/* Hono router. Every API request validates a Cloudflare Access JWT against the pinned AUD via JWKS. No Gatehouse, no homegrown auth — Access does the heavy lifting and we just verify the signature.
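The verification step might look roughly like the sketch below, using only Web Crypto. It assumes the token arrives in the Cf-Access-Jwt-Assertion header, is signed with RS256, and that the caller has already fetched the team's public keys (Access publishes them under /cdn-cgi/access/certs on the team domain). The Jwk and AccessClaims types and both function names are hypothetical; a maintained JWT library would be a reasonable substitute for the hand-rolled parts.

```typescript
interface Jwk { kty: string; kid?: string; alg?: string; n?: string; e?: string }
interface AccessClaims { aud?: string | string[]; exp?: number; email?: string }

// base64url -> bytes (JWT segments are unpadded base64url).
function b64urlToBytes(segment: string) {
  const b64 = segment.replace(/-/g, "+").replace(/_/g, "/");
  const bin = atob(b64.padEnd(Math.ceil(b64.length / 4) * 4, "="));
  const out = new Uint8Array(bin.length);
  for (let i = 0; i < bin.length; i++) out[i] = bin.charCodeAt(i);
  return out;
}

// Claim checks: the pinned audience must be present and the token unexpired.
export function claimsAreValid(claims: AccessClaims, expectedAud: string, nowSec: number): boolean {
  const auds = Array.isArray(claims.aud) ? claims.aud : claims.aud ? [claims.aud] : [];
  return auds.includes(expectedAud) && typeof claims.exp === "number" && claims.exp > nowSec;
}

// Returns the claims when signature, audience and expiry all check out, else null.
export async function verifyAccessJwt(
  token: string,
  keys: Jwk[],
  expectedAud: string,
): Promise<AccessClaims | null> {
  const [h, p, s, ...rest] = token.split(".");
  if (!h || !p || !s || rest.length > 0) return null;
  try {
    const parse = (seg: string) => JSON.parse(new TextDecoder().decode(b64urlToBytes(seg)));
    const header = parse(h) as { alg?: string; kid?: string };
    if (header.alg !== "RS256" || !header.kid) return null; // assumes RS256-signed tokens
    const jwk = keys.find((k) => k.kid === header.kid);
    if (!jwk) return null;
    const algorithm = { name: "RSASSA-PKCS1-v1_5", hash: "SHA-256" };
    const key = await crypto.subtle.importKey("jwk", jwk, algorithm, false, ["verify"]);
    const signed = new TextEncoder().encode(`${h}.${p}`);
    const ok = await crypto.subtle.verify(algorithm.name, key, b64urlToBytes(s), signed);
    const claims = parse(p) as AccessClaims;
    const nowSec = Math.floor(Date.now() / 1000);
    return ok && claimsAreValid(claims, expectedAud, nowSec) ? claims : null;
  } catch {
    return null; // malformed base64, JSON, or key material
  }
}
```

Checking the issuer claim against the team domain would be a natural addition; it is left out here to keep the sketch short.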

resources

Resource                             Binding        Purpose
D1 metric-ai-db                      DB             spans, events, rollups, prompt_summary
KV TOKENS                            TOKENS         SHA-256 token → org_id lookup
Queue metric-ai-otlp-ingest          INGEST_QUEUE   ingest → consumer fan-out
Analytics Engine metric_ai_metrics   WAE            1-min pre-aggregated metrics
Static assets                        ASSETS         dashboard SPA
Cron */15 * * * *                    -              rollups

why this shape

  • Pre-aggregate at ingest time. A 12-hour session is ~50k spans; the queue consumer collapses it to ~200 WAE rows, so storage cost grows with the number of developers, not the number of spans.
  • D1 only for thin metadata. The subagent-tree viewer needs span lineage; we don’t need the full attribute bag.
  • Daily roll-ups for dashboard reads. Panel queries hit a small table, not a span firehose.
  • Cloudflare-only. No cross-cloud egress, no third-party SaaS. EU residency is a region flip, not a re-architecture.
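To make the pre-aggregation point concrete, here is a minimal sketch of collapsing decoded metric points into 1-minute buckets keyed by (org, user, repo, model, tool, status). The MetricPoint and Bucket shapes are hypothetical, and summing is assumed as the aggregation; counters and gauges may need different treatment.

```typescript
// One decoded metric data point. Field names are hypothetical.
export interface MetricPoint {
  timestampMs: number;
  org: string; user: string; repo: string; model: string; tool: string; status: string;
  value: number;
}

export interface Bucket { minuteStartMs: number; dims: string[]; sum: number; count: number }

const MINUTE_MS = 60 * 1000;

// Group points by (minute, org, user, repo, model, tool, status) and sum their values.
export function preAggregate(points: MetricPoint[]): Bucket[] {
  const buckets = new Map<string, Bucket>();
  for (const p of points) {
    const minuteStartMs = Math.floor(p.timestampMs / MINUTE_MS) * MINUTE_MS;
    const dims = [p.org, p.user, p.repo, p.model, p.tool, p.status];
    const key = JSON.stringify([minuteStartMs, ...dims]);
    const existing = buckets.get(key);
    if (existing) {
      existing.sum += p.value;
      existing.count += 1;
    } else {
      buckets.set(key, { minuteStartMs, dims, sum: p.value, count: 1 });
    }
  }
  return Array.from(buckets.values()); // insertion order, one entry per bucket
}
```

Each bucket could then become a single Analytics Engine data point, with the dimensions as blobs and the sum as a double, which is how thousands of raw points shrink to a few hundred rows.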