Decision Trace — Full Routing Pipeline Visibility per Request

2026-03-20

What We Built

Decision trace capture is now wired into the router pipeline, recording the full sequence of routing decisions for every request: budget check, auto-selection, Thompson sampling scores, circuit breaker states, cache lookup result, cost-frontier optimization, and quality scoring. A new GET /auth/logs/:requestId/trace endpoint exposes this data to the dashboard and SDKs.
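For clients that do not use an SDK, the endpoint can be called over plain HTTP. The sketch below assumes two things the source does not specify: the caller supplies the base URL, and the API key is sent as a Bearer token. `buildTraceUrl`, `fetchTrace`, and `FetchLike` are hypothetical helper names. The fetch function is injected so the sketch stays testable; in Node 18+ you can pass `globalThis.fetch`.

```typescript
// Hypothetical helpers for GET /auth/logs/:requestId/trace.
// ASSUMPTIONS (not from the source): caller-supplied base URL and
// Bearer-token auth in the Authorization header.

type FetchLike = (
  url: string,
  init: { headers: Record<string, string> },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Build the trace URL, URL-encoding the request ID.
function buildTraceUrl(baseUrl: string, requestId: string): string {
  const base = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  return `${base}/auth/logs/${encodeURIComponent(requestId)}/trace`;
}

// Fetch the raw trace JSON, which is either a DecisionTrace or { available: false }.
async function fetchTrace(
  baseUrl: string,
  requestId: string,
  apiKey: string,
  fetchFn: FetchLike,
): Promise<unknown> {
  const res = await fetchFn(buildTraceUrl(baseUrl, requestId), {
    headers: { Authorization: `Bearer ${apiKey}` },
  });
  if (!res.ok) {
    throw new Error(`trace request failed with HTTP ${res.status}`);
  }
  return res.json();
}
```

Injecting `fetchFn` avoids coupling the helper to one runtime. It also lets tests substitute a fake response without network access.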

The dashboard logs page gained a side panel with decision trace visualization, a request timeline showing per-phase latency, shareable URLs for individual requests, and real-time SSE streaming for live log tailing. Clicking any log row opens the trace panel showing exactly why BrainstormRouter chose the model it did.
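Live log tailing over SSE comes down to parsing the `text/event-stream` wire format. This minimal parser is a sketch only. The source gives neither the dashboard's stream URL nor its event names, so none are assumed here. `SseParser` is a hypothetical name, and the parser follows the SSE rules for `data:`/`event:` fields, `:` comment lines, and blank-line dispatch, handling `\n` and `\r\n` line endings.

```typescript
// Minimal SSE (Server-Sent Events) parser for a live log tail.
// ASSUMPTION: stream URL and event names are unspecified; this only parses
// the wire format (\n or \r\n line endings; bare \r is not handled).

interface SseEvent {
  event: string; // "message" when the server sends no `event:` field
  data: string;
}

class SseParser {
  private buffer = "";
  private eventType = "";
  private dataLines: string[] = [];

  // Feed one raw text chunk; returns the events this chunk completed.
  push(chunk: string): SseEvent[] {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    this.buffer = lines.pop() ?? ""; // keep the trailing partial line for the next chunk
    const events: SseEvent[] = [];
    for (const raw of lines) {
      const line = raw.endsWith("\r") ? raw.slice(0, -1) : raw;
      if (line === "") {
        // A blank line dispatches the pending event, if it carried any data.
        if (this.dataLines.length > 0) {
          events.push({ event: this.eventType || "message", data: this.dataLines.join("\n") });
        }
        this.eventType = "";
        this.dataLines = [];
      } else if (!line.startsWith(":")) {
        // Lines starting with ":" are comments (often keep-alives) and are skipped.
        const colon = line.indexOf(":");
        const field = colon === -1 ? line : line.slice(0, colon);
        let value = colon === -1 ? "" : line.slice(colon + 1);
        if (value.startsWith(" ")) value = value.slice(1); // strip one leading space
        if (field === "data") this.dataLines.push(value);
        else if (field === "event") this.eventType = value;
        // `id` and `retry` fields are ignored in this sketch.
      }
    }
    return events;
  }
}
```

Buffering the trailing partial line matters in practice, because network chunks rarely align with line boundaries.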

Both SDKs expose the trace: logs.getTrace() in TypeScript and logs.get_trace() in Python. Each returns the full DecisionTrace object, or { available: false } for requests made before trace capture was enabled.
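Because the result is a union, callers should narrow it before reading trace fields. Here is a sketch of a type guard, with one assumption: `DecisionTraceLike` models only two fields mentioned in this note (`version`, `routingMethod`), while the real SDK type has more fields and may differ. `hasTrace` and `summarizeTraceResult` are hypothetical names.

```typescript
// ASSUMPTION: a reduced stand-in for the SDK's DecisionTrace type.
interface DecisionTraceLike {
  version: number;
  routingMethod: string;
}

// Shape returned for requests made before trace capture was enabled.
interface TraceUnavailable {
  available: false;
}

type TraceResult = DecisionTraceLike | TraceUnavailable;

// Type guard: true when a full trace was returned.
function hasTrace(result: TraceResult): result is DecisionTraceLike {
  return !("available" in result && result.available === false);
}

function summarizeTraceResult(result: TraceResult): string {
  if (!hasTrace(result)) {
    return "no trace (request predates trace capture)";
  }
  return `trace v${result.version}, routed via ${result.routingMethod}`;
}
```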

Why It Matters

When an AI agent gets a slow or expensive response, operators need to understand _why_ that model was selected. Was it Thompson sampling? Budget degradation? A circuit breaker forcing a fallback? Decision trace answers these questions in one API call, replacing guesswork with data. This is the observability gap that Portkey and OpenRouter leave open — they show you _what_ happened but not _why_.

For sales, decision trace is a live demo: route a request, click the log entry, and watch the intelligence pipeline unfold stage by stage. It turns BrainstormRouter's invisible intelligence into a visible, auditable story.

How It Works

The router fills in a single DecisionTrace object as each stage of the hot path runs. The trace is attached to the request context and persisted alongside the usage log. The dashboard fetches it through the trace endpoint and renders it as a timeline.
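The capture pattern described above can be sketched as a small recorder that appends one plain object per stage and defers building the final trace until persistence. This is a sketch under stated assumptions, not the router's implementation. `TraceRecorder`, `StageRecord`, and the stage labels in the usage are hypothetical; the output shape loosely mirrors `routingStages` and `timing.totalMs` from the SDK example, and stages are synchronous here for brevity.

```typescript
// HYPOTHETICAL recorder illustrating hot-path trace capture.
// Each stage appends a plain object; nothing is serialized until finish().

interface StageRecord {
  stage: string;
  durationMs: number;
  detail?: Record<string, unknown>;
}

class TraceRecorder {
  private readonly stages: StageRecord[] = [];
  private readonly start: number;

  // Clock is injectable; Date.now() has ms resolution (swap in performance.now() for finer timing).
  constructor(private readonly now: () => number = () => Date.now()) {
    this.start = this.now();
  }

  // Run one synchronous routing stage, timing it and recording optional detail.
  run<T>(stage: string, fn: () => T, detail?: (result: T) => Record<string, unknown>): T {
    const t0 = this.now();
    const result = fn();
    this.stages.push({ stage, durationMs: this.now() - t0, detail: detail?.(result) });
    return result;
  }

  // Called once when the usage log is written; only now is the trace object materialized.
  finish(): { routingStages: StageRecord[]; timing: { totalMs: number } } {
    return { routingStages: [...this.stages], timing: { totalMs: this.now() - this.start } };
  }
}
```

Keeping per-stage work to an array push of plain objects keeps serialization off the hot path, which is consistent with the overhead figure quoted under The Numbers.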

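```typescript
// SDK usage; `client` is an initialized SDK client
const trace = await client.logs.getTrace("req_abc123");
if ("available" in trace && !trace.available) {
  console.log("No trace: request predates trace capture");
} else {
  console.log(`Routed via ${trace.routingMethod} in ${trace.timing.totalMs}ms`);
  console.log(`Cache: ${trace.cacheResult}`);
  console.log(`Stages: ${trace.routingStages.map((s) => s.stage).join(" -> ")}`);
}
```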

The trace schema is versioned (version: 1) for backward compatibility. Fields like banditCandidates, cascadeStep, and costEstimate are only present when the corresponding intelligence system participated in routing.
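A consumer can respect both properties by checking `version` first and treating the intelligence-specific fields as optional. This is a minimal sketch. The field names come from this note, but their value types below are guesses made for illustration, and `TraceV1` and `summarizeTrace` are hypothetical names.

```typescript
// ASSUMPTION: value types for the optional fields are illustrative guesses.
interface TraceV1 {
  version: 1;
  routingMethod: string;
  banditCandidates?: unknown[];
  cascadeStep?: number;
  costEstimate?: number;
}

function summarizeTrace(raw: { version: number } & Record<string, unknown>): string[] {
  if (raw.version !== 1) {
    // Unknown future version: fail soft instead of misreading fields.
    return [`unsupported trace version ${raw.version}`];
  }
  const t = raw as unknown as TraceV1;
  const lines = [`routingMethod: ${t.routingMethod}`];
  // Optional sections appear only when that system took part in routing.
  if (t.banditCandidates !== undefined) lines.push(`bandit candidates: ${t.banditCandidates.length}`);
  if (t.cascadeStep !== undefined) lines.push(`cascade step: ${t.cascadeStep}`);
  if (t.costEstimate !== undefined) lines.push(`cost estimate: ${t.costEstimate}`);
  return lines;
}
```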

The Numbers

Trace capture overhead: <0.5ms per request (struct assignment, no serialization on hot path)
Trace payload size: 200-800 bytes typical (JSON, uncompressed)
Dashboard side panel render: <50ms from click to full trace display
SSE streaming latency: <100ms from log persistence to dashboard update
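The payload-size figure refers to uncompressed JSON, so you can check it against your own traces by measuring the UTF-8 byte length of the serialized object. Here is a small sketch with hypothetical helper names. The objects in the test are made up and are not real traces.

```typescript
// UTF-8 byte length of a value's JSON serialization (uncompressed).
function jsonPayloadBytes(value: unknown): number {
  // TextEncoder produces UTF-8, so multi-byte characters are counted correctly.
  return new TextEncoder().encode(JSON.stringify(value)).length;
}

// Smallest and largest serialized size across a sample of traces.
// Note: an empty sample yields { min: Infinity, max: -Infinity }.
function sizeRange(traces: unknown[]): { min: number; max: number } {
  const sizes = traces.map(jsonPayloadBytes);
  return { min: Math.min(...sizes), max: Math.max(...sizes) };
}
```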

Competitive Edge

No competing gateway exposes per-request routing rationale. Portkey shows cost and latency but not _why_ a model was selected, and OpenRouter offers no view into routing decisions. BrainstormRouter's decision trace turns the routing pipeline into an auditable, debuggable system, which is essential for enterprise compliance and for operators tuning their routing configuration.

Lockstep Checklist

[x] API Routes: GET /auth/logs/:requestId/trace added in src/api/routes/logs.ts
[x] TS SDK: logs.getTrace() added in packages/sdk-ts/src/resources/logs.ts
[x] Python SDK: logs.get_trace() (sync + async) added in packages/sdk-py/src/brainstormrouter/resources/logs.py
[x] MCP Schemas: N/A (dashboard endpoint, not agent-facing)
[x] Docs: docs/reference/decision-trace.md created, added to docs/docs.json navigation
[x] Dashboard: Side panel with trace visualization, timeline, shareable URLs, SSE streaming
[x] Master Record: New endpoint auto-included in llms-full.txt via build