BR Phase 1 — GET /v1/routing-stream (SSE push for routing decisions)

2026-04-22

router, sse, api

What We Built

A Server-Sent Events endpoint at GET /v1/routing-stream that pushes every routing decision BR makes to all open connections for the authenticated tenant. The endpoint is the BR half of the "push-first coordination" architecture — the brainstorm CLI dashboard subscribes to the stream and renders routing decisions live instead of polling.

Three pieces shipped together:

src/router/routing-event-bus.ts — process-wide singleton with per-tenant ring buffers (default 1000 events, configurable via BR_ROUTING_STREAM_BUFFER). Monotonic per-tenant event IDs. Subscriber fan-out is synchronous; one bad subscriber can't block the publisher.
src/router/model-router.ts — emits RoutingDecisionEvents at two sites in routeCompletion: the semantic-cache-hit early return (with cache: "hit") and the post-execution success path (cache: "miss" or "skip" when bypassCache is set).
src/api/routes/routing-stream.ts — SSE handler via Hono's streamSSE. Parses Last-Event-ID for replay, subscribes to the bus, implements per-connection 100-event drop-oldest backpressure + 15s keepalive, cleans up on client abort.
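
As a rough illustration of the first piece, the sketch below shows one way a per-tenant buffer with monotonic IDs and synchronous fan-out could fit together. It is not the code in routing-event-bus.ts: the class name, method names, and the trimmed-down event type are hypothetical, and the try/catch around each subscriber is an assumption about how a failing subscriber is kept from affecting the publisher.

```typescript
// Hypothetical sketch: names and shapes are illustrative, not the real module's API.
interface RoutingDecisionEvent {
  request_id: string;
  selected_model: string;
  cache: "hit" | "miss" | "skip";
  tenant_id: string;
}

interface StoredEvent {
  id: number; // monotonic within one tenant
  event: RoutingDecisionEvent;
}

type Subscriber = (stored: StoredEvent) => void;

class RoutingEventBus {
  private readonly buffers = new Map<string, StoredEvent[]>();
  private readonly lastIds = new Map<string, number>();
  private readonly subscribers = new Map<string, Subscriber[]>();

  constructor(private readonly capacity: number = 1000) {}

  publish(event: RoutingDecisionEvent): number {
    const tenant = event.tenant_id;
    const id = (this.lastIds.get(tenant) ?? 0) + 1;
    this.lastIds.set(tenant, id);

    const stored: StoredEvent = { id, event };
    const buffer = this.buffers.get(tenant) ?? [];
    buffer.push(stored);
    // Evict the oldest entry once over capacity. A real ring buffer would
    // likely use a fixed array with a head index instead of shift().
    if (buffer.length > this.capacity) buffer.shift();
    this.buffers.set(tenant, buffer);

    // Copy the list so a subscriber that unsubscribes mid-publish
    // does not disturb the iteration.
    const current = (this.subscribers.get(tenant) ?? []).slice();
    current.forEach((subscriber) => {
      try {
        subscriber(stored);
      } catch {
        // Assumption: swallow errors so one failing subscriber
        // cannot stop delivery to the others or fail the publisher.
      }
    });
    return id;
  }

  // Returns an unsubscribe function.
  subscribe(tenant: string, subscriber: Subscriber): () => void {
    const list = this.subscribers.get(tenant) ?? [];
    list.push(subscriber);
    this.subscribers.set(tenant, list);
    return () => {
      const idx = list.indexOf(subscriber);
      if (idx !== -1) list.splice(idx, 1);
    };
  }

  // Snapshot of what is currently retained for a tenant.
  buffered(tenant: string): StoredEvent[] {
    return (this.buffers.get(tenant) ?? []).slice();
  }
}
```

Keeping IDs and buffers keyed by tenant means one tenant's traffic cannot evict another tenant's replay window.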

Why It Matters

The CLI's routing-intelligence dashboard today polls BR synchronously for every read, which creates two problems simultaneously: (a) the operator has no real-time visibility into what the router just did, and (b) BR sits on the synchronous hot path for every intelligence read — a SPOF concern.

The push architecture inverts that: BR emits once at decision time; any number of observer clients (dashboards, SIEM forwarders, debug sessions) subscribe asynchronously. BR stays on the hot path for completions only — observation is decoupled.

How It Works

Event shape (one per routing decision):

id: 42
data: {"ts":"2026-04-22T...","request_id":"req-abc","task_type":"general","selected_model":"anthropic/claude-sonnet-4-6","strategy":"quality","why":"ranking-break-tie","cost_estimate_usd":0.0012,"cache":"miss","tenant_id":"tenant-a"}
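
For readers consuming the stream from TypeScript, the payload above could be typed roughly as follows. The field names come from the example event; the interface name, the toSseFrame helper, and the choice to treat every field as required are assumptions, and the cache union is taken from the three values mentioned earlier in this log.

```typescript
// Sketch of the payload type, inferred from the example event above.
// Whether any field is optional in practice is not stated in this log.
interface RoutingDecisionEvent {
  ts: string; // ISO-8601 timestamp
  request_id: string;
  task_type: string;
  selected_model: string;
  strategy: string;
  why: string;
  cost_estimate_usd: number;
  cache: "hit" | "miss" | "skip";
  tenant_id: string;
}

// Hypothetical helper: serialize one event as an SSE frame.
// JSON.stringify without an indent argument never emits raw newlines,
// so the payload always fits on a single "data:" line.
function toSseFrame(id: number, event: RoutingDecisionEvent): string {
  return `id: ${id}\ndata: ${JSON.stringify(event)}\n\n`;
}
```

The blank line at the end of the frame is what tells an SSE client to dispatch the event.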

Reconnection:

# Client state: last seen ID was 41
GET /v1/routing-stream HTTP/1.1
Authorization: Bearer br_live_xxx
Last-Event-ID: 41

BR replays buffered events with id > 41 from the ring buffer, then subscribes to live events. If the client was disconnected long enough that events 42-50 have already rolled out of the ring buffer, replay starts at the oldest remaining event and the client is responsible for noting the gap (non-contiguous IDs).
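
The replay rule can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the handler's code: parseLastEventId, replayAfter, and hasGap are hypothetical names, and treating a malformed header as "no replay requested" is a guess at reasonable behavior rather than something this log specifies.

```typescript
interface StoredEvent {
  id: number;
  data: string; // serialized JSON payload
}

// Hypothetical: accept only plain non-negative integers; anything else
// is treated as if the header were absent.
function parseLastEventId(header: string | undefined): number | null {
  if (header === undefined) return null;
  const trimmed = header.trim();
  if (!/^\d+$/.test(trimmed)) return null;
  const n = Number(trimmed);
  return Number.isSafeInteger(n) ? n : null;
}

// Server side: everything still buffered with an id above the client's
// last seen id, in order. If older events were evicted, this simply
// starts at the oldest one remaining.
function replayAfter(buffer: readonly StoredEvent[], lastSeenId: number): StoredEvent[] {
  return buffer.filter((e) => e.id > lastSeenId);
}

// Client side: IDs are monotonic per tenant, so a first received id
// other than lastSeenId + 1 means events were lost while disconnected.
function hasGap(lastSeenId: number, firstReceivedId: number): boolean {
  return firstReceivedId !== lastSeenId + 1;
}
```

A dashboard could use hasGap to show a "missed events" marker instead of silently presenting a discontinuous history.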

Backpressure: each connection has its own queue of 100 events. Slow consumers get drop-oldest, not connection termination. A slow_client log is emitted at the 1st drop and every 100 drops thereafter.
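
A drop-oldest queue of this kind might look like the sketch below. The class and callback names are hypothetical, and the log cadence (drops 1, 101, 201, ...) is one reading of "1st drop and every 100 after"; the actual handler may count differently.

```typescript
// Hypothetical sketch of a bounded per-connection queue with drop-oldest.
class DropOldestQueue<T> {
  private readonly items: T[] = [];
  private dropped = 0;

  constructor(
    private readonly capacity: number = 100,
    // Called when a slow_client log line would be emitted.
    private readonly onSlowClient: (droppedSoFar: number) => void = () => {},
  ) {}

  push(item: T): void {
    if (this.items.length >= this.capacity) {
      this.items.shift(); // discard the oldest queued event
      this.dropped += 1;
      // Assumed cadence: fire on drop 1, 101, 201, ...
      if (this.dropped % 100 === 1) this.onSlowClient(this.dropped);
    }
    this.items.push(item);
  }

  // Take the next event for the writer, or undefined when empty.
  shift(): T | undefined {
    return this.items.shift();
  }

  get size(): number {
    return this.items.length;
  }

  get droppedCount(): number {
    return this.dropped;
  }
}
```

Dropping the oldest entries favors freshness: a dashboard that falls behind sees the most recent decisions rather than a stale backlog, and can detect the loss from the non-contiguous IDs.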

Keepalive: an SSE comment frame (: keepalive\n\n) is sent every 15s so HTTP intermediaries (CloudFront / ALB / nginx) don't idle-timeout the connection.
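
The keepalive timer is simple enough to sketch generically. In the SSE wire format, a line starting with a colon is a comment that clients ignore, which is why it works as a heartbeat. The startKeepalive helper and its write callback are hypothetical; the real handler writes through Hono's streamSSE and its wiring may differ.

```typescript
// SSE comment frame: ignored by clients, but keeps bytes flowing.
const KEEPALIVE_FRAME = ": keepalive\n\n";

// Hypothetical helper: call `write` with the comment frame on a fixed
// interval. Returns a stop function to invoke on client abort so the
// timer does not outlive the connection.
function startKeepalive(
  write: (chunk: string) => void,
  intervalMs: number = 15000,
): () => void {
  const timer = setInterval(() => write(KEEPALIVE_FRAME), intervalMs);
  return () => clearInterval(timer);
}
```

The 15s interval should sit below the shortest idle timeout among the intermediaries in the path; that margin depends on how they are configured.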

The Numbers

14 new tests (8 event bus, 6 SSE integration). 7359 total passing (was 7345 pre-Phase-1).
1 new endpoint, 0 SDK changes (Phase 1 scope).
2 emit sites wired in model-router.ts (cache hit + main post-execution). Streaming completions (routeStream) are NOT wired in this phase — flagged as follow-up. The joint smoke test uses non-streaming chat completions per the handoff doc.
Default buffer: 1000 events per tenant. Env override: BR_ROUTING_STREAM_BUFFER.

Contract Deviations from the Phase 1 Spec

Two worth calling out:

Streaming completions don't emit routing-decision events in this ship. The spec says "publish routing-decision events to the in-memory event bus." I wired the two most-reachable sites — checkSemanticCache hit path and routeCompletion main success path — but not the streaming path (routeStream). Rationale: the ship/no-ship test specifies curl sending completions, which defaults to non-streaming in BR's OpenAI-compat shape; the acceptance test passes. If the CLI dashboard wants streaming completions visible too, the emit site is src/router/model-router.ts:~1651 and the wire-up is mechanical.

Event emission is sync-fan-out, not async queue. The spec says "don't introduce a queue library." I interpreted this as "direct subscriber invocation on publish": subscribers run synchronously on the publisher's call stack, before the publish call returns. The SSE handler itself has a 100-slot per-connection buffer that decouples the subscriber callback from the TCP writer, so a stuck writer can't block the publisher. Under extreme load (thousands of subscribers per tenant) the synchronous loop could become a contention point, but that's a future-you problem.

Lockstep Checklist

[x] API Routes: src/api/routes/routing-stream.ts new. Registered in src/api/register-routes.ts. Uses existing apiKeyAuth middleware via /v1/* chain — no middleware changes.
[x] TS SDK: not applicable — Phase 1 explicitly scoped to "CLI subscriber only." If external API users want to subscribe programmatically, adding client.routingStream.subscribe() is straightforward follow-up.
[x] Python SDK: same.
[x] MCP Schemas: not applicable.
[x] Ship Log: this file.

Handoff spec: docs/push-first-coordination-phase-1.md
Full architectural plan: /Users/justin/.claude/plans/hazy-gathering-gadget.md
Coordinated with brainstorm CLI peer (6sv3av7w); CLI side ships subscriber + dashboard panel + routing.routingStream config flag in parallel.
Phase 2 (CLI-only): wire stream events into local Wilson/Thompson in-memory state.
Phase 3 (CLI-only): peer broker for cross-session coordination.