BR Phase 1 — GET /v1/routing-stream (SSE push for routing decisions)
2026-04-22
What We Built
A Server-Sent Events endpoint at GET /v1/routing-stream that pushes every routing decision BR makes to all open connections for the authenticated tenant. The endpoint is the BR half of the "push-first coordination" architecture — the brainstorm CLI dashboard subscribes to the stream and renders routing decisions live instead of polling.
Three pieces shipped together:
- src/router/routing-event-bus.ts: process-wide singleton with per-tenant ring buffers (default 1000 events, configurable via BR_ROUTING_STREAM_BUFFER). Monotonic per-tenant event IDs. Subscriber fan-out is synchronous; one bad subscriber can't block the publisher.
- src/router/model-router.ts: emits RoutingDecisionEvents at two sites in routeCompletion: the semantic-cache-hit early return (with cache: "hit") and the post-execution success path (cache: "miss", or "skip" when bypassCache is set).
- src/api/routes/routing-stream.ts: SSE handler via Hono's streamSSE. Parses Last-Event-ID for replay, subscribes to the bus, implements per-connection 100-event drop-oldest backpressure plus a 15s keepalive, and cleans up on client abort.
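For readers who have not opened the bus file, here is a minimal sketch of a per-tenant ring buffer with monotonic IDs. The names (TenantRingBuffers, append, since) are hypothetical and this is not the code in routing-event-bus.ts; it only illustrates the eviction and replay behavior described above.

```typescript
// Hypothetical sketch; not the actual routing-event-bus.ts implementation.
interface BufferedEvent<T> {
  id: number; // monotonic per tenant, starting at 1 (assumed starting value)
  payload: T;
}

class TenantRingBuffers<T> {
  private readonly capacity: number;
  private readonly buffers = new Map<string, BufferedEvent<T>[]>();
  private readonly nextIds = new Map<string, number>();

  constructor(capacity: number = 1000) {
    this.capacity = capacity;
  }

  // Assign the next per-tenant ID and store the event, evicting the oldest when full.
  append(tenantId: string, payload: T): BufferedEvent<T> {
    const id = this.nextIds.get(tenantId) ?? 1;
    this.nextIds.set(tenantId, id + 1);
    const buf = this.buffers.get(tenantId) ?? [];
    const event: BufferedEvent<T> = { id, payload };
    buf.push(event);
    if (buf.length > this.capacity) buf.shift(); // O(n) shift is fine for a sketch
    this.buffers.set(tenantId, buf);
    return event;
  }

  // Events still buffered with id > lastEventId, oldest first.
  since(tenantId: string, lastEventId: number): BufferedEvent<T>[] {
    return (this.buffers.get(tenantId) ?? []).filter((e) => e.id > lastEventId);
  }
}
```

IDs keep counting after eviction, so a replay that starts past the requested ID is visible to the client as a jump in the ID sequence.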
Why It Matters
The CLI's routing-intelligence dashboard today polls BR synchronously for every read, which creates two problems simultaneously: (a) the operator has no real-time visibility into what the router just did, and (b) BR sits on the synchronous hot path for every intelligence read — a SPOF concern.
The push architecture inverts that: BR emits once at decision time; any number of observer clients (dashboards, SIEM forwarders, debug sessions) subscribe asynchronously. BR stays on the hot path for completions only — observation is decoupled.
How It Works
Event shape (one per routing decision):
id: 42
data: {"ts":"2026-04-22T...","request_id":"req-abc","task_type":"general","selected_model":"anthropic/claude-sonnet-4-6","strategy":"quality","why":"ranking-break-tie","cost_estimate_usd":0.0012,"cache":"miss","tenant_id":"tenant-a"}
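The frame above can be described with a type and a small formatter. This is a sketch inferred from the sample payload only: the field types, the cache union, and the function name formatSseFrame are assumptions, and the real RoutingDecisionEvent may carry more fields.

```typescript
// Shape inferred from the sample frame; the actual type may differ.
interface RoutingDecisionEvent {
  ts: string; // ISO-8601 timestamp
  request_id: string;
  task_type: string;
  selected_model: string;
  strategy: string;
  why: string;
  cost_estimate_usd: number;
  cache: "hit" | "miss" | "skip";
  tenant_id: string;
}

// Serialize one event as an SSE frame: an id line, a data line, then a blank line.
function formatSseFrame(id: number, event: RoutingDecisionEvent): string {
  return `id: ${id}\ndata: ${JSON.stringify(event)}\n\n`;
}
```

JSON.stringify never emits raw newlines, so the payload always fits on a single data line.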
Reconnection:
# Client state: last seen ID was 41
GET /v1/routing-stream HTTP/1.1
Authorization: Bearer br_live_xxx
Last-Event-ID: 41
BR replays buffered events with id > 41 from the ring buffer, then subscribes to live events. If the client was disconnected long enough that events 42-50 have already rolled out of the ring buffer, replay starts at the oldest remaining event and the client is responsible for noting the gap (non-contiguous IDs).
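Since gap detection is left to the client, a subscriber could track it roughly as follows. GapTracker is a hypothetical helper, not part of the CLI or BR; it assumes IDs are integers that increase by one per event for the tenant.

```typescript
// Hypothetical client-side helper: reports how many event IDs were skipped.
class GapTracker {
  private lastSeenId: number;

  constructor(lastSeenId: number = 0) {
    this.lastSeenId = lastSeenId;
  }

  // Returns the number of IDs missing before this event (0 when contiguous).
  observe(id: number): number {
    const missed = Math.max(0, id - this.lastSeenId - 1);
    if (id > this.lastSeenId) this.lastSeenId = id;
    return missed;
  }

  // Value to send back as Last-Event-ID on the next reconnect.
  get lastEventId(): number {
    return this.lastSeenId;
  }
}
```

In the scenario above (last seen ID 41, events 42-50 already evicted), the first replayed event is 51 and observe(51) reports 9 missed events.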
Backpressure: Per-connection queue of 100 events. Slow consumers get drop-oldest, not connection termination. A slow_client log is emitted at the 1st drop and every 100 drops after that.
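A drop-oldest queue with that logging cadence can be sketched as below. DropOldestQueue and the onSlowClient callback are hypothetical names, and the cadence is my reading of the sentence above (log on drop 1, 101, 201, ...); the handler in routing-stream.ts may count differently.

```typescript
// Hypothetical sketch of a bounded per-connection queue with drop-oldest semantics.
class DropOldestQueue<T> {
  private readonly items: T[] = [];
  private dropped = 0;
  private readonly capacity: number;
  private readonly onSlowClient: (totalDropped: number) => void;

  constructor(capacity: number, onSlowClient: (totalDropped: number) => void) {
    this.capacity = capacity;
    this.onSlowClient = onSlowClient;
  }

  push(item: T): void {
    if (this.items.length >= this.capacity) {
      this.items.shift(); // discard the oldest queued event, keep the connection open
      this.dropped += 1;
      // Assumed cadence: notify on the 1st drop, then once per further 100 drops.
      if (this.dropped % 100 === 1) this.onSlowClient(this.dropped);
    }
    this.items.push(item);
  }

  // Next event for the writer, or undefined when the queue is empty.
  shift(): T | undefined {
    return this.items.shift();
  }

  get size(): number {
    return this.items.length;
  }
}
```

The publisher side only ever calls push, which never waits on the socket; the writer drains with shift at whatever pace the client allows.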
Keepalive: an SSE comment line (: keepalive\n\n) is sent every 15s so HTTP intermediaries (CloudFront / ALB / nginx) don't idle-timeout the connection.
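On the client side, keepalive frames need no handling beyond being skipped, because SSE treats any line starting with a colon as a comment. A minimal frame parser illustrating this, with parseSseFrame as a hypothetical name (browser EventSource clients get this behavior for free):

```typescript
interface SseMessage {
  id?: string;
  data: string;
}

// Parse one blank-line-terminated SSE frame; returns null for comment-only frames.
function parseSseFrame(frame: string): SseMessage | null {
  let id: string | undefined;
  const dataLines: string[] = [];
  for (const line of frame.split("\n")) {
    if (line === "" || line.startsWith(":")) continue; // comment, e.g. ": keepalive"
    const colon = line.indexOf(":");
    const field = colon === -1 ? line : line.slice(0, colon);
    let value = colon === -1 ? "" : line.slice(colon + 1);
    if (value.startsWith(" ")) value = value.slice(1); // strip one leading space
    if (field === "id") id = value;
    else if (field === "data") dataLines.push(value);
  }
  if (dataLines.length === 0) return null;
  const data = dataLines.join("\n");
  return id === undefined ? { data } : { id, data };
}
```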
The Numbers
- 14 new tests (8 event bus, 6 SSE integration). 7359 total passing (was 7345 pre-Phase-1).
- 1 new endpoint, 0 SDK changes (Phase 1 scope).
- 2 emit sites wired in model-router.ts (cache hit + main post-execution). Streaming completions (routeStream) are NOT wired in this phase; flagged as follow-up. The joint smoke test uses non-streaming chat completions per the handoff doc.
- Default buffer: 1000 events per tenant. Env override: BR_ROUTING_STREAM_BUFFER=.
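One way to resolve the buffer size from the environment, as a sketch. resolveBufferSize is a hypothetical helper, and falling back to 1000 on missing or invalid values is an assumption; the bus may validate differently.

```typescript
// Hypothetical helper: turn the raw env value into a usable buffer size.
function resolveBufferSize(raw: string | undefined, fallback: number = 1000): number {
  if (raw === undefined) return fallback;
  const parsed = Number(raw);
  // Assumed policy: only positive integers are accepted, anything else uses the default.
  return Number.isInteger(parsed) && parsed > 0 ? parsed : fallback;
}
```

In a Node process this would be called as resolveBufferSize(process.env.BR_ROUTING_STREAM_BUFFER).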
Contract Deviations from the Phase 1 Spec
Two worth calling out:
- Streaming completions don't emit routing-decision events in this ship. The spec says "publish routing-decision events to the in-memory event bus." I wired the two most-reachable sites (the checkSemanticCache hit path and the routeCompletion main success path) but not the streaming path (routeStream). Rationale: the ship/no-ship test specifies curl sending completions, which defaults to non-streaming in BR's OpenAI-compat shape; the acceptance test passes. If the CLI dashboard wants streaming completions visible too, the emit site is src/router/model-router.ts:~1651 and the wire-up is mechanical.
- Event emission is sync-fan-out, not async queue. The spec says "don't introduce a queue library." I interpreted this as "direct subscriber invocation on publish" — subscribers run synchronously in the publisher's microtask. The SSE handler itself has a 100-slot per-connection buffer that decouples the subscriber callback from the TCP writer, so a stuck writer can't block the publisher. Under extreme load (thousands of subscribers per tenant) this could become a contention point, but that's a future-you problem.
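For reference, "direct subscriber invocation on publish" looks roughly like this. SyncFanOut is a hypothetical name, and isolating subscribers with try/catch is my assumption about how a throwing subscriber is kept from affecting the publisher; the real bus may isolate differently.

```typescript
type Subscriber<T> = (event: T) => void;

// Hypothetical sketch of queue-less, synchronous fan-out.
class SyncFanOut<T> {
  private readonly subscribers = new Set<Subscriber<T>>();

  // Register a callback; the returned function unsubscribes it.
  subscribe(fn: Subscriber<T>): () => void {
    this.subscribers.add(fn);
    return () => {
      this.subscribers.delete(fn);
    };
  }

  // Invoke every subscriber in turn; returns how many of them threw.
  publish(event: T): number {
    let failures = 0;
    // Copy first so a subscriber unsubscribing mid-publish doesn't disturb iteration.
    for (const fn of [...this.subscribers]) {
      try {
        fn(event);
      } catch {
        failures += 1; // swallow so remaining subscribers and the publisher proceed
      }
    }
    return failures;
  }
}
```

Publish cost grows linearly with subscriber count, which is the contention point noted above.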
Lockstep Checklist
- [x] API Routes: src/api/routes/routing-stream.ts new. Registered in src/api/register-routes.ts. Uses existing apiKeyAuth middleware via /v1/* chain; no middleware changes.
- [x] TS SDK: not applicable. Phase 1 explicitly scoped to "CLI subscriber only." If external API users want to subscribe programmatically, adding client.routingStream.subscribe() is a straightforward follow-up.
- [x] Python SDK: same.
- [x] MCP Schemas: not applicable.
- [x] Ship Log: this file.
Related
- Handoff spec: docs/push-first-coordination-phase-1.md
- Full architectural plan: /Users/justin/.claude/plans/hazy-gathering-gadget.md
- Coordinated with brainstorm CLI peer (6sv3av7w); CLI side ships subscriber + dashboard panel + routing.routingStream config flag in parallel.
- Phase 2 (CLI-only): wire stream events into local Wilson/Thompson in-memory state.
- Phase 3 (CLI-only): peer broker for cross-session coordination.