Real-Time Telemetry SSE — live model health, budget alerts, routing decisions

2026-03-24

router, circuit-breaker, budget-forecaster, observability

What We Built

GET /v1/telemetry/live — an SSE stream of live operational data from BrainstormRouter's intelligence systems. The stream emits three event types: model-health (circuit breaker status for each model endpoint, every 10 seconds), budget-alert (threshold-crossing alerts at 80%, 50%, 20%, 5% remaining), and routing-decision (real-time forwarding of routing events via the broadcast event tap).

The endpoint is tenant-scoped — each client only sees data for their own API key's tenant. Connections include automatic keepalive comments every 30 seconds and a 1-hour max duration with graceful close notification.
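
For clients that read the stream without an SDK, the framing described above can be handled with a small parser. The sketch below assumes the standard Server-Sent Events wire format (event: and data: fields, a blank line ending each event, and lines starting with a colon used as keepalive comments) and assumes JSON payloads. The function name parse_sse and the payload field names in the sample are illustrative assumptions, not the documented schema.

```python
import json
from typing import Iterable, Iterator, Tuple


def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[str, dict]]:
    """Yield (event_type, payload) pairs from raw SSE lines.

    Assumes JSON payloads. Comment lines (starting with ':') are skipped,
    and a blank line terminates each event.
    """
    event_type = "message"  # SSE default when no `event:` field is sent
    data_parts = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(":"):
            continue  # comment line, e.g. the 30-second keepalive
        if line == "":
            if data_parts:
                yield event_type, json.loads("\n".join(data_parts))
            event_type, data_parts = "message", []
            continue
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # the spec strips one leading space
        if field == "event":
            event_type = value
        elif field == "data":
            data_parts.append(value)


# Hypothetical sample frames; the field names are assumptions.
sample = [
    ": keepalive",
    "event: model-health",
    'data: {"model": "gpt-4.1", "status": "degraded"}',
    "",
    "event: budget-alert",
    'data: {"alert": "20%", "remaining_usd": 4.2}',
    "",
]
events = list(parse_sse(sample))
```

Skipping comment lines before any other handling means keepalives never reach application code, so a consumer only sees the three event types.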

Why It Matters

Brainstorm CLI uses this stream for pre-flight routing checks: if GPT-4.1 is degraded, it can route to Sonnet before a request fails. Budget warnings arrive in real time rather than after the request. This turns BrainstormRouter from a request-response gateway into a live operational awareness platform.
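
A pre-flight check of this kind might look like the following sketch. It assumes the client caches the latest model-health status per model in a dictionary. The function name pick_model, the model identifiers, and the choice to treat models with no report as available are illustrative assumptions, not Brainstorm CLI's actual logic.

```python
from typing import Dict, Optional, Sequence


def pick_model(preferred: Sequence[str], health: Dict[str, str]) -> Optional[str]:
    """Return the first preferred model whose latest status is 'available'.

    Falls back to the first 'degraded' model if nothing is fully available,
    and returns None when every candidate is 'unavailable'.
    Models with no health report yet are treated as available (assumption).
    """
    degraded = None
    for model in preferred:
        status = health.get(model, "available")
        if status == "available":
            return model
        if status == "degraded" and degraded is None:
            degraded = model
    return degraded


# Latest statuses as they might be cached from model-health events
# (hypothetical model identifiers).
health = {"gpt-4.1": "degraded", "claude-sonnet": "available"}
choice = pick_model(["gpt-4.1", "claude-sonnet"], health)
```

Here the degraded first choice is skipped in favor of the healthy fallback. A degraded model is still used when it is the only option that is not fully unavailable.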

How It Works

The SSE stream aggregates three data sources:

  1. Circuit Breaker (CircuitBreakerManager.getAllStates()): polled every 10s; maps circuit states to human-readable statuses: closed → available, half-open → degraded, open → unavailable
  2. Budget Forecaster (BudgetForecaster.forecast()): checked on the same polling cycle as health; emits alerts only on threshold crossings to avoid noise
  3. Broadcast Event Tap (registerEventTap()): real-time forwarding of usage events, filtered to the authenticated tenant

Both SDKs expose the stream as an iterator:

// TypeScript SDK
for await (const event of client.telemetry.live()) {
  if (event.type === "model-health") {
    console.log(event.data.model, event.data.status);
  }
}

# Python SDK
for event in client.telemetry.live():
    if event.type == "budget-alert":
        print(event.data["alert"], event.data["remaining_usd"])
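
The first two data sources can be illustrated with a short sketch, written in Python for consistency with the SDK example even though the route itself lives in a TypeScript file. The state mapping follows the list above. The crossing detector is one plausible way to emit alerts only on threshold crossings; its names (ThresholdTracker, check) and its percentage-based input are assumptions, not BudgetForecaster's real interface.

```python
from typing import List, Optional

# Circuit state -> human-readable status, as described in the list above.
STATUS_BY_CIRCUIT_STATE = {
    "closed": "available",
    "half-open": "degraded",
    "open": "unavailable",
}

# Alert thresholds, expressed as percent of budget remaining.
THRESHOLDS = (80, 50, 20, 5)


class ThresholdTracker:
    """Report thresholds only when remaining budget drops to or below them."""

    def __init__(self) -> None:
        self._last_pct: Optional[float] = None

    def check(self, remaining_pct: float) -> List[int]:
        """Return thresholds crossed since the previous check, highest first."""
        # Before the first check, assume the budget started untouched.
        previous = 100.0 if self._last_pct is None else self._last_pct
        crossed = [t for t in THRESHOLDS if remaining_pct <= t < previous]
        self._last_pct = remaining_pct
        return crossed


tracker = ThresholdTracker()
# Five consecutive polls with a steadily shrinking budget.
alerts = [tracker.check(pct) for pct in (85, 78, 78, 15, 3)]
```

Comparing each poll against the previous one means a budget that sits at 78% produces a single 80% alert rather than one every 10 seconds, and a large drop between polls reports every threshold it passed.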

The Numbers

  • 3 event types: model-health, budget-alert, routing-decision
  • 10s health polling interval
  • 30s keepalive interval
  • 1 hour max connection (auto-reconnect expected)
  • 4 budget thresholds (80%, 50%, 20%, 5%)
  • 8 tests covering SSE setup, event emission, circuit state mapping, cleanup
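
Because connections are capped at one hour, a long-lived consumer needs a reconnect loop. The sketch below is a hypothetical wrapper, not part of either SDK: connect stands in for any callable that opens a fresh stream and returns an iterable of events (for example client.telemetry.live), and the retry limit, the backoff values, and the use of ConnectionError as the failure signal are all illustrative assumptions.

```python
import time
from typing import Callable, Iterable, Iterator


def stream_with_reconnect(
    connect: Callable[[], Iterable],
    max_reconnects: int = 5,
    backoff_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator:
    """Yield events from connect(), reopening the stream each time it ends.

    Stops after max_reconnects reopen attempts. Connection errors wait with
    a linearly growing backoff; a clean close (such as the 1-hour limit)
    reconnects immediately.
    """
    failures = 0
    for _ in range(max_reconnects + 1):
        try:
            for event in connect():
                failures = 0  # a delivered event means the link is healthy
                yield event
        except ConnectionError:
            failures += 1
            sleep(backoff_seconds * failures)


# Stand-in for a real stream: two connections' worth of events, then nothing.
batches = iter([["a", "b"], ["c"]])


def fake_connect():
    return next(batches, [])


received = list(
    stream_with_reconnect(fake_connect, max_reconnects=2, sleep=lambda s: None)
)
```

Injecting sleep as a parameter keeps the loop testable without real delays.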

Competitive Edge

To our knowledge, no other AI gateway provides real-time SSE telemetry of circuit breaker state, budget alerts, and routing decisions in a single stream. Portkey and OpenRouter offer dashboards but, as far as we know, no machine-consumable live feed. This lets CLI tools like Brainstorm make intelligent pre-flight decisions.

Lockstep Checklist

  • [x] API Routes: src/api/routes/telemetry.ts created, mounted in server.ts.
  • [x] TS SDK: packages/sdk-ts/src/resources/telemetry.ts → client.telemetry.live() async generator.
  • [x] Python SDK: packages/sdk-py/src/brainstormrouter/resources/telemetry.py — sync + async client.telemetry.live().
  • [x] MCP Schemas: N/A — SSE streams don't map to MCP tools.
  • [ ] Master Record: Update pending.