Budget Controls & Kill Switch — Unified Governance MVP

2026-03-10

budget-management, kill-switch, governance

LOCKSTEP TRACEABILITY MATRIX
---
api_endpoints: [
  "GET /v1/budget/status",
  "PUT /v1/budget/limits",
  "GET /v1/budget/forecast",
  "POST /v1/killswitch/activate",
  "POST /v1/killswitch/deactivate",
  "GET /v1/killswitch/status",
  "GET /auth/budget/status (dashboard bridge)",
  "GET /auth/killswitch/status (dashboard bridge)",
]
sdk_methods_updated: [
  "client.budget.status()",
  "client.budget.updateLimits()",
  "client.budget.forecast()",
  "client.killswitch.activate()",
  "client.killswitch.deactivate()",
  "client.killswitch.status()",
]
mcp_tools_updated: [
  "br_get_budget_status",
  "br_get_budget_forecast",
  "br_activate_killswitch",
  "br_deactivate_killswitch",
  "br_get_killswitch_status",
]
---

What We Built

Consolidated BrainstormRouter's fragmented budget tracking and emergency stop mechanisms into a coherent MVP governance layer. Operators now have a single, consistent interface for spend monitoring, budget limits, spend forecasting, and emergency kill switch controls across the API, both SDKs (TypeScript and Python), the MCP tools, and the dashboard.

The budget system reads tenant-aggregated spend from Redis (the same counters the routing middleware writes) and compares it against ConfigStore-backed daily and monthly limits. The kill switch uses the proven ConfigStore persistence pattern with 500ms in-memory caching. Once activated, enforcement takes effect within 500ms on every process that can read the state, because each process refreshes its cached copy at that interval. If a config-store read fails, a process that previously cached an active kill switch continues blocking (fail-closed from known-active state). A process that cold-starts during a config-store outage allows traffic until it successfully reads from Postgres; true unconditional fail-closed requires a distributed consensus layer, which is deferred to a future phase.
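The caching and failure behavior described above can be sketched roughly as follows. This is a minimal illustration, not the actual BrainstormRouter implementation: `createKillSwitchCache`, `KillSwitchState`, and the injected `load` and `now` functions are hypothetical names, and `load` merely stands in for a ConfigStore read.

```typescript
// Hypothetical sketch: names and shapes are assumptions, not BrainstormRouter's code.
type KillSwitchState = { active: boolean; reason?: string };

function createKillSwitchCache(
  load: () => Promise<KillSwitchState>, // stands in for a ConfigStore read
  ttlMs: number = 500,
  now: () => number = () => Date.now(),
): () => Promise<boolean> {
  let cached: KillSwitchState | null = null;
  let fetchedAt = Number.NEGATIVE_INFINITY;

  return async function isActive(): Promise<boolean> {
    // Serve from memory while the cached copy is younger than the TTL.
    if (cached !== null && now() - fetchedAt < ttlMs) {
      return cached.active;
    }
    try {
      const state = await load();
      cached = state;
      fetchedAt = now();
      return state.active;
    } catch {
      // Read failed: fall back to the last known state, so a known-active
      // switch keeps blocking. With no known state (cold start during an
      // outage) traffic is allowed, matching the limitation described above.
      return cached !== null ? cached.active : false;
    }
  };
}
```

Injecting the clock keeps the sketch testable without timers; a real middleware would likely call `isActive()` once per request and return 503 when it resolves to `true`.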

Why It Matters

Before this consolidation, BrainstormRouter had three separate budget systems that did not share state and a kill switch buried behind Supabase JWT authentication (inaccessible to API key users and MCP agents). An operator couldn't answer "how much has my tenant spent today?" or issue "stop all routing immediately" through a unified interface. Now they can, from any surface.

How It Works

// Check tenant-level spend
const status = await client.budget.status();
// { daily: { spent_usd: 12.50, limit_usd: 50, remaining_usd: 37.50 }, ... }

// Set budget limits
await client.budget.updateLimits({ daily_limit_usd: 50, monthly_limit_usd: 500 });

// Emergency stop
await client.killswitch.activate("security incident #123");
// All /v1/chat/completions requests now return 503 with X-BR-Kill-Switch: active

// Resume
await client.killswitch.deactivate();

When the kill switch is active, all non-exempt tenant-scoped endpoints return 503. Exempt prefixes include /v1/budget/, /v1/killswitch/, /v1/security/, /v1/governance/, /v1/usage/, /v1/insights/, /v1/ops/, /v1/mcp/connect, /v1/models, /v1/api-keys, /v1/config, and /health — so operators can diagnose and recover during incidents. MCP governance tools (budget status, forecast, kill switch activate/deactivate/status) remain callable at the tool level while all other MCP tools are blocked.
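As a rough illustration of the exemption rules above, the decision can be expressed as two small predicates. The prefix list and tool names are taken from this log; the function names and the plain `startsWith` matching are assumptions made for the sketch, and the real middleware may match paths differently.

```typescript
// Prefixes and tool names come from this log; matching logic is an assumption.
const EXEMPT_PREFIXES: string[] = [
  "/v1/budget/",
  "/v1/killswitch/",
  "/v1/security/",
  "/v1/governance/",
  "/v1/usage/",
  "/v1/insights/",
  "/v1/ops/",
  "/v1/mcp/connect",
  "/v1/models",
  "/v1/api-keys",
  "/v1/config",
  "/health",
];

const MCP_GOVERNANCE_TOOLS = new Set<string>([
  "br_get_budget_status",
  "br_get_budget_forecast",
  "br_activate_killswitch",
  "br_deactivate_killswitch",
  "br_get_killswitch_status",
]);

// True when the path begins with any exempt prefix.
// Caveat: naive prefix matching also exempts e.g. "/v1/configuration".
function isExemptPath(path: string): boolean {
  return EXEMPT_PREFIXES.some((prefix) => path.startsWith(prefix));
}

// HTTP layer: block (503) only when the switch is active and the path is not exempt.
function shouldBlockRequest(path: string, killSwitchActive: boolean): boolean {
  return killSwitchActive && !isExemptPath(path);
}

// MCP layer: governance tools stay callable; every other tool is blocked.
function shouldBlockTool(toolName: string, killSwitchActive: boolean): boolean {
  return killSwitchActive && !MCP_GOVERNANCE_TOOLS.has(toolName);
}
```

With the switch active, `shouldBlockRequest("/v1/chat/completions", true)` is `true` while `shouldBlockRequest("/v1/budget/status", true)` is `false`, which is the property that keeps diagnosis and recovery possible during an incident.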

Note: Policy enforcement and approval/denial visibility already exist via the policy engine and security API. This phase ensures budget/kill switch surfaces respect existing RBAC permissions. Further policy consolidation is deferred.

The Numbers

  • 8 API endpoints (6 /v1/ + 2 /auth/ dashboard bridge), 5 MCP governance tools
  • 63 tests across 11 test files: 41 from initial implementation (budget.test.ts:8, killswitch.test.ts:9, budget.route.test.ts:5, killswitch.route.test.ts:6, server.test.ts:9, auth.test.ts:4) + 22 from consolidation phase (SDK-TS:6, SDK-PY:12, dashboard:4). All phase tests pass; 1 pre-existing auth.test.ts failure (api-keys POST) is unrelated to this phase.
  • Kill switch enforcement: <1ms overhead (middleware path check)
  • Budget status: single Redis read per period (O(1))
  • ConfigStore persistence: 500ms kill-switch cache / 5s settings cache, Postgres-backed. Fail-closed from known-active state; a cold start during a config-store outage defers enforcement until the first successful read.
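To make the budget status arithmetic concrete, here is a minimal sketch of how one period's figures could be derived from a spend counter and a configured limit. The field names `spent_usd`, `limit_usd`, and `remaining_usd` follow the response shown earlier; the `periodStatus` helper, the `null` handling for an unset limit, and the clamp at zero are assumptions for illustration only.

```typescript
// Field names follow the response example in this log; the rest is assumed.
type PeriodStatus = {
  spent_usd: number;
  limit_usd: number | null;
  remaining_usd: number | null;
};

// Derive one period's status from a spend counter value and its limit.
// Assumption: a null limit means "no limit configured".
function periodStatus(spentUsd: number, limitUsd: number | null): PeriodStatus {
  if (limitUsd === null) {
    return { spent_usd: spentUsd, limit_usd: null, remaining_usd: null };
  }
  return {
    spent_usd: spentUsd,
    limit_usd: limitUsd,
    // Assumption: remaining is clamped at zero once the limit is exceeded.
    remaining_usd: Math.max(0, limitUsd - spentUsd),
  };
}

const daily = periodStatus(12.5, 50);
console.log(daily); // spent_usd 12.5, limit_usd 50, remaining_usd 37.5
```

Because each period needs only one counter value and one limit, the computation stays constant-time regardless of request volume.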

Live Verification Required

The following behaviors require production/staging verification beyond unit tests:

  • [ ] Redis spend counters: recordSpend() increments both daily and monthly tenant aggregates in Redis. Verify with a real API call that spend is recorded and GET /v1/budget/status reflects it.
  • [ ] ConfigStore kill switch persistence: Activate kill switch via POST /v1/killswitch/activate, verify GET /v1/killswitch/status returns active. Restart the process and verify the state persists from Postgres.
  • [ ] Dashboard JWT bridge: Access GET /auth/budget/status and GET /auth/killswitch/status from the dashboard with a valid Supabase JWT. Verify tenant-scoped data is returned.
  • [ ] Kill switch enforcement: With kill switch active, verify POST /v1/chat/completions returns 503 while GET /v1/budget/status returns 200.
  • [ ] MCP tool-level enforcement: Connect via MCP, activate kill switch, verify br_list_models returns error while br_get_budget_status succeeds.
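The kill switch enforcement item in the checklist above could be automated with a small smoke check along these lines. This is only a sketch: `checkKillSwitchEnforcement` and the `StatusRequest` type are hypothetical, the check assumes the kill switch has already been activated, and the caller supplies the HTTP function (for example a thin wrapper around `fetch` pointed at a staging host with a valid API key).

```typescript
// Hypothetical helper: returns only the HTTP status code for a request.
type StatusRequest = (method: string, path: string) => Promise<number>;

// Precondition (assumed): the kill switch is already active for the tenant.
// Returns a list of human-readable failures; an empty list means the check passed.
async function checkKillSwitchEnforcement(request: StatusRequest): Promise<string[]> {
  const failures: string[] = [];

  // A non-exempt endpoint should be blocked with 503.
  const chatStatus = await request("POST", "/v1/chat/completions");
  if (chatStatus !== 503) {
    failures.push(`POST /v1/chat/completions returned ${chatStatus}, expected 503`);
  }

  // An exempt governance endpoint should keep working.
  const budgetStatus = await request("GET", "/v1/budget/status");
  if (budgetStatus !== 200) {
    failures.push(`GET /v1/budget/status returned ${budgetStatus}, expected 200`);
  }

  return failures;
}
```

Taking the request function as a parameter keeps the check independent of any particular HTTP client and lets it run against a stub before it is pointed at a live environment.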

Competitive Edge

No other AI gateway offers integrated budget forecasting with kill switch controls accessible from SDK, API, and MCP simultaneously. Portkey has basic cost tracking but no emergency stop. OpenRouter has neither. BrainstormRouter is the only platform where an AI agent can check its own budget status and an operator can halt all routing instantly — through the same governance layer.

Lockstep Checklist

You MUST check these boxes [x] and verify the corresponding files are updated BEFORE committing this log.

  • [x] API Routes: src/api/routes/budget.ts, src/api/routes/killswitch.ts, src/api/middleware/killswitch.ts created.
  • [x] TS SDK: packages/sdk-ts/src/resources/budget.ts, killswitch.ts created, index.ts updated.
  • [x] Python SDK: packages/sdk-py/src/brainstormrouter/resources/budget.py, killswitch.py created, __init__.py updated. Also fixed governance.py route paths (/v1/governance/audit → /v1/governance/memory/audit, etc.) to match the existing API routes; the old paths returned 404.
  • [x] MCP Schemas: 5 new tools in tool-manifest.ts, registered in server.ts, wired with runtime implementations.
  • [x] Docs: API reference pages (budget.mdx, killswitch.mdx), SDK reference (budget-and-killswitch.mdx), OpenAPI specs updated with all endpoints including /auth/* dashboard bridge routes.
  • [x] Dashboard: BudgetStatusResponse type consolidated to match full API response shape (added key_id, key_budget). 4 dashboard visibility tests added.
  • [x] agents.json: Governance tools annotated with governance: true discoverability metadata (runtime enforcement is in MCP_GOVERNANCE_TOOLS in src/api/middleware/killswitch.ts).
  • [x] Master Record: Governance surfaces consolidated under budget-management and kill-switch capabilities.