2026-03-26-context-window-utilization

Context Window Utilization Headers

Date: 2026-03-26
Phase: 084-context-window-utilization
Systems: guardian middleware, response headers

What

Added X-BR-Context-Used, X-BR-Context-Capacity, X-BR-Context-Utilization, and X-BR-Context-Warning response headers to /v1/chat/completions. These headers tell agents how much of the model's context window a request consumed.

Why

Agents (Brainstorm CLI, GTM agents, OpenClaw) need to know when they're approaching context window limits so they can proactively compact conversation history before hitting truncation or rejection.

Headers

| Header | Value | Example |
|---|---|---|
| X-BR-Context-Used | Actual input tokens from provider response | 52340 |
| X-BR-Context-Capacity | Model's max context window (from registry) | 200000 |
| X-BR-Context-Utilization | Ratio (used / capacity), 2 decimal places | 0.26 |
| X-BR-Context-Warning | high (>85%) or critical (>95%), absent otherwise | high |
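
To make the header semantics concrete, here is a minimal sketch of how the four values could be derived from a token count and a capacity. The helper name buildContextHeaders is hypothetical; the shipped logic lives in setGuardianPostHeaders() and may differ, for example in whether the thresholds are compared against the raw or the rounded ratio (this sketch assumes the raw ratio).

```typescript
// Hypothetical helper for illustration only; not the code in guardian.ts.
type ContextHeaders = Record<string, string>;

function buildContextHeaders(
  usedTokens: number,
  maxContextTokens: number,
): ContextHeaders {
  // Fail-open: without a known, positive capacity, emit no context headers.
  if (!Number.isFinite(maxContextTokens) || maxContextTokens <= 0) {
    return {};
  }

  const ratio = usedTokens / maxContextTokens;
  const headers: ContextHeaders = {
    "X-BR-Context-Used": String(usedTokens),
    "X-BR-Context-Capacity": String(maxContextTokens),
    // Ratio formatted to 2 decimal places.
    "X-BR-Context-Utilization": ratio.toFixed(2),
  };

  // Assumption: thresholds are checked against the unrounded ratio.
  if (ratio > 0.95) {
    headers["X-BR-Context-Warning"] = "critical";
  } else if (ratio > 0.85) {
    headers["X-BR-Context-Warning"] = "high";
  }
  return headers;
}
```

With the example values from the table, buildContextHeaders(52340, 200000) yields a utilization of "0.26" and no warning header, since 26% is below the 85% threshold.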

Implementation

  • Modified setGuardianPostHeaders() in guardian.ts to add context headers after existing cost/efficiency headers
  • Added context utilization to buildGuardianStreamPayload() for SSE streaming responses
  • Fail-open: the context headers are omitted when the endpoint has no maxContextTokens (value is 0) or when guardian is disabled
  • CORS: already handled — Access-Control-Expose-Headers: * is set globally
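
On the consuming side, an agent might read these headers roughly as follows. This is a sketch under stated assumptions: readContextStatus, shouldCompact, and the HeaderReader interface are hypothetical names introduced here, and treating any warning level as a compaction trigger is one possible policy, not documented agent behavior. Because the headers are fail-open, the reader returns null when they are absent rather than treating that as an error.

```typescript
// Minimal structural type; the Fetch API's Headers object satisfies it.
interface HeaderReader {
  get(name: string): string | null;
}

interface ContextStatus {
  used: number;
  capacity: number;
  utilization: number;
  warning: "high" | "critical" | null;
}

// Hypothetical client helper: parse the context headers, or return null
// when they are missing or malformed (the server omits them fail-open).
function readContextStatus(headers: HeaderReader): ContextStatus | null {
  const usedRaw = headers.get("X-BR-Context-Used");
  const capacityRaw = headers.get("X-BR-Context-Capacity");
  if (usedRaw === null || capacityRaw === null) {
    return null;
  }

  const used = Number(usedRaw);
  const capacity = Number(capacityRaw);
  if (!Number.isFinite(used) || !Number.isFinite(capacity) || capacity <= 0) {
    return null;
  }

  const warningRaw = headers.get("X-BR-Context-Warning");
  let warning: "high" | "critical" | null = null;
  if (warningRaw === "high") {
    warning = "high";
  } else if (warningRaw === "critical") {
    warning = "critical";
  }

  return { used, capacity, utilization: used / capacity, warning };
}

// Assumed policy: compact conversation history on any warning level.
function shouldCompact(status: ContextStatus | null): boolean {
  return status !== null && status.warning !== null;
}
```

With a fetch response, this could be called as shouldCompact(readContextStatus(response.headers)) after each /v1/chat/completions call.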

Lockstep Checklist

  • [x] API: src/api/middleware/guardian.ts modified
  • [x] Tests: src/api/middleware/context-utilization.test.ts — 8 tests
  • [x] Docs: this ship log
  • [x] SDK-TS: N/A (headers are automatic HTTP metadata)
  • [x] SDK-PY: N/A
  • [x] MCP: N/A (MCP responses don't carry HTTP headers)
  • [x] CLI: N/A
  • [x] GTM: N/A
  • [x] Dashboard: N/A