2026-03-26-context-window-utilization
Context Window Utilization Headers
Date: 2026-03-26
Phase: 084-context-window-utilization
Systems: guardian middleware, response headers
What
Added X-BR-Context-Used, X-BR-Context-Capacity, X-BR-Context-Utilization, and X-BR-Context-Warning response headers to /v1/chat/completions. These signal to agents how much of the model's context window was consumed.
Why
Agents (Brainstorm CLI, GTM agents, OpenClaw) need to know when they're approaching context window limits so they can proactively compact conversation history before hitting truncation or rejection.
Headers
| Header | Value | Example |
|---|---|---|
| X-BR-Context-Used | Actual input tokens from provider response | 52340 |
| X-BR-Context-Capacity | Model's max context window (from registry) | 200000 |
| X-BR-Context-Utilization | Ratio (used / capacity), 2 decimal places | 0.26 |
| X-BR-Context-Warning | high (>85%) or critical (>95%), absent otherwise | high |
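On the agent side, the headers in the table above could be consumed roughly as follows. This is a minimal sketch, not code from any of the agents: `readContextUsage`, `shouldCompact`, `HeaderReader`, and `ContextUsage` are hypothetical names, and only the `X-BR-Context-*` header names and the `high`/`critical` values come from this log.

```typescript
// Hypothetical client-side helper; only the header names and warning
// values are taken from this ship log.
type ContextWarning = "high" | "critical";

interface ContextUsage {
  used: number;
  capacity: number;
  utilization: number;
  warning?: ContextWarning;
}

// Minimal structural type so a fetch `Headers` object (or a test stub) fits.
interface HeaderReader {
  get(name: string): string | null;
}

// Returns undefined when the context headers are absent or unusable,
// which is what a fail-open server response looks like to the client.
function readContextUsage(headers: HeaderReader): ContextUsage | undefined {
  const rawUsed = headers.get("X-BR-Context-Used");
  const rawCapacity = headers.get("X-BR-Context-Capacity");
  if (rawUsed === null || rawCapacity === null) return undefined;

  const used = Number(rawUsed);
  const capacity = Number(rawCapacity);
  if (!Number.isFinite(used) || !Number.isFinite(capacity) || capacity <= 0) {
    return undefined;
  }

  const rawWarning = headers.get("X-BR-Context-Warning");
  let warning: ContextWarning | undefined;
  if (rawWarning === "high" || rawWarning === "critical") {
    warning = rawWarning;
  }

  // Recompute the ratio locally; the header value is rounded to 2 decimals.
  return { used, capacity, utilization: used / capacity, warning };
}

// Assumed policy: compact history as soon as any warning level is present.
function shouldCompact(usage: ContextUsage | undefined): boolean {
  return usage?.warning !== undefined;
}
```

With `fetch`, the call would look like `shouldCompact(readContextUsage(response.headers))`. Treating a missing header set as "no signal" rather than as an error keeps the client in step with the server's fail-open behavior.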
Implementation
- Modified `setGuardianPostHeaders()` in `guardian.ts` to add context headers after existing cost/efficiency headers
- Added context utilization to `buildGuardianStreamPayload()` for SSE streaming responses
- Fail-open: headers omitted when endpoint has no `maxContextTokens` (0) or when guardian is disabled
- CORS: already handled; `Access-Control-Expose-Headers: *` is set globally
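The header computation and fail-open rule described above can be sketched as below. This is an illustration under stated assumptions, not the actual body of `setGuardianPostHeaders()`: `buildContextHeaders` and its parameters are hypothetical, and comparing the unrounded ratio against the 85% and 95% thresholds is an assumption, since this log does not say whether the rounded or unrounded value is used.

```typescript
// Hypothetical sketch of the header computation; the real logic lives in
// guardian.ts and is not reproduced in this log.
function buildContextHeaders(
  inputTokens: number,
  maxContextTokens: number,
  guardianEnabled: boolean = true,
): Record<string, string> {
  // Fail-open: emit nothing when the guardian is off or capacity is unknown (0).
  if (!guardianEnabled || !(maxContextTokens > 0)) {
    return {};
  }

  const utilization = inputTokens / maxContextTokens;
  const headers: Record<string, string> = {
    "X-BR-Context-Used": String(inputTokens),
    "X-BR-Context-Capacity": String(maxContextTokens),
    "X-BR-Context-Utilization": utilization.toFixed(2),
  };

  // Assumption: thresholds are checked against the unrounded ratio.
  if (utilization > 0.95) {
    headers["X-BR-Context-Warning"] = "critical";
  } else if (utilization > 0.85) {
    headers["X-BR-Context-Warning"] = "high";
  }
  return headers;
}
```

Returning an empty object in the fail-open case lets the caller apply the result unconditionally, so a missing registry entry never blocks or alters the response.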
Lockstep Checklist
- [x] API: `src/api/middleware/guardian.ts` modified
- [x] Tests: `src/api/middleware/context-utilization.test.ts` (8 tests)
- [x] Docs: this ship log
- [x] SDK-TS: N/A (headers are automatic HTTP metadata)
- [x] SDK-PY: N/A
- [x] MCP: N/A (MCP responses don't carry HTTP headers)
- [x] CLI: N/A
- [x] GTM: N/A
- [x] Dashboard: N/A