2026-03-26-context-window-utilization

Context Window Utilization Headers

Date: 2026-03-26
Phase: 084-context-window-utilization
Systems: guardian middleware, response headers

What

Added X-BR-Context-Used, X-BR-Context-Capacity, X-BR-Context-Utilization, and X-BR-Context-Warning response headers to /v1/chat/completions. They tell agents how much of the model's context window the request consumed.

Why

Agents (Brainstorm CLI, GTM agents, OpenClaw) need to know when they're approaching context window limits so they can proactively compact conversation history before hitting truncation or rejection.

Headers

Header	Value	Example
`X-BR-Context-Used`	Actual input tokens from provider response	`52340`
`X-BR-Context-Capacity`	Model's max context window (from registry)	`200000`
`X-BR-Context-Utilization`	Ratio (used / capacity), 2 decimal places	`0.26`
`X-BR-Context-Warning`	`high` (utilization > 0.85) or `critical` (utilization > 0.95); header absent otherwise	`high`
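
To illustrate how an agent might consume these headers, here is a minimal client-side sketch. Only the header names and value formats come from the table above; `HeaderReader`, `ContextUsage`, `readContextUsage`, and `shouldCompact` are hypothetical names, and the policy of compacting on any warning is an assumption, not the behavior of any existing agent.

```typescript
// Hypothetical client-side helper; only the header names come from this ship log.
// Any object with a get() method works here, including the Fetch API Headers class.
interface HeaderReader {
  get(name: string): string | null;
}

interface ContextUsage {
  used: number;
  capacity: number;
  utilization: number;
  warning: "high" | "critical" | undefined;
}

// Returns undefined when the headers are absent or unusable
// (the server omits them when it fails open).
function readContextUsage(headers: HeaderReader): ContextUsage | undefined {
  const usedRaw = headers.get("X-BR-Context-Used");
  const capacityRaw = headers.get("X-BR-Context-Capacity");
  if (usedRaw === null || capacityRaw === null) return undefined;

  const used = Number(usedRaw);
  const capacity = Number(capacityRaw);
  if (!Number.isFinite(used) || !Number.isFinite(capacity) || capacity <= 0) {
    return undefined;
  }

  // Prefer the server-provided ratio; fall back to computing it locally.
  const ratioRaw = headers.get("X-BR-Context-Utilization");
  const ratio = ratioRaw === null ? NaN : Number(ratioRaw);
  const utilization = Number.isFinite(ratio) ? ratio : used / capacity;

  const warningRaw = headers.get("X-BR-Context-Warning");
  const warning =
    warningRaw === "high" || warningRaw === "critical" ? warningRaw : undefined;

  return { used, capacity, utilization, warning };
}

// Assumed policy: compact conversation history as soon as any warning appears.
function shouldCompact(usage: ContextUsage | undefined): boolean {
  return usage !== undefined && usage.warning !== undefined;
}
```

With a fetch response this could be called as `shouldCompact(readContextUsage(response.headers))`. An agent might reasonably treat `critical` more urgently than `high`; the sketch leaves that distinction to the caller.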

Implementation

Modified setGuardianPostHeaders() in guardian.ts to add context headers after existing cost/efficiency headers
Added context utilization to buildGuardianStreamPayload() for SSE streaming responses
Fail-open: the context headers are omitted when the endpoint has no maxContextTokens (the value is 0) or when guardian is disabled
CORS: already handled, because Access-Control-Expose-Headers: * is set globally
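
The header computation and the fail-open rule described above can be sketched roughly as follows. This is not the code in guardian.ts: `buildContextHeaders` is a hypothetical helper, and comparing the thresholds against the unrounded ratio is an assumption this ship log does not confirm.

```typescript
// Hypothetical sketch; the real setGuardianPostHeaders() in guardian.ts may differ.
function buildContextHeaders(
  usedTokens: number,
  maxContextTokens: number
): Record<string, string> {
  // Fail-open: emit no context headers when the capacity is unknown (0)
  // or the inputs are not usable numbers.
  if (!(maxContextTokens > 0) || !(usedTokens >= 0)) {
    return {};
  }

  const utilization = usedTokens / maxContextTokens;
  const headers: Record<string, string> = {
    "X-BR-Context-Used": String(usedTokens),
    "X-BR-Context-Capacity": String(maxContextTokens),
    // Ratio formatted to 2 decimal places, e.g. "0.26".
    "X-BR-Context-Utilization": utilization.toFixed(2),
  };

  // Assumption: thresholds are checked against the unrounded ratio.
  if (utilization > 0.95) {
    headers["X-BR-Context-Warning"] = "critical";
  } else if (utilization > 0.85) {
    headers["X-BR-Context-Warning"] = "high";
  }
  return headers;
}
```

Returning an empty object keeps the fail-open path trivial for the caller, which can merge the result into the response headers unconditionally.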

Lockstep Checklist

[x] API: src/api/middleware/guardian.ts modified
[x] Tests: src/api/middleware/context-utilization.test.ts — 8 tests
[x] Docs: this ship log
[x] SDK-TS: N/A (headers are automatic HTTP metadata)
[x] SDK-PY: N/A
[x] MCP: N/A (MCP responses don't carry HTTP headers)
[x] CLI: N/A
[x] GTM: N/A
[x] Dashboard: N/A