P0a — Agent budget enforcement (canonicalize agentLimits keys; closes Computer Protocol #244)
2026-05-09
LOCKSTEP TRACEABILITY MATRIX
---
api_endpoints: ["PATCH /v1/agent-limits", "PUT /v1/agent-limits", "POST /v1/chat/completions (response header)"]
sdk_methods_updated: ["none: no SDK contract change (canonicalization is server-side; SDKs that call PATCH continue working with bare or prefixed keys)"]
mcp_tools_updated: ["none"]
---
What We Built
Closes Computer Protocol issue #244. The bug: an agent with enforced_daily_usd: 0.000001 successfully ran a $0.000805 auto:best completion. The response header reported X-BR-Budget-Remaining: 49.95, which is the tenant-level figure, not the agent-level one. The adversarial-agent role rated this 8.0 attacker-advantage in the Computer Protocol stochastic review.
Root cause (verified end-to-end in code):
- src/api/middleware/agent-limits.ts:34 calls resolveAgentOrUserId(c, body).
- src/api/shared/resolve-user-id.ts:53-62 returns agent:${agentIdentity.agentId} (prefixed) for agent-JWT callers. The prefix is the canonical namespace separator that keeps agent IDs apart from human user IDs in the Redis budget/rate-limit stores.
- src/api/routes/agent-limits.ts PUT/PATCH stored whatever string the operator submitted, with no canonicalization. Operators looking at /v1/agents/profiles/me see bare agentId strings everywhere and naturally submit those bare strings as map keys.
- Result: the middleware looked up the agent:-prefixed key in agentLimits, found nothing because the entry was stored under the bare key, and returned next() with no enforcement.
Fix:
- New helper src/api/shared/agent-limits-keys.ts exports:
  - canonicalizeAgentLimitsKey(rawKey): bare → agent:-prefixed; preserves already-canonical and user:-prefixed keys.
  - canonicalizeAgentLimitsMap(input): applies canonicalization across a map.
  - lookupAgentLimit(map, canonicalKey): migration-aware lookup. Tries canonical first; falls back to bare key for legacy storage. Returns {value, matchedKey, legacy} so callers can log migration progress.
- PUT /v1/agent-limits: canonicalizes the entire incoming map before persisting.
- PATCH /v1/agent-limits: canonicalizes each entry. Only touches the canonical key: writes go to the agent:-prefixed key, and null-deletes remove the agent:-prefixed key only. Legacy bare-key entries are left in place to avoid the cross-tenant blast-radius bug codex round-1 flagged (a tenant with both an agent at agent:abc and a human at bare abc would lose the human entry on any PATCH that auto-cleaned bare aliases). Migration converges via the read-side shim (lookupAgentLimit prefers canonical over bare); legacy entries can be cleaned up explicitly with a PATCH submitting the bare key with null.
- agentLimitsMiddleware: uses lookupAgentLimit to read the canonical key first, then falls back to the bare key for backward compatibility. Logs an [Agent limit matched on legacy bare key …] warning once per legacy match so operator-side migration progress is observable in CloudWatch. Sets the _agentBudgetRemainingUsd context variable.
- non-streaming.ts completion route emits the X-BR-Agent-Budget-Remaining response header (separate from X-BR-Budget-Remaining, which stays tenant-level). 6-decimal precision so micro-budgets like the $0.000001 test case display correctly.
- Error response now includes a recovery field with hint + endpoint pointing operators at PATCH /v1/agent-limits for resolution.
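The helper behavior and PATCH semantics listed above can be sketched roughly as follows. This is a minimal illustration, not the code in src/api/shared/agent-limits-keys.ts: the AgentLimit shape, the choice to return undefined on a miss, and the applyAgentLimitsPatch name are all assumptions made for the example.

```typescript
// Hypothetical shapes; the real types in the repo may differ.
type AgentLimit = { maxBudgetUsd: number };
type AgentLimitsMap = Record<string, AgentLimit>;
type LookupResult = { value: AgentLimit; matchedKey: string; legacy: boolean };

const AGENT_PREFIX = "agent:";
const USER_PREFIX = "user:";

// Bare keys gain the agent: prefix; agent:- and user:-prefixed keys pass through.
function canonicalizeAgentLimitsKey(rawKey: string): string {
  if (rawKey.startsWith(AGENT_PREFIX) || rawKey.startsWith(USER_PREFIX)) {
    return rawKey;
  }
  return AGENT_PREFIX + rawKey;
}

// Canonicalize every key; if two inputs collide, the later entry wins.
function canonicalizeAgentLimitsMap(input: AgentLimitsMap): AgentLimitsMap {
  const out: AgentLimitsMap = {};
  for (const [key, value] of Object.entries(input)) {
    out[canonicalizeAgentLimitsKey(key)] = value;
  }
  return out;
}

// Canonical key first; fall back to the bare key only for agent: keys,
// so user:-prefixed lookups never match a stripped key.
// Assumption: a miss is reported as undefined.
function lookupAgentLimit(
  map: AgentLimitsMap | undefined,
  canonicalKey: string,
): LookupResult | undefined {
  if (!map) return undefined;
  const has = (key: string) => Object.prototype.hasOwnProperty.call(map, key);
  if (has(canonicalKey)) {
    return { value: map[canonicalKey], matchedKey: canonicalKey, legacy: false };
  }
  if (canonicalKey.startsWith(AGENT_PREFIX)) {
    const bareKey = canonicalKey.slice(AGENT_PREFIX.length);
    if (has(bareKey)) {
      return { value: map[bareKey], matchedKey: bareKey, legacy: true };
    }
  }
  return undefined;
}

// Hypothetical PATCH merge: only the canonical key is written or deleted,
// so legacy bare-key entries in storage are left untouched.
function applyAgentLimitsPatch(
  stored: AgentLimitsMap,
  patch: Record<string, AgentLimit | null>,
): AgentLimitsMap {
  const next: AgentLimitsMap = { ...stored };
  for (const [rawKey, value] of Object.entries(patch)) {
    const key = canonicalizeAgentLimitsKey(rawKey);
    if (value === null) {
      delete next[key];
    } else {
      next[key] = value;
    }
  }
  return next;
}

// Usage: a legacy bare entry survives the PATCH, and the lookup prefers canonical.
const stored = applyAgentLimitsPatch(
  { "abc-123": { maxBudgetUsd: 5 } },
  { "abc-123": { maxBudgetUsd: 0.000001 } },
);
const hit = lookupAgentLimit(stored, "agent:abc-123");
console.log(Object.keys(stored), hit?.matchedKey, hit?.legacy);
```

In this sketch the legacy entry stays in the map after the PATCH, which mirrors the decision above to avoid auto-cleaning bare aliases that might belong to a human user.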
Why It Matters
This was a credibility-class gap. The Computer Protocol release-gate items 1 ("Agent budget overrun is blocked before provider execution") and 2 ("Agent identity appears correctly in completion audit, headers, usage, and cost-center attribution") both depended on this enforcement working. Until this PR, both items were red.
The fix also eliminates a class of operator-confusion bug: anyone interacting with /v1/agent-limits via the documented endpoints would silently fail enforcement unless they happened to know the runtime resolver's agent: prefix convention — which was undocumented and visible only by reading resolve-user-id.ts. Server-side canonicalization removes the ambiguity.
How It Works
Operator: PATCH /v1/agent-limits
  body: {"abc-123": {"maxBudgetUsd": 0.000001}}
  ↓ canonicalizeAgentLimitsKey("abc-123") → "agent:abc-123"
  Stored: tenantSettings.agentLimits["agent:abc-123"] = {maxBudgetUsd: 0.000001}

Runtime: POST /v1/chat/completions (with agent JWT)
  → resolveAgentOrUserId returns "agent:abc-123"
  → lookupAgentLimit(settings.agentLimits, "agent:abc-123")
  → canonical match → returns {maxBudgetUsd: 0.000001, ..., legacy: false}
  → middleware enforces budget → 402 if spend ≥ cap
  → c.set("_agentBudgetRemainingUsd", remaining)

Response handler:
  → c.header("X-BR-Agent-Budget-Remaining", remaining.toFixed(6))

Migration shim (read-side only):
  If canonical key not found, lookupAgentLimit strips "agent:" and tries bare:
  agentLimits["abc-123"] → matched as legacy → log warning + return value
  Operator can re-PATCH at any time to converge on canonical storage.
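The enforcement and header steps in the flow above can be illustrated with a framework-free sketch. The checkAgentBudget function, the BudgetDecision type, the error string, and the hint wording are hypothetical; only the 402 status, the spend ≥ cap rule, the recovery field with hint + endpoint, and the 6-decimal header format come from this entry.

```typescript
// Hypothetical limit shape, matching the maxBudgetUsd field in the flow above.
type AgentLimit = { maxBudgetUsd: number };

type BudgetDecision =
  | { allowed: true; remainingUsd: number; headers: Record<string, string> }
  | {
      allowed: false;
      status: 402;
      body: { error: string; recovery: { hint: string; endpoint: string } };
    };

// Decide whether a request may proceed, given the agent's cap and current spend.
function checkAgentBudget(limit: AgentLimit, spentUsd: number): BudgetDecision {
  if (spentUsd >= limit.maxBudgetUsd) {
    return {
      allowed: false,
      status: 402,
      body: {
        error: "agent_budget_exceeded", // placeholder error code
        recovery: {
          hint: "Raise or remove the agent limit to continue.", // placeholder wording
          endpoint: "PATCH /v1/agent-limits",
        },
      },
    };
  }
  const remainingUsd = limit.maxBudgetUsd - spentUsd;
  return {
    allowed: true,
    remainingUsd,
    // Six decimals so micro-budgets do not collapse to zero in the header.
    headers: { "X-BR-Agent-Budget-Remaining": remainingUsd.toFixed(6) },
  };
}

// Usage: a fresh micro-budget is reported in full; an exhausted one is rejected.
const fresh = checkAgentBudget({ maxBudgetUsd: 0.000001 }, 0);
const exhausted = checkAgentBudget({ maxBudgetUsd: 0.000001 }, 0.000805);
console.log(fresh, exhausted);
```

With two-decimal formatting, the same micro-budget would render as 0.00, which is why the header uses toFixed(6).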
Test Plan
- 14 new unit tests in src/api/shared/agent-limits-keys.test.ts covering:
  - Canonicalization: bare → prefixed; already-prefixed left alone; user-prefixed left alone; UUID format
  - Map canonicalization: full normalization; user-prefix preservation; duplicate-collision last-write-wins
  - Lookup: canonical match (legacy=false); bare-key fallback (legacy=true); canonical-preferred-when-both-exist; missing → no match; undefined map; user-prefix isolation (no fallback strip)
- Existing src/api/middleware/agent-limits.test.ts continues to pass (migration shim is backward-compatible)
- Full suite: 7676 passed / 0 failed (840 → 840 test files, +14 tests, +1 file)
- pnpm tsgo clean; oxlint --type-aware 0 warnings 0 errors; oxfmt clean
What This Doesn't Fix
- Mass migration of existing legacy storage. This PR's migration shim is read-side only: legacy bare-key entries stay in storage until the next PUT/PATCH rewrites them. A separate one-shot migration could canonicalize every tenant's agentLimits map in one pass; deferred because (a) the warning log surfaces which tenants still need a re-PATCH, (b) the production deploy doesn't have hard SLA pressure to drain legacy keys (graceful migration is fine), (c) blast-radius of a bulk rewrite is higher than this PR warrants.
- Computer Protocol issue #245 (mTLS-behind-ALB acceptance): separate PR, P0b on queue.
- Computer Protocol issue #246 (tool-call routing reasoning-model exclusion + cost): P1 on queue.
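For reference, the deferred one-shot migration from the first item might look roughly like the sketch below. Nothing like this ships in this PR: the migrateAgentLimits name, the MigrationResult shape, and the policy of reporting a conflict instead of overwriting when both the bare and the agent:-prefixed key exist are assumptions chosen to keep the bulk rewrite conservative.

```typescript
// Hypothetical shapes for illustration only.
type AgentLimit = { maxBudgetUsd: number };
type AgentLimitsMap = Record<string, AgentLimit>;
type MigrationResult = { migrated: AgentLimitsMap; conflicts: string[] };

const isPrefixed = (key: string): boolean =>
  key.startsWith("agent:") || key.startsWith("user:");

// Promote bare keys to agent:-prefixed keys for a single tenant's map.
function migrateAgentLimits(map: AgentLimitsMap): MigrationResult {
  const migrated: AgentLimitsMap = {};
  const conflicts: string[] = [];

  // First pass: keys that are already prefixed are copied as-is.
  for (const [key, value] of Object.entries(map)) {
    if (isPrefixed(key)) migrated[key] = value;
  }

  // Second pass: promote bare keys unless a canonical twin already exists.
  for (const [key, value] of Object.entries(map)) {
    if (isPrefixed(key)) continue;
    const canonical = "agent:" + key;
    if (Object.prototype.hasOwnProperty.call(migrated, canonical)) {
      // Assumed policy: keep the bare entry and flag it for manual review.
      conflicts.push(key);
      migrated[key] = value;
    } else {
      migrated[canonical] = value;
    }
  }
  return { migrated, conflicts };
}

// Usage: "abc" conflicts with "agent:abc"; "xyz" is promoted cleanly.
const result = migrateAgentLimits({
  "agent:abc": { maxBudgetUsd: 1 },
  abc: { maxBudgetUsd: 2 },
  xyz: { maxBudgetUsd: 3 },
});
console.log(Object.keys(result.migrated), result.conflicts);
```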
Lockstep
- TypeScript SDK: no change. SDK callers of PATCH /v1/agent-limits continue to work; bare or prefixed keys both succeed because the server canonicalizes.
- Python SDK: no change.
- MCP tools: no change.
- API surface: PATCH/PUT /v1/agent-limits request shape unchanged (same Zod schemas); response shape unchanged. New response header X-BR-Agent-Budget-Remaining on completions.
- site/public/routes.json: unchanged.
- OpenAPI: no shape changes; consider documenting the new header in a follow-up doc PR.
- Ship log: this entry.
Closes Release Gate
- Item 1: Agent budget overrun is blocked before provider execution. ✓
- Item 2: Agent identity appears correctly in completion audit, headers, usage, and cost-center attribution. ✓ (The new X-BR-Agent-Budget-Remaining header is the agent-identity surface; cost-center attribution is partial and depends on issue #248 TS-2 to fully close.)