2026-05-08-sandbox-savings-pollution
2026-05-08 — Filter sandbox responses out of savings ledger
Summary
R20 risk register item, flagged by 1 of 10 agents (A6 Attacker): 9,080 of 16,275 tracked savings requests (56%) were attributed to brainstorm/brainstorm/sandbox.
Investigated. Two distinct bugs in one symptom:
Bug 1 — Sandbox responses in savings ledger
src/api/routes/completions/index.ts returns a synthetic 200 OK to tenants without provider keys (sandbox mode quickstart UX). It writes a usage_events row with routing_strategy='sandbox', provider='brainstorm', model='brainstorm/sandbox'.
src/router/intelligence/savings-tracker.ts:summarizeFromUsage aggregates usage_events for the savings ledger. It already filtered on is_cached=false, but did NOT exclude sandbox rows. As a result, every "no provider keys, return quickstart message" call accumulated into trackedRequests, with actualModel = ${row.provider}/${row.model} = "brainstorm/brainstorm/sandbox". The double "brainstorm" comes from the SQL provider/model concatenation; it is NOT the cache-prefix nesting bug fixed in PR #224.
Impact: R20 evidence reported trackedRequests: 16275 and avgSavingsPerRequest: 0.0095. About 56% of the denominator was sandbox traffic with $0 savings, so the reported average was about 56% below the actual routing-decision average (roughly 44% of it).
Fix: add routing_strategy != 'sandbox' AND provider != 'brainstorm' to the SQL WHERE clause in summarizeFromUsage().
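A minimal sketch of what the filtered query could look like. Only usage_events, is_cached, routing_strategy, and provider come from this log; the selected columns, the grouping, and the helper name buildSavingsSummarySql are hypothetical and stand in for the real query in summarizeFromUsage().

```typescript
// Hypothetical sketch: the real SELECT list in summarizeFromUsage() is not
// shown in this log, so the columns and GROUP BY below are assumptions.
const SANDBOX_EXCLUSIONS: readonly string[] = [
  "routing_strategy != 'sandbox'",
  "provider != 'brainstorm'",
];

function buildSavingsSummarySql(): string {
  // Keep the pre-existing cache filter and append both sandbox exclusions.
  const where = ["is_cached = false", ...SANDBOX_EXCLUSIONS].join("\n    AND ");
  return `
  SELECT provider, model, COUNT(*) AS tracked_requests
  FROM usage_events
  WHERE ${where}
  GROUP BY provider, model`;
}

console.log(buildSavingsSummarySql());
```

Filtering on both columns is redundant for rows written by the sandbox path today, which sets both values, but it would keep the ledger clean if either column were ever written without the other.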
Bug 2 — Stale "247 production models" string in sandbox response
The sandbox response message at src/api/routes/completions/index.ts:1145 told new users:
> "To unlock 247 production models (Claude, GPT-4o, Gemini, and more)..."
Same drift class as PR #222 (which corrected /llms.txt and the runbook). PR #222's drift gate scans src/api/server.ts but does not scan src/api/routes/completions/index.ts. Replaced the stale count with a count-free message that points to GET /v1/models for the live count.
Changes
- src/router/intelligence/savings-tracker.ts — SQL adds routing_strategy != 'sandbox' AND provider != 'brainstorm' filters with comment explaining the regression.
- src/router/intelligence/savings-tracker.test.ts — new test: summarizeFromUsage emits SQL with explicit sandbox + brainstorm-provider exclusions. Mocks the pool, captures the SQL, asserts both filters present.
- src/api/routes/completions/index.ts — sandbox response message no longer pins "247"; mentions live /v1/models endpoint.
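The capture-and-assert pattern used by the new test can be sketched as follows. This is an illustration under assumptions, not the actual test file: QueryablePool, createCapturingPool, and summarizeFromUsageSketch are hypothetical names, and the real summarizeFromUsage signature and test-framework mocks are not shown in this log.

```typescript
// Minimal stand-in for a database pool: it records every SQL string it is given.
interface QueryablePool {
  query(sql: string): Promise<{ rows: unknown[] }>;
}

function createCapturingPool(): { pool: QueryablePool; captured: string[] } {
  const captured: string[] = [];
  const pool: QueryablePool = {
    async query(sql: string) {
      captured.push(sql);
      return { rows: [] };
    },
  };
  return { pool, captured };
}

// Hypothetical stand-in for summarizeFromUsage; the real query is larger.
async function summarizeFromUsageSketch(pool: QueryablePool): Promise<number> {
  const result = await pool.query(`
    SELECT provider, model
    FROM usage_events
    WHERE is_cached = false
      AND routing_strategy != 'sandbox'
      AND provider != 'brainstorm'`);
  return result.rows.length;
}

async function checkExclusionsPresent(): Promise<boolean> {
  const { pool, captured } = createCapturingPool();
  await summarizeFromUsageSketch(pool);
  const sql = captured[0] ?? "";
  // Same style of assertion as the regression test: both filters must appear.
  return /!=\s*'sandbox'/.test(sql) && /provider\s*!=\s*'brainstorm'/.test(sql);
}

void checkExclusionsPresent().then((ok) => console.log("filters present:", ok));
```

Asserting on the emitted SQL text keeps the test independent of a live database, at the cost of not proving that the query returns the right rows.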
Verification
Test-first discipline:
# Without fix (source reverted, test in place):
FAIL src/router/intelligence/savings-tracker.test.ts
AssertionError: expected '\n SELECT\n COALESCE(rout…' to match /!=\s*'sandbox'/
# With fix:
Test Files 819 passed | 8 skipped (827)
Tests 7489 passed | 191 skipped (7680)
pnpm check clean.
Production effect
After deploy, /v1/intelligence/savings will report:
- trackedRequests drops from ~16,275 to ~7,200 (sandbox responses no longer counted)
- byPath no longer contains brainstorm/brainstorm/sandbox row
- avgSavingsPerRequest rises from $0.0095 to ~$0.022 (more accurate — sandbox $0 entries no longer dilute)
- totalSavingsUsd unchanged at ~$155 (the dollar figure was always correct; only the denominator was inflated)
Historical sandbox-polluted rows in the underlying usage_events table are unchanged. The fix applies at query time, so the next call to /v1/intelligence/savings will reflect the corrected aggregation.
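The projected figures follow from the numbers already quoted in this log. The short calculation below reproduces them, taking the approximate $155 total as given; the variable names are illustrative only.

```typescript
// Inputs quoted in this ship log (totalSavingsUsd is approximate).
const trackedBefore = 16275;
const sandboxRows = 9080;
const totalSavingsUsd = 155;

// Share of the old denominator that was sandbox traffic.
const sandboxShare = sandboxRows / trackedBefore;

// Denominator once sandbox rows are excluded at query time.
const trackedAfter = trackedBefore - sandboxRows;

// The numerator is unchanged because sandbox rows contributed $0.
const avgBefore = totalSavingsUsd / trackedBefore;
const avgAfter = totalSavingsUsd / trackedAfter;

console.log(`sandbox share: ${Math.round(sandboxShare * 100)}%`); // 56%
console.log(`trackedRequests after fix: ${trackedAfter}`); // 7195
console.log(`avg before: $${avgBefore.toFixed(4)}`); // $0.0095
console.log(`avg after: $${avgAfter.toFixed(3)}`); // $0.022
```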
Lockstep checklist
- [x] Source — savings-tracker.ts SQL filter; sandbox message string
- [x] Test — paired regression test (verified fails without fix)
- [x] Ship log — this file
- [x] R20 risk register — 1/10 closed
- [ ] Drift gate extension — scripts/check-doc-state-drift.mjs should also scan src/api/routes/completions/index.ts for stale model-count strings. Deferred to PR #222 follow-up since the gate doesn't exist on main yet.
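As a rough idea of what the deferred drift-gate check might look for, the sketch below flags any hard-coded integer directly followed by "production models". The pattern and the helper name findPinnedModelCounts are assumptions; the actual rules in scripts/check-doc-state-drift.mjs are not shown in this log.

```typescript
// Hypothetical pattern: a pinned integer (optionally with thousands
// separators) immediately before the phrase "production models".
const PINNED_MODEL_COUNT = /\b\d[\d,]*\s+production models\b/i;

// Returns one entry per offending line, with a 1-based line number.
function findPinnedModelCounts(source: string): string[] {
  const hits: string[] = [];
  source.split("\n").forEach((line, index) => {
    const match = PINNED_MODEL_COUNT.exec(line);
    if (match) hits.push(`line ${index + 1}: ${match[0]}`);
  });
  return hits;
}

// The old sandbox message is flagged; a count-free message is not.
console.log(findPinnedModelCounts("To unlock 247 production models (Claude, GPT-4o, Gemini, and more)..."));
console.log(findPinnedModelCounts("To unlock production models, see GET /v1/models for the live list."));
```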