2026-05-11-envelope-stress-tests
Trust Envelope stress-test suite + 4 findings
Date: 2026-05-11
Status: shipped
Slug: envelope-stress-tests
Branch: feature/envelope-stress-tests
Summary
Adds src/security/trust-envelope/__tests__/envelope-stress.test.ts — 34 tests targeting the gaps that existing happy-path coverage didn't catch: adversarial synthesis inputs, cryptographic round-trip under tampering, routing-gate behavior under bizarre envelope shapes, and schema-level drift guards.
All 34 tests pass against current code. Four real findings surfaced during development. Each is asserted against current behavior (not the documented behavior) with a FINDING: comment so the test becomes a self-documenting regression anchor. When the underlying bug is fixed, the assertion flips and the fix is automatically visible in the diff.
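The regression-anchor pattern can be sketched as follows. This is a minimal illustration, not code from the suite: currentSpentUsd is a hypothetical stand-in for the code under test, and plain node:assert stands in for whatever test runner the suite actually uses.

```typescript
import assert from "node:assert/strict";

// Hypothetical stand-in for the code under test. It mirrors the unclamped
// behavior described in the spent_usd finding: the dep's value passes through.
function currentSpentUsd(raw: number | undefined): number {
  return raw ?? 0;
}

function testSpentUsdIsNotClamped(): void {
  // FINDING: spent_usd is not clamped, so a negative dep value survives.
  // The assertion pins CURRENT behavior. Once a clamp lands, the expected
  // value here changes to 0 and that flip shows up in the fix's diff.
  assert.equal(currentSpentUsd(-5), -5);
}

testSpentUsdIsNotClamped();
```

The suite stays green today, and the fix PR has to touch the assertion, which is what makes the behavior change visible in review.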
Findings (severity-ranked)
🔴 HIGH — Signal precedence in code contradicts its own docstring
src/security/trust-envelope/routing-gates.ts lines 13-22 say:
1. tier 2. xdr_risk 3. anomaly_score
Lines 114-130 implement:
1. xdr_risk >= 0.7 2. anomaly >= 0.8 3. tier ∈ {restricted, bronze}
A test with tier="restricted" + xdr_risk=0.95 returns source: "xdr_risk", not "tier". The docstring promised tier-first; the code is xdr_risk-first.
Downstream impact: my external docs (docs/concepts/trust-envelope.mdx, docs/concepts/agent-reputation.mdx) describe the docstring's order, not the code's order. Anyone reading them gets the wrong answer about "which signal caused this routing decision."
The code's _rationale comment_ on line 17 ("XDR is sourced outside BR so it has the highest reliability") supports xdr-first ordering, so the follow-up fix should align the docstring + external docs to match the code, not the other way around.
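A minimal sketch of the implemented order, assuming the thresholds and tier set quoted above. The type and function names (Signals, firstTriggeredSignal) are hypothetical and do not come from routing-gates.ts.

```typescript
// Hypothetical shapes; only the thresholds, the tier set, and the order
// of checks are taken from the finding above.
type Signals = { tier: string; xdrRisk: number; anomalyScore: number };
type GateSource = "xdr_risk" | "anomaly" | "tier" | null;

const XDR_RISK_THRESHOLD = 0.7; // inclusive
const ANOMALY_THRESHOLD = 0.8; // inclusive
const RESTRICTIVE_TIERS = new Set(["restricted", "bronze"]);

// Returns the first signal that triggers, in the order the code implements:
// xdr_risk, then anomaly, then tier.
function firstTriggeredSignal(s: Signals): GateSource {
  if (s.xdrRisk >= XDR_RISK_THRESHOLD) return "xdr_risk";
  if (s.anomalyScore >= ANOMALY_THRESHOLD) return "anomaly";
  if (RESTRICTIVE_TIERS.has(s.tier)) return "tier";
  return null;
}

// The case from the finding: a restricted tier with high xdr_risk is
// attributed to xdr_risk, not to tier.
const source = firstTriggeredSignal({ tier: "restricted", xdrRisk: 0.95, anomalyScore: 0 });
console.log(source);
```

Under the docstring's order the tier check would come first and the same input would be attributed to tier, which is the mismatch the test pins down.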
🟡 MEDIUM — spent_usd not clamped in synth
synth.ts:135 — const spentUsd = (await deps.getBudgetSpent?.(...)) ?? 0 — no Math.max(0, ...) clamp.
A misbehaving / malicious getBudgetSpent dep can produce a negative br_budget.spent_usd. The schema's nonnegative() would reject it on the verify side — so the envelope is signed but cannot round-trip through verifyEnvelope. Defense-in-depth gap.
Fix: const spentUsd = Math.max(0, (await deps.getBudgetSpent?.(...)) ?? 0);
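The proposed clamp can be sketched in isolation as below. The BudgetDeps type and the keyId parameter are assumptions for illustration; the real argument list of getBudgetSpent is elided in the snippet above and is not reproduced here.

```typescript
// Hypothetical dependency shape; only the optional getBudgetSpent call
// and the `?? 0` default come from the snippet above.
type BudgetDeps = { getBudgetSpent?: (keyId: string) => Promise<number> };

// Clamp to zero so a negative value from the dep never reaches the envelope.
// Note: Math.max(0, NaN) is NaN, so this clamp alone does not cover a dep
// that returns NaN; that case needs a separate check.
function clampSpentUsd(raw: number): number {
  return Math.max(0, raw);
}

async function resolveSpentUsd(deps: BudgetDeps, keyId: string): Promise<number> {
  // A missing dep yields undefined, which falls back to 0.
  const raw = (await deps.getBudgetSpent?.(keyId)) ?? 0;
  return clampSpentUsd(raw);
}

// A misbehaving dep that reports negative spend.
const badDeps: BudgetDeps = { getBudgetSpent: async () => -12.5 };
resolveSpentUsd(badDeps, "key-1").then((spent) => console.log(spent));
```

With the clamp in place, the signed value satisfies the schema's nonnegative() check, so the envelope can round-trip through verification.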
🟡 MEDIUM — Empty br_scope.models=[] treated as wildcard
Line 96 of routing-gates.ts reads if (allowedModels !== "*" && allowedModels.length > 0). When length === 0, filtering is skipped, so models: [] and models: "*" produce identical candidate sets.
Impact: semantically ambiguous. A key whose allowedModels is persisted as [] (e.g. from an admin endpoint that accepts empty array as "clear") behaves as if scope is unrestricted, not as "deny all".
Fix path: either (a) reject [] at schema level (force null for "clear" and non-empty array for "allowlist"), or (b) change line 96 to if (allowedModels !== "*") so [] becomes "deny all". Requires a design call.
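To make the difference concrete, here is a sketch that contrasts the current check with option (b). The function names and the AllowedModels type are hypothetical, and the model names are placeholders; only the two conditions are taken from the text above.

```typescript
// Hypothetical type: "*" is the wildcard, an array is an allowlist.
type AllowedModels = "*" | string[];

// Current behavior as described: an empty array skips filtering entirely.
function filterCurrent(candidates: string[], allowed: AllowedModels): string[] {
  if (allowed !== "*" && allowed.length > 0) {
    const list: string[] = allowed;
    return candidates.filter((m) => list.includes(m));
  }
  return candidates;
}

// Option (b): only the wildcard skips filtering, so [] filters out everything.
function filterDenyAllOnEmpty(candidates: string[], allowed: AllowedModels): string[] {
  if (allowed !== "*") {
    const list: string[] = allowed;
    return candidates.filter((m) => list.includes(m));
  }
  return candidates;
}

const candidates = ["model-a", "model-b"]; // placeholder model names
console.log(filterCurrent(candidates, []).length); // 2: [] behaves like "*"
console.log(filterDenyAllOnEmpty(candidates, []).length); // 0: [] denies all
```

Option (a) would instead make the [] input unrepresentable at the schema level, so neither function would ever see it.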
🟢 LOW — Source label "anomaly" vs. docs "anomaly_score"
routing-gates.ts:123 emits source = "anomaly". My external docs use "anomaly_score". Pure docs drift; trivially fixable in the same PR that fixes the precedence docstring.
Test categories (all 34 pass)
| Section | Tests | What's covered |
|---|---|---|
| Synthesis adversarial inputs | 12 | 4096-char fields, NaN/Infinity/negative budgets, 1000+ allowlists, missing apiKey, dep injection, schema rejections |
| Sign + verify round-trip | 4 | Claim preservation, tamper rejection, unknown-key rejection, graceful key rotation |
| Routing-gate bizarre inputs | 10 | Threshold boundaries (0.7 / 0.8 inclusive), warn/off mode semantics, null envelope, empty/wildcard scope, signal precedence |
| Schema drift guards | 8 | Negative budgets, out-of-range xdr_risk, empty sub/org_id, unknown tier enum, unknown auth_method, malformed parent_chain |
What this suite explicitly does NOT cover
- Live production deployment evidence. This is all in-process. A separate live probe (hitting api.brainstormrouter.com with a real key) would confirm that the synth middleware is actually wired on the prod request path and that X-BR-Envelope: audit responses arrive. Worth doing if there's a suspicion the synth isn't deployed.
- Per-route envelope coverage. The suite doesn't check whether every API route runs through the envelope middleware. The seven invisible middleware contracts (CLAUDE.md memory) mean some routes silently bypass it; an explicit "every authenticated route has envelope synth" test belongs to a future PR.
- Cross-system drift. The suite doesn't compare envelope claims against the actual DB state (agent_reputation table, api_keys table). That's the Issue #284 follow-up: expose envelope telemetry on response headers so an external observer can verify drift.
Verification
- pnpm tsgo: exit 0
- pnpm exec oxfmt --check / oxlint --type-aware: clean
- pnpm test:fast: 7813/0 (+34 from this PR)
- All four findings reproduce reliably; assertions track current behavior so the suite stays green until the fixes land.
Recommended follow-ups (in priority order)
- HIGH: fix the precedence docstring + the two external concept docs to match code (xdr_risk → anomaly → tier). ~10 LOC, no behavior change.
- MEDIUM: clamp spent_usd in synth + add a synth-side validation pass that asserts schema-conformance before returning. ~5 LOC + 1 test that flips the FINDING: ...spent_usd... assertion.
- MEDIUM: disambiguate empty models: [] semantics. Requires product decision before code.
Lockstep
- TS / Python SDK / MCP — no public API surface change
- OpenAPI — no new routes
- Ship log — this file