2026-05-11-envelope-stress-tests

Trust Envelope stress-test suite + 4 findings

Date: 2026-05-11
Status: shipped
Slug: envelope-stress-tests
Branch: feature/envelope-stress-tests

Summary

Adds src/security/trust-envelope/__tests__/envelope-stress.test.ts — 34 tests targeting the gaps that existing happy-path coverage didn't catch: adversarial synthesis inputs, cryptographic round-trip under tampering, routing-gate behavior under bizarre envelope shapes, and schema-level drift guards.

All 34 tests pass against current code. Four real findings surfaced during development. Each one is asserted against current behavior (not the documented behavior) and carries a FINDING: comment, so the test doubles as a self-documenting regression anchor. When the underlying bug is fixed, the assertion fails and has to be flipped, so the fix shows up explicitly in the diff.
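A minimal sketch of the FINDING: anchor pattern, using the spent_usd finding as the example. synthSpentUsdCurrent is a hypothetical stand-in for the unclamped expression at synth.ts:135, and expectEqual is a hypothetical helper in place of the suite's actual test runner. Neither is the real code.

```typescript
// Hypothetical stand-in for the current, unclamped synth expression:
// a missing/undefined budget becomes 0, but negatives pass straight through.
function synthSpentUsdCurrent(raw: number | undefined): number {
  return raw ?? 0;
}

// Hypothetical assertion helper standing in for the suite's test runner.
function expectEqual<T>(actual: T, expected: T, label: string): void {
  if (actual !== expected) {
    throw new Error(`${label}: expected ${String(expected)}, got ${String(actual)}`);
  }
}

// FINDING: spent_usd is not clamped in synth. This asserts CURRENT behavior,
// so the suite stays green today. Once the clamp lands, this line fails and
// must be flipped to expect 0, which makes the fix visible in the diff.
expectEqual(synthSpentUsdCurrent(-5), -5, "FINDING: spent_usd unclamped");
```

In this sketch, the fix PR's test-side change is flipping the expected value from -5 to 0.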

Findings (severity-ranked)

🔴 HIGH — Signal precedence in code contradicts its own docstring

src/security/trust-envelope/routing-gates.ts lines 13-22 say:

1. tier
2. xdr_risk
3. anomaly_score

Lines 114-130 implement:

1. xdr_risk >= 0.7
2. anomaly >= 0.8
3. tier ∈ {restricted, bronze}

A test with tier="restricted" + xdr_risk=0.95 returns source: "xdr_risk", not "tier". The docstring promised tier-first; the code is xdr_risk-first.
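The precedence as implemented (per the line 114-130 summary above) can be sketched as a pure function. This is an illustration, not the routing-gates.ts code: the flat GateSignals shape, the anomaly_score field name, and returning null when no gate fires are all assumptions.

```typescript
// Hypothetical flat view of the signals the gate reads; the real envelope
// nests these differently and uses a proper tier enum.
interface GateSignals {
  tier: string;
  xdr_risk: number;
  anomaly_score: number;
}

// Source labels as emitted by the code ("anomaly", not "anomaly_score").
type GateSource = "xdr_risk" | "anomaly" | "tier";

// Order as implemented: XDR first, then anomaly, then tier.
// Both numeric thresholds are inclusive.
function decidingSignal(s: GateSignals): GateSource | null {
  if (s.xdr_risk >= 0.7) return "xdr_risk";
  if (s.anomaly_score >= 0.8) return "anomaly";
  if (s.tier === "restricted" || s.tier === "bronze") return "tier";
  return null; // assumption: no gate fires
}
```

With this ordering, decidingSignal({ tier: "restricted", xdr_risk: 0.95, anomaly_score: 0 }) returns "xdr_risk", which matches the observed behavior and contradicts the tier-first docstring.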

Downstream impact: my external docs (docs/concepts/trust-envelope.mdx, docs/concepts/agent-reputation.mdx) describe the docstring's order, not the code's order. Anyone reading them gets the wrong answer about "which signal caused this routing decision."

The code's rationale comment on line 17 ("XDR is sourced outside BR so it has the highest reliability") supports xdr-first ordering, so the follow-up fix should align the docstring and external docs with the code, not the other way around.

🟡 MEDIUM — spent_usd not clamped in synth

synth.ts:135 reads const spentUsd = (await deps.getBudgetSpent?.(...)) ?? 0, with no Math.max(0, ...) clamp.

A misbehaving or malicious getBudgetSpent dep can therefore produce a negative br_budget.spent_usd. The schema's nonnegative() would reject it on the verify side, so the envelope gets signed but cannot round-trip through verifyEnvelope. This is a defense-in-depth gap.

Fix: const spentUsd = Math.max(0, (await deps.getBudgetSpent?.(...)) ?? 0);
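The proposed fix, isolated as a pure helper so it can be unit-tested without the synth middleware. The helper name clampSpentUsd is hypothetical; the proposed one-liner inlines the same expression.

```typescript
// Proposed clamp: a missing dep or undefined result becomes 0,
// and any negative value is floored at 0 so nonnegative() accepts it.
function clampSpentUsd(raw: number | undefined): number {
  return Math.max(0, raw ?? 0);
}

// Call-site shape (hypothetical):
//   const spentUsd = clampSpentUsd(await deps.getBudgetSpent?.(...));
```

One caveat: Math.max(0, NaN) returns NaN, so this clamp alone does not neutralize a NaN-returning dep. Whether NaN should also be coerced, and to what, is a separate decision.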

🟡 MEDIUM — Empty br_scope.models=[] treated as wildcard

routing-gates.ts:96 reads if (allowedModels !== "*" && allowedModels.length > 0). When length === 0, filtering is skipped, so models: [] and models: "*" produce identical candidate sets.

Impact: semantically ambiguous. A key whose allowedModels is persisted as [] (e.g. from an admin endpoint that accepts empty array as "clear") behaves as if scope is unrestricted, not as "deny all".

Fix path: either (a) reject [] at schema level (force null for "clear" and non-empty array for "allowlist"), or (b) change line 96 to if (allowedModels !== "*") so [] becomes "deny all". Requires a design call.
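A sketch contrasting current behavior with option (b). It assumes "*" is the wildcard sentinel (as option (b) implies) and that the scope is either that sentinel or an allowlist of model ids; the function names are hypothetical.

```typescript
// Hypothetical scope shape: wildcard sentinel or an allowlist.
type AllowedModels = "*" | string[];

// Current behavior: an empty array skips filtering, so [] acts like "*".
function filterCurrent(allowed: AllowedModels, candidates: string[]): string[] {
  if (allowed !== "*" && allowed.length > 0) {
    const list = allowed; // narrowed to string[]
    return candidates.filter((m) => list.includes(m));
  }
  return candidates;
}

// Option (b): only "*" skips filtering, so [] filters everything ("deny all").
function filterOptionB(allowed: AllowedModels, candidates: string[]): string[] {
  if (allowed !== "*") {
    const list = allowed; // narrowed to string[]
    return candidates.filter((m) => list.includes(m));
  }
  return candidates;
}
```

The two functions differ only for []. Every non-empty allowlist and the wildcard behave the same, which is why this is a semantics decision rather than a broad behavior change.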

🟢 LOW — Source label "anomaly" vs. docs "anomaly_score"

routing-gates.ts:123 emits source = "anomaly". My external docs use "anomaly_score". Pure docs drift; trivially fixable in the same PR that fixes the precedence docstring.

Test categories (all 34 pass)

| Section | Tests | What's covered |
|---|---|---|
| Synthesis adversarial inputs | 12 | 4096-char fields, NaN/Infinity/negative budgets, 1000+ allowlists, missing apiKey, dep injection, schema rejections |
| Sign + verify round-trip | 4 | Claim preservation, tamper rejection, unknown-key rejection, graceful key rotation |
| Routing-gate bizarre inputs | 10 | Threshold boundaries (0.7 / 0.8 inclusive), warn/off mode semantics, null envelope, empty/wildcard scope, signal precedence |
| Schema drift guards | 8 | Negative budgets, out-of-range xdr_risk, empty sub/org_id, unknown tier enum, unknown auth_method, malformed parent_chain |

What this suite explicitly does NOT cover

  • Live production deployment evidence. This suite is entirely in-process. A separate live probe (hitting api.brainstormrouter.com with a real key) would confirm that the synth middleware is actually wired into the prod request path and that X-BR-Envelope: audit responses arrive. Worth doing if there is any suspicion the synth isn't deployed.

  • Per-route envelope coverage. The suite doesn't check whether every API route runs through the envelope middleware. Because of the seven invisible middleware contracts (CLAUDE.md memory), some routes silently bypass it; an explicit "every authenticated route has envelope synth" test belongs in a future PR.

  • Cross-system drift. The suite doesn't compare envelope claims against actual DB state (the agent_reputation and api_keys tables). That is the Issue #284 follow-up: expose envelope telemetry on response headers so an external observer can verify drift.

Verification

  • pnpm tsgo: exit 0
  • pnpm exec oxfmt --check / oxlint --type-aware: clean
  • pnpm test:fast: 7813 passed / 0 failed (+34 from this PR)
  • All four findings reproduce reliably; the assertions track current behavior, so the suite stays green until the fixes land.

Recommended follow-ups (in priority order)

  1. HIGH: fix the precedence docstring and the two external concept docs to match the code (xdr_risk → anomaly → tier). ~10 LOC, no behavior change.
  2. MEDIUM: clamp spent_usd in synth and add a synth-side validation pass that asserts schema conformance before returning. ~5 LOC + 1 test that flips the FINDING: ...spent_usd... assertion.
  3. MEDIUM: disambiguate empty models: [] semantics. Requires a product decision before code.
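A sketch of the synth-side validation pass from the MEDIUM follow-up: run the envelope through the same schema verifyEnvelope uses before returning it, so nothing unverifiable is ever signed. SafeParser, ParseResult, assertConforms, and the toy spentSchema are hypothetical. The safeParse shape mirrors zod's, which nonnegative() hints at, but this log does not confirm the real schema's API.

```typescript
// Hypothetical result/validator shapes modeled on a safeParse-style API.
type ParseResult<T> = { success: true; data: T } | { success: false; error: unknown };

interface SafeParser<T> {
  safeParse(input: unknown): ParseResult<T>;
}

// Refuse to hand back (and therefore sign) an envelope that would fail
// verification later; return the parsed value on success.
function assertConforms<T>(schema: SafeParser<T>, envelope: unknown): T {
  const result = schema.safeParse(envelope);
  if (!result.success) {
    throw new Error(`synthesized envelope failed schema validation: ${String(result.error)}`);
  }
  return result.data;
}

// Toy schema for illustration only: checks one hypothetical field.
const spentSchema: SafeParser<{ spent_usd: number }> = {
  safeParse(input: unknown): ParseResult<{ spent_usd: number }> {
    const v =
      typeof input === "object" && input !== null
        ? (input as { spent_usd?: unknown }).spent_usd
        : undefined;
    // NaN fails v >= 0, so it is rejected along with negatives.
    if (typeof v === "number" && v >= 0) {
      return { success: true, data: { spent_usd: v } };
    }
    return { success: false, error: "spent_usd must be a nonnegative number" };
  },
};
```

Failing loudly in synth moves the error to where the bad dep value enters, instead of surfacing later as an unexplained verifyEnvelope rejection.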

Lockstep

  • TS / Python SDK / MCP — no public API surface change
  • OpenAPI — no new routes
  • Ship log — this file