Trust envelope + ErrorEnvelope + activation gate — all 6 goal conditions live

2026-05-14

trust-envelope · error-envelope · ops-status · fortress

What We Built

Closed the gap between "shipped" and "running" for three subsystems that had been documented and scored as live but were actually running with mode = "off" in production: trust envelope synth, trust envelope routing-gate, and trust envelope guardrail escalation. After this session, GET /v1/ops/status on the production gateway reports:

{ "synth": "audit-only", "routing": "enforce", "guardrails": "enforce" }

A required branch-protection check (fortress-subsystem-activation.e2e.test.ts) now asserts that none of these values is "off", so any future regression fails CI on main. Per-tenant max_rpm is set on the active tenant. The new ErrorEnvelope typed contract covers every 4xx/5xx response with a discriminated-union recovery action: agents see retry_at_iso, quota_reset_at_iso, forbidden, escalate, etc. as typed fields rather than free-form strings.
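To make the discriminated-union recovery action concrete, here is a minimal sketch of how an agent might consume it. Only retry_at_iso, quota_reset_at_iso, forbidden, and escalate come from the contract described above; the kind discriminant, the status and message fields, and the retryDelayMs helper are hypothetical names chosen for illustration, and the real ErrorEnvelope may be shaped differently.

```typescript
// Hypothetical shape. The "kind" discriminant and the status/message fields
// are assumptions; only the variant names and *_iso fields are from the notes.
type RecoveryAction =
  | { kind: "retry"; retry_at_iso: string }
  | { kind: "quota_reset"; quota_reset_at_iso: string }
  | { kind: "forbidden" }
  | { kind: "escalate" };

interface ErrorEnvelope {
  status: number; // HTTP status, 4xx or 5xx
  message: string; // human-readable, not meant to be parsed
  recovery: RecoveryAction;
}

// Returns how long (in ms) an agent should wait before retrying, or null
// when retrying cannot help and the agent should stop or hand off.
function retryDelayMs(err: ErrorEnvelope, now: Date): number | null {
  const r = err.recovery;
  switch (r.kind) {
    case "retry":
      return Math.max(0, Date.parse(r.retry_at_iso) - now.getTime());
    case "quota_reset":
      return Math.max(0, Date.parse(r.quota_reset_at_iso) - now.getTime());
    case "forbidden":
    case "escalate":
      return null;
    default: {
      // Exhaustiveness check: a new variant without a case fails to compile.
      const unreachable: never = r;
      return unreachable;
    }
  }
}
```

The point of the typed form is that the agent branches on a field rather than pattern-matching an error string, and the compiler flags any recovery variant the agent has not handled.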

Why It Matters

The R26 stochastic review surfaced that capability claims ("trust envelope is shipped") were being scored without verifying the runtime state. Code under src/security/trust-envelope/ existed and was wired, but the routing middleware was a no-op because gateway.envelope.routing.mode = "off" in the production config. The activation plan turned that into a measurable target with six concrete conditions, and this session landed all six.

The most important discipline embedded in the plan: the fortress assertion was written before the activation flag flipped. The assertion gates the activation. Future config drift back to "off" now breaks the build rather than silently invalidating the assessment.

How It Works

Each subsystem reports its runtime mode in GET /v1/ops/status under a new subsystems block. Boot reads gateway.envelope.{synth,routing,guardrails} from the configs table (system-scoped, default tenant). The fortress E2E test hits production via FORTRESS_API_URL and asserts each mode against its goal value.
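A rough sketch of the assertion logic, not the actual contents of fortress-subsystem-activation.e2e.test.ts. It assumes the subsystems block is a flat object keyed by subsystem name, takes its goal values from the production response quoted earlier, and uses strict equality against the goal. GOAL, activationFailures, checkProduction, and the injected fetchJson parameter are hypothetical names.

```typescript
type Mode = "off" | "audit-only" | "enforce";
type Subsystems = Record<"synth" | "routing" | "guardrails", Mode>;
// What the status endpoint actually returned; values are untrusted strings.
type StatusModes = Partial<Record<keyof Subsystems, string>>;

// Goal values as reported by production after activation (assumed to be
// what the real test compares against).
const GOAL: Subsystems = {
  synth: "audit-only",
  routing: "enforce",
  guardrails: "enforce",
};

// Returns one message per subsystem whose mode differs from its goal.
// A missing key counts as a failure, which covers the "boot read undefined" case.
function activationFailures(actual: StatusModes): string[] {
  const failures: string[] = [];
  for (const name of Object.keys(GOAL) as (keyof Subsystems)[]) {
    const got = actual[name];
    if (got !== GOAL[name]) {
      failures.push(`${name}: expected ${GOAL[name]}, got ${got ?? "undefined"}`);
    }
  }
  return failures;
}

// fetchJson is injected so the sketch carries no runtime-specific dependency.
async function checkProduction(
  baseUrl: string,
  fetchJson: (url: string) => Promise<unknown>,
): Promise<string[]> {
  const body = (await fetchJson(`${baseUrl}/v1/ops/status`)) as {
    subsystems?: StatusModes;
  };
  return activationFailures(body.subsystems ?? {});
}
```

In a real suite, baseUrl would come from FORTRESS_API_URL and the test would fail whenever the returned list is non-empty.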

The R26 mystery — config writes returning version: 1 but boot reading undefined — turned out to be an assembler bug. src/config/config-decompose.ts hand-listed which gateway sub-keys to compose into BRConfig.gateway (auth/http/router/tls/controlUi) and silently dropped gateway.envelope. The fix iterates the same GATEWAY_NAMED_KEYS list in both decomposeConfig and assembleConfig, so a missing entry is now a TypeScript error rather than a silent runtime drop.
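The single-list pattern can be sketched as follows. This illustrates the idea and is not the code in src/config/config-decompose.ts: the real BRConfig types and storage layout are not shown in these notes, so GatewayConfig, decomposeGateway, assembleGateway, and the flat "gateway.<key>" row naming are assumptions, and sub-config values are treated as opaque.

```typescript
// One source of truth for gateway sub-keys. Both directions iterate it,
// so a key cannot be written by one side and dropped by the other.
const GATEWAY_NAMED_KEYS = [
  "auth",
  "http",
  "router",
  "tls",
  "controlUi",
  "envelope",
] as const;

type GatewayKey = (typeof GATEWAY_NAMED_KEYS)[number];

// Deriving the config type from the list is one way to turn a missing entry
// into a compile error: code reading a key absent from the list will not
// type-check. (Assumption: the real types are richer than `unknown`.)
type GatewayConfig = Record<GatewayKey, unknown>;

// Split the gateway block into one stored row per sub-key.
function decomposeGateway(gateway: GatewayConfig): Record<string, unknown> {
  const rows: Record<string, unknown> = {};
  for (const key of GATEWAY_NAMED_KEYS) {
    rows[`gateway.${key}`] = gateway[key];
  }
  return rows;
}

// Rebuild the gateway block from stored rows, using the same key list.
function assembleGateway(rows: Record<string, unknown>): GatewayConfig {
  const out = {} as GatewayConfig;
  for (const key of GATEWAY_NAMED_KEYS) {
    out[key] = rows[`gateway.${key}`];
  }
  return out;
}
```

With a hand-written list on only one side, a row such as gateway.envelope can be stored successfully and still never reach the assembled config, which matches the symptom of writes succeeding while boot read undefined.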

The Numbers

  • 6/6 goal conditions live in production (verified /v1/ops/status at 22:52Z, task-def :828)
  • 7 PRs landed: #329, #330, #331, #332, #333, and #334 in this session, plus the earlier #320
  • Fortress E2E suite: 8/8 ACT-* tests pass against live production
  • Smoke test post-activation: /v1/chat/completions auto-routed to deepseek-v4-flash in 2.34s under routing=enforce

Competitive Edge

The "active vs documented" gap is the dirty secret of every governance/control-plane vendor: shipping a capability behind a feature flag, scoring it as live, and never actually flipping the flag in production. Tying the assertion to a required branch-protection check forces the gap closed at the CI gate, not the assessment gate.

Lockstep Checklist

  • [x] API Routes: PUT /auth/admin/tenants/:id/rate-limit and PUT /v1/config/gateway.envelope added (PRs #331, #332).
  • [x] TS SDK: authAdminTenantsSet_rate_limit generated in packages/sdk-ts/src/generated/resources/admin.ts.
  • [x] Python SDK: auto-generated alongside TS via pnpm gen:contract.
  • [x] MCP Schemas: N/A — these are platform-admin endpoints, not agent-facing.
  • [x] Master Record: activation-plan reference in docs/plans/2026-05-14-activation-plan.md.