Graduated Trust Degradation — Continuous Agent Governance
2026-03-07
What We Built
Graduated trust degradation replaces the binary "normal or killed" agent anomaly response with a continuous governance spectrum. When the anomaly engine detects suspicious behavior, agents are now degraded proportionally: medium-severity anomalies force cost-optimized routing, high-severity adds model cost caps, and critical still triggers the kill switch. Agents automatically recover after 3 consecutive clean verdicts.
Three capabilities ship together: (1) graduated trust enforcement in the model router, forcing price-optimized routing and cost-capping restricted agents to below-median-cost models; (2) manifest-before-activation, a fail-closed governance boundary requiring a valid governance manifest before any agent can become active; (3) persistent anomaly baselines that survive gateway restarts, eliminating the warm-up blind spot where anomaly detection has no learned baselines.
Why It Matters
Enterprises deploying AI agent fleets need proportional response, not binary kill switches. A minor behavioral drift shouldn't terminate an agent mid-task — it should constrain its operating conditions (cheaper models, tighter budgets) while the system continues monitoring. This is the difference between a circuit breaker and a governance system: circuit breakers are binary, governance is continuous.
The manifest-before-activation guard ensures no agent can operate without declared governance boundaries. The fail-closed design means even if the manifest store goes down, agents cannot activate — security boundaries don't disappear when dependencies fail.
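A minimal sketch of the fail-closed idea, assuming a hypothetical synchronous `ManifestStore` interface and a `canActivate` helper (neither name comes from the actual codebase). The point it illustrates is that a store outage and a missing manifest both resolve to "do not activate":

```typescript
// Hypothetical shapes for illustration; the real manifest schema is not shown in this log.
interface Manifest {
  valid: boolean;
}

interface ManifestStore {
  // Returns null when no manifest exists; may throw when the store is unavailable.
  get(agentId: string): Manifest | null;
}

// Fail closed: only an explicitly valid manifest allows activation.
function canActivate(store: ManifestStore, agentId: string): boolean {
  try {
    const manifest = store.get(agentId);
    return manifest !== null && manifest.valid;
  } catch {
    // Store failure is treated the same as "no manifest": activation is denied.
    return false;
  }
}
```

With this shape, there is no code path where an error silently becomes permission to activate.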
How It Works
Trust level mapping:
severity "none"/"low" → trustLevel "full" → normal routing
severity "medium" → trustLevel "degraded" → force strategy = "price"
severity "high" → trustLevel "restricted" → price + below-median cost cap
severity "critical" → kill switch (existing)
Degradation updates are monotonic: a concurrent medium-severity event cannot relax a previously restricted agent. The moreRestrictiveTrust() function uses numeric precedence to ensure the most restrictive level always wins. Trust relaxes only through the recovery path, not through a lower-severity event arriving later.
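The mapping and the precedence rule above can be sketched as follows. Only `moreRestrictiveTrust` and the severity and trust level names come from this log. The `SEVERITY_ACTION` and `TRUST_RANK` tables, and the exact rank values, are assumptions made for illustration:

```typescript
type Severity = "none" | "low" | "medium" | "high" | "critical";
type TrustLevel = "full" | "degraded" | "restricted";

// Severity to action, mirroring the mapping in this log.
// "kill" stands in for the existing kill switch and is not a trust level.
const SEVERITY_ACTION: Record<Severity, TrustLevel | "kill"> = {
  none: "full",
  low: "full",
  medium: "degraded",
  high: "restricted",
  critical: "kill",
};

// Assumed numeric precedence: a higher rank means a more restrictive level.
const TRUST_RANK: Record<TrustLevel, number> = {
  full: 0,
  degraded: 1,
  restricted: 2,
};

// Returns whichever of the two levels is more restrictive.
function moreRestrictiveTrust(a: TrustLevel, b: TrustLevel): TrustLevel {
  return TRUST_RANK[a] >= TRUST_RANK[b] ? a : b;
}
```

For example, `moreRestrictiveTrust("restricted", "degraded")` returns `"restricted"`, so a late-arriving medium verdict leaves a restricted agent restricted.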
Cost cap mechanism: For restricted agents, the router computes the median cost across all eligible endpoints, then filters out above-median entries. This avoids hardcoding "premium" vs "economy" tiers — it adapts to whatever models are in the registry.
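A rough sketch of the median-based cap, assuming a hypothetical `Endpoint` shape with a single `costPerToken` field (the real registry entries likely carry more). It also assumes that endpoints priced exactly at the median are kept, which guarantees at least one endpoint survives the filter; the actual router may draw that line differently:

```typescript
// Hypothetical endpoint shape for illustration.
interface Endpoint {
  id: string;
  costPerToken: number;
}

// Median of a list of numbers; averages the two middle values for even-length input.
// Callers must pass a non-empty list.
function median(values: number[]): number {
  const sorted = [...values].sort((x, y) => x - y);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 0
    ? (sorted[mid - 1] + sorted[mid]) / 2
    : sorted[mid];
}

// Drops every endpoint priced above the median of the eligible set.
function applyCostCap(endpoints: Endpoint[]): Endpoint[] {
  if (endpoints.length === 0) return [];
  const cap = median(endpoints.map((e) => e.costPerToken));
  return endpoints.filter((e) => e.costPerToken <= cap);
}
```

With costs of 1, 2, 10 and 30, the median is 6, so only the endpoints priced at 1 and 2 remain. Adding or removing models from the registry shifts the cap without any tier configuration.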
Trust recovery: After 3 consecutive "none" or "low" severity verdicts, trust automatically resets to "full". The counter resets to 0 on any degradation event.
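The recovery rule can be sketched as a small state transition. The `TrustState` shape, the `applyVerdict` name, and the choice to reset the counter after a successful recovery are assumptions for illustration. Critical verdicts are left out because they go to the kill switch rather than through trust levels:

```typescript
type Severity = "none" | "low" | "medium" | "high";
type TrustLevel = "full" | "degraded" | "restricted";

// Hypothetical per-agent state: current level plus the consecutive clean verdict count.
interface TrustState {
  level: TrustLevel;
  cleanStreak: number;
}

const RECOVERY_THRESHOLD = 3; // the log describes this as configurable via metadata
const RANK: Record<TrustLevel, number> = { full: 0, degraded: 1, restricted: 2 };

function applyVerdict(state: TrustState, severity: Severity): TrustState {
  if (severity === "none" || severity === "low") {
    // Clean verdict: extend the streak and recover once the threshold is reached.
    const cleanStreak = state.cleanStreak + 1;
    return cleanStreak >= RECOVERY_THRESHOLD
      ? { level: "full", cleanStreak: 0 }
      : { level: state.level, cleanStreak };
  }
  // Degradation event: never relax the level, and reset the streak to 0.
  const target: TrustLevel = severity === "medium" ? "degraded" : "restricted";
  const level = RANK[target] > RANK[state.level] ? target : state.level;
  return { level, cleanStreak: 0 };
}
```

In this sketch, a restricted agent that sees two clean verdicts followed by a medium verdict stays restricted and starts its streak over.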
Anomaly baseline persistence: MetricWindow snapshots (rolling mean, variance, sigma, observation values) are serialized to a new anomaly_baselines Postgres table with composite PK (namespace, entity_key). Separate namespaces (completions, mesh) prevent cross-contamination between the two independent anomaly detector instances. Baselines save every 60 seconds and on graceful shutdown, restore on startup.
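To illustrate the namespacing, here is a sketch that uses an in-memory `Map` as a stand-in for the anomaly_baselines table. The `MetricWindowSnapshot` field names, the `InMemoryBaselineStore` class, and the use of population variance are all assumptions; the real MetricWindow and its Postgres schema are not shown in this log:

```typescript
type Namespace = "completions" | "mesh";

// Assumed snapshot shape based on the fields listed in this log.
interface MetricWindowSnapshot {
  mean: number;
  variance: number;
  sigma: number;
  observations: number[];
}

// Builds a snapshot from raw observations (population variance, by assumption).
function snapshotOf(observations: number[]): MetricWindowSnapshot {
  const n = observations.length;
  const mean = n === 0 ? 0 : observations.reduce((a, b) => a + b, 0) / n;
  const variance =
    n === 0 ? 0 : observations.reduce((a, b) => a + (b - mean) ** 2, 0) / n;
  return { mean, variance, sigma: Math.sqrt(variance), observations: [...observations] };
}

// In-memory stand-in for a table keyed by (namespace, entity_key).
class InMemoryBaselineStore {
  private rows = new Map<string, string>();

  private key(namespace: Namespace, entityKey: string): string {
    return JSON.stringify([namespace, entityKey]);
  }

  save(namespace: Namespace, entityKey: string, snapshot: MetricWindowSnapshot): void {
    this.rows.set(this.key(namespace, entityKey), JSON.stringify(snapshot));
  }

  load(namespace: Namespace, entityKey: string): MetricWindowSnapshot | undefined {
    const raw = this.rows.get(this.key(namespace, entityKey));
    return raw === undefined ? undefined : (JSON.parse(raw) as MetricWindowSnapshot);
  }
}
```

A snapshot saved under ("completions", "agent-1") is not visible under ("mesh", "agent-1"), which is the isolation the composite key is meant to provide.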
The Numbers
- 3 trust levels with proportional routing enforcement
- 3-verdict recovery threshold (configurable via metadata)
- 60-second baseline persistence interval
- 0 warm-up blind spots after restart (baselines restored from Postgres)
- Sub-millisecond trust lookup (O(1) cache read from AgentProfileStore)
- 14 regression tests exercising real AgentProfileStore.transitionLifecycle and ModelRouter.resolveEndpoint code paths
Competitive Edge
No competitor in the AI agent security space offers graduated runtime response tied to routing economics. Lasso Security detects intent anomalies but can only block or allow. Aembit manages identity but has no runtime enforcement path. Portkey routes but doesn't degrade. BrainstormRouter uniquely combines anomaly detection → trust scoring → routing enforcement → economic constraints → automatic recovery in a single closed loop. The agent keeps working, but under tighter constraints — and the system continuously decides whether to tighten or relax.
Lockstep Checklist
> _You MUST check these boxes [x] and verify the corresponding files are updated BEFORE committing this log._
- [x] API Routes: src/api/routes/completions.ts updated (internal wiring only; no API surface change, no new endpoints or request/response schema changes).
- [x] TS SDK: packages/sdk-ts reviewed; no SDK method changes needed (trust degradation is transparent to API consumers).
- [x] Python SDK: packages/sdk-py reviewed; no SDK method changes needed.
- [x] MCP Schemas: No new MCP tools needed (internal routing behavior).
- [ ] Master Record: docs/architecture/master-capability-record.md deferred (file does not exist yet).