ARM: Agent Resource Management — The Workday for AI Agents

2026-03-04

agent-profiles, agent-budget-manager, virtual-keys, anomaly-kill-switch

LOCKSTEP TRACEABILITY MATRIX
---
api_endpoints:
  [
    "GET /v1/agent/profiles",
    "POST /v1/agent/profiles",
    "GET /v1/agent/profiles/me",
    "GET /v1/agent/profiles/:agentId",
    "PATCH /v1/agent/profiles/:agentId",
    "PATCH /v1/agent/profiles/lifecycle/:agentId",
    "DELETE /v1/agent/profiles/:agentId",
  ]
sdk_methods_updated:
  [
    "client.agentProfiles.list()",
    "client.agentProfiles.create()",
    "client.agentProfiles.get()",
    "client.agentProfiles.me()",
    "client.agentProfiles.update()",
    "client.agentProfiles.transition()",
    "client.agentProfiles.delete()",
  ]
mcp_tools_updated: ["none"]
---

What We Built

Agent Resource Management (ARM) transforms AI agents from anonymous API consumers into managed enterprise resources with identities, budgets, lifecycle states, and human ownership. Think of it as "Workday for AI agents" — every agent has a profile, a cost center, a daily and monthly budget, an RBAC role, and a lifecycle state machine that moves through provisioned → active → quarantined → suspended → terminated.

The system has three layers:

Agent Profiles (agent-profiles.ts + agent-profile-store.ts): Full CRUD API for agent identity. Each profile includes a display name, human owner ID, cost center, budget limits (daily and monthly USD), RBAC role, lifecycle state, and extensible metadata. Storage uses a 3-tier architecture: Postgres as source of truth, Redis pub/sub for cross-instance cache invalidation, and a local Map for O(1) hot-path lookups.

Agent Budget Manager (agent-budget-manager.ts): Per-agent, per-conversation cost budgeting with multi-dimensional limits. Pre-flight budget checks return an allowed boolean, remaining budget, whether to downgrade the model, and a recommended fallback model. The system auto-downgrades when the agent is in the final 20% of its budget and generates an efficiency leaderboard ranking agents by (successRate * 100) / avgCostPerRequest.

Kill Switch Integration: When the Hot-Path Anomaly Engine detects a compromised agent, the kill switch automatically transitions the agent's lifecycle from active to quarantined — stripping write-access tools while preserving read-only capabilities.
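As a rough illustration of the quarantine behaviour described above, the sketch below filters an agent's tool set by lifecycle state. Tool, toolsForState, and onAnomalyDetected are hypothetical names invented for this example, and the rule that non-active, non-quarantined agents get no tools is an assumption rather than documented ARM behaviour.

```typescript
// Hypothetical sketch: these names are illustrative, not part of ARM's API.
type AgentState = "provisioned" | "active" | "quarantined" | "suspended" | "terminated";

interface Tool {
  name: string;
  access: "read" | "write";
}

// Assumption: active agents keep every tool, quarantined agents keep only
// read-only tools, and every other state gets no tools at all.
function toolsForState(state: AgentState, tools: readonly Tool[]): Tool[] {
  switch (state) {
    case "active":
      return tools.slice();
    case "quarantined":
      return tools.filter((tool) => tool.access === "read");
    default:
      return [];
  }
}

// Assumption: an anomaly verdict only moves agents that are currently active.
function onAnomalyDetected(current: AgentState): AgentState {
  return current === "active" ? "quarantined" : current;
}
```

Keeping the state change separate from the tool filter means the same filter can be applied on every request, so a quarantined agent loses write access as soon as its state changes.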

Why It Matters

Enterprises deploying 50+ AI agents need the same governance they have for human employees: who owns this agent, what's its budget, what happens when it misbehaves? Without ARM, agents are anonymous API keys with no accountability, no spending limits, and no organizational structure.

ARM answers the questions CISOs and CFOs ask: "Which agent spent $2,000 last month? Who approved it? Can we quarantine it without killing the whole service? What's our most cost-efficient agent?"

How It Works

Lifecycle State Machine:

provisioned → active → (quarantined ↔ active) → suspended → terminated
  • provisioned: Identity minted, awaiting first heartbeat
  • active: Normal A2A routing and tool execution
  • quarantined: Anomaly detected — write-access tools stripped, read-only preserved
  • suspended: Budget exhausted — returns 429 until human approves increase
  • terminated: Kill switch engaged — certificate globally revoked
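One way to make these transitions auditable is a lookup table of allowed moves plus a record emitted for each change. The sketch below follows the diagram above; the suspended → active edge (a human approving a budget increase) and termination from any non-terminal state are assumptions, and ALLOWED_TRANSITIONS, canTransition, and transition are hypothetical names.

```typescript
// Hypothetical sketch: the state names come from the post, the table is assumed.
type LifecycleState = "provisioned" | "active" | "quarantined" | "suspended" | "terminated";

const ALLOWED_TRANSITIONS: Record<LifecycleState, readonly LifecycleState[]> = {
  provisioned: ["active", "terminated"],
  active: ["quarantined", "suspended", "terminated"],
  quarantined: ["active", "suspended", "terminated"],
  suspended: ["active", "terminated"], // assumption: re-activated after approval
  terminated: [], // terminal state
};

interface TransitionRecord {
  agentId: string;
  from: LifecycleState;
  to: LifecycleState;
  reason: string;
  at: string; // ISO-8601 timestamp for the audit trail
}

function canTransition(from: LifecycleState, to: LifecycleState): boolean {
  return ALLOWED_TRANSITIONS[from].indexOf(to) !== -1;
}

// Returns an audit record for a legal move and throws on an illegal one.
function transition(
  agentId: string,
  from: LifecycleState,
  to: LifecycleState,
  reason: string,
): TransitionRecord {
  if (!canTransition(from, to)) {
    throw new Error(`Illegal lifecycle transition: ${from} -> ${to}`);
  }
  return { agentId, from, to, reason, at: new Date().toISOString() };
}
```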

Budget Pre-Flight Check:

const check = budgetManager.check(agentId, conversationId, estimatedCostUsd);
// Returns: { allowed, remainingUsd, shouldDowngrade, recommendedModel, reason }

When an agent is in the final 20% of its budget, the budget manager signals shouldDowngrade: true and recommends a cheaper model. When the budget is exhausted, the outcome depends on configuration: "downgrade" (use the cheapest available model) or "stop" (reject with 429).
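The decision logic can be sketched as a pure function over a single budget dimension. This is a simplified illustration, not the budget manager's implementation: checkBudget, BudgetPolicy, and its fields are hypothetical, and treating a request as exhausting the budget when its estimated cost exceeds what remains is an assumption.

```typescript
// Hypothetical sketch of a single-dimension pre-flight check.
interface BudgetCheck {
  allowed: boolean;
  remainingUsd: number;
  shouldDowngrade: boolean;
  recommendedModel: string | null;
  reason: string;
}

interface BudgetPolicy {
  limitUsd: number;
  onExhausted: "downgrade" | "stop";
  fallbackModel: string; // assumed: name of a cheaper model
  downgradeFraction: number; // 0.2 for the "final 20%" rule
}

function checkBudget(
  spentUsd: number,
  estimatedCostUsd: number,
  policy: BudgetPolicy,
): BudgetCheck {
  const remainingUsd = Math.max(0, policy.limitUsd - spentUsd);

  // Assumption: the budget counts as exhausted when this request would not fit.
  if (estimatedCostUsd > remainingUsd) {
    if (policy.onExhausted === "stop") {
      return {
        allowed: false,
        remainingUsd,
        shouldDowngrade: false,
        recommendedModel: null,
        reason: "budget exhausted",
      };
    }
    return {
      allowed: true,
      remainingUsd,
      shouldDowngrade: true,
      recommendedModel: policy.fallbackModel,
      reason: "budget exhausted, downgrading",
    };
  }

  // Signal a downgrade once the remaining budget is inside the final slice.
  const inFinalSlice = remainingUsd <= policy.limitUsd * policy.downgradeFraction;
  return {
    allowed: true,
    remainingUsd,
    shouldDowngrade: inFinalSlice,
    recommendedModel: inFinalSlice ? policy.fallbackModel : null,
    reason: inFinalSlice ? "within final budget slice" : "within budget",
  };
}
```

A real check would evaluate every configured dimension and return the most restrictive result; the single-limit version keeps the threshold logic easy to read.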

3-Tier Storage:

Write path:  DB upsert → local Map update → Redis PUBLISH
Read path:   local Map lookup (O(1), never touches DB or Redis)
Fallback:    If Redis subscribe fails → poll every 30 seconds
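A minimal sketch of this write and read path follows, with the database and the pub/sub channel hidden behind small interfaces so no particular Postgres or Redis client API is implied. ProfileDb, InvalidationBus, ProfileStore, and their methods are hypothetical names, and reloading the full table during the polling fallback is an assumption.

```typescript
// Hypothetical sketch: interface and class names are illustrative only.
interface AgentProfile {
  agentId: string;
  displayName: string;
  lifecycleState: string;
}

// Stand-in for the Postgres layer (source of truth).
interface ProfileDb {
  upsert(profile: AgentProfile): Promise<void>;
  load(agentId: string): Promise<AgentProfile | undefined>;
  loadAll(): Promise<AgentProfile[]>;
}

// Stand-in for the Redis pub/sub channel used for cache invalidation.
interface InvalidationBus {
  publish(agentId: string): Promise<void>;
}

class ProfileStore {
  private readonly cache = new Map<string, AgentProfile>();
  private readonly db: ProfileDb;
  private readonly bus: InvalidationBus;

  constructor(db: ProfileDb, bus: InvalidationBus) {
    this.db = db;
    this.bus = bus;
  }

  // Write path: DB upsert, then local Map update, then publish.
  async save(profile: AgentProfile): Promise<void> {
    await this.db.upsert(profile);
    this.cache.set(profile.agentId, profile);
    await this.bus.publish(profile.agentId);
  }

  // Read path: local Map only, never touches the DB or the bus.
  get(agentId: string): AgentProfile | undefined {
    return this.cache.get(agentId);
  }

  // Called when another instance publishes an invalidation for agentId.
  async onInvalidation(agentId: string): Promise<void> {
    const fresh = await this.db.load(agentId);
    if (fresh) this.cache.set(agentId, fresh);
    else this.cache.delete(agentId);
  }

  // Reloads the whole cache from the source of truth.
  async refresh(): Promise<void> {
    const rows = await this.db.loadAll();
    this.cache.clear();
    for (const row of rows) this.cache.set(row.agentId, row);
  }

  // Fallback when the subscription fails: poll instead. Returns a stop function.
  startPollingFallback(intervalMs: number = 30_000): () => void {
    const timer = setInterval(() => {
      void this.refresh();
    }, intervalMs);
    return () => clearInterval(timer);
  }
}
```

Because reads never leave the process, a missed invalidation leaves an instance with stale data until the next refresh, which is why the polling fallback matters.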

Efficiency Leaderboard:

const leaderboard = budgetManager.leaderboard();
// Each entry: { agentId, totalRequests, totalCostUsd, successRate,
//               avgCostPerRequest, avgLatencyMs, efficiencyScore }
// efficiencyScore = (successRate * 100) / avgCostPerRequest
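To make the scoring formula concrete, the sketch below computes efficiencyScore from raw counters and sorts agents by it. It assumes successRate is a fraction between 0 and 1 and skips agents with no requests or zero cost to avoid dividing by zero; both choices, along with the AgentStats shape and the buildLeaderboard name, are assumptions. avgLatencyMs is left out for brevity.

```typescript
// Hypothetical sketch: input shape and function name are illustrative.
interface AgentStats {
  agentId: string;
  totalRequests: number;
  successfulRequests: number;
  totalCostUsd: number;
}

interface LeaderboardEntry {
  agentId: string;
  totalRequests: number;
  totalCostUsd: number;
  successRate: number; // assumed to be a fraction in [0, 1]
  avgCostPerRequest: number;
  efficiencyScore: number;
}

function buildLeaderboard(stats: readonly AgentStats[]): LeaderboardEntry[] {
  return stats
    // Assumption: agents with no traffic or zero cost are not ranked.
    .filter((s) => s.totalRequests > 0 && s.totalCostUsd > 0)
    .map((s) => {
      const successRate = s.successfulRequests / s.totalRequests;
      const avgCostPerRequest = s.totalCostUsd / s.totalRequests;
      return {
        agentId: s.agentId,
        totalRequests: s.totalRequests,
        totalCostUsd: s.totalCostUsd,
        successRate,
        avgCostPerRequest,
        // Formula from the post: (successRate * 100) / avgCostPerRequest
        efficiencyScore: (successRate * 100) / avgCostPerRequest,
      };
    })
    // Highest score first.
    .sort((a, b) => b.efficiencyScore - a.efficiencyScore);
}
```

Under these assumptions, an agent with 90 successes out of 100 requests at a total cost of $2 scores about 4,500, while one with 99 successes at a total cost of $10 scores about 990. The cheaper agent ranks first despite its lower success rate.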

The Numbers

  • 5 lifecycle states with auditable transitions
  • 3 budget dimensions: per-conversation, per-hour, per-day (USD)
  • O(1) hot-path lookup via local Map — profile reads never touch the database
  • 20% budget threshold triggers automatic model downgrade
  • 30-second polling fallback if Redis pub/sub disconnects
  • 10,000-record spending history per agent (FIFO eviction)
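The bounded spending history in the last bullet can be pictured as a capped list that drops its oldest entry when full. This is only a sketch: SpendHistory and SpendRecord are hypothetical names, and the array-based eviction is chosen for clarity rather than taken from the implementation.

```typescript
// Hypothetical sketch of a FIFO-bounded spending history.
interface SpendRecord {
  timestampMs: number;
  costUsd: number;
}

class SpendHistory {
  private readonly records: SpendRecord[] = [];
  private readonly capacity: number;

  constructor(capacity: number = 10_000) {
    this.capacity = capacity;
  }

  // Appends a record and evicts the oldest one once capacity is exceeded.
  add(record: SpendRecord): void {
    this.records.push(record);
    if (this.records.length > this.capacity) {
      this.records.shift(); // O(n); a ring buffer avoids this at scale
    }
  }

  get size(): number {
    return this.records.length;
  }

  // Sums the cost of all retained records at or after sinceMs.
  totalSince(sinceMs: number): number {
    let total = 0;
    for (const record of this.records) {
      if (record.timestampMs >= sinceMs) total += record.costUsd;
    }
    return total;
  }
}
```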

Competitive Edge

To our knowledge, no other AI gateway offers agent-level resource management. Portkey has workspace-level API keys. OpenRouter has per-key rate limits. Neither provides lifecycle state machines, human ownership, cost center assignment, budget-aware model downgrade, or efficiency leaderboards. BrainstormRouter's ARM is the only system where an AI agent has a complete HR record (identity, owner, budget, performance metrics, and a kill switch), making it the foundation for enterprise agent governance at scale.

Lockstep Checklist

  • [x] API Routes: Full CRUD at /v1/agent/profiles/* with RBAC enforcement (admin/operator for management, agent role for self-read).
  • [x] TS SDK: client.agentProfiles.list(), .create(), .get(), .me(), .update(), .transition(), .delete() — all implemented in packages/sdk-ts/src/resources/agent-profiles.ts.
  • [x] Python SDK: Matching sync + async interfaces in packages/sdk-py/src/brainstormrouter/resources/agent_profiles.py.
  • [ ] MCP Schemas: Not applicable.
  • [x] Master Record: Listed under "ARM Phase 3.5" in master-capability-record.md.