BR Tier 1 — god-mode audit table + CAF upsert-key cryptographic anchor
2026-04-22
What We Built
Three BR-side workstreams (BR-1, BR-2, BR-3) completing Tier 1 of the cross-project enrollment-idempotence RFC (~/.brainstorm/rfcs/2026-04-enrollment-idempotence.md, ratified v0.7).
- BR-1: god_mode_executions audit table (v49). Every /v1/god-mode/execute invocation writes one synchronous row before the HTTP response returns, capturing trace_id, tenant, caller api_key_id, source_ip, tool, params, risk_level, result, error_code, error, executed_at, duration_ms. Closes the audit gap where the trace_id returned to clients was not persisted anywhere.
- BR-2: CAF AgentServiceRegistry upsert key switched to (tenant_id, spiffe_id) (v50). Adds a machine_id_fingerprint column as a trust-ladder signal and a ux_agent_registry_tenant_spiffe UNIQUE INDEX, the first DB-level uniqueness constraint on agent_registry (the table previously had only non-unique B-trees; the hostname-based "upsert" was app-level only, exposing a hostname-takeover race). Emits identity.enrollment_success on every register and identity.fingerprint_drift when a cert's prior fingerprint differs from the new one. Bundles a heartbeat DB-fallback: when the local cache misses (node restart, Redis pub/sub gap), heartbeat() now queries Postgres and hydrates the cache instead of silently returning false.
- BR-3: documents the 30-second API-key revocation propagation lag in ops-runbooks.md, including the manual Redis cache-invalidation recipe for sub-30s incident-response cutoff.
Why It Matters
Tier 1 closes three correctness gaps that don't show up in production metrics because no one was looking at the right surface:
- God-mode calls were untraceable post-hoc. The trace_id BR returned in responses was cosmetic: nothing stored it. Forensics on a mutation ("who revoked which key at 14:07 UTC?") required parsing application logs, which are ephemeral. Now every god-mode call has a queryable audit row.
- agent_registry allowed hostname takeover. Since the idempotency check was app-level (read-then-update-or-insert) and not DB-enforced, a concurrent registration race with matching hostnames could produce duplicate rows or overwrite an existing agent's cryptographic anchor. Switching to a (tenant_id, spiffe_id) upsert anchors identity in the mTLS cert SAN, something the caller can't forge.
- API-key revocation had an undocumented 30s lag. Operators rotating keys in normal flow are unaffected, but incident response (leaked key with active abuse) needed a manual Redis-flush recipe that wasn't written down anywhere.
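The hostname-takeover race can be reproduced in miniature. The sketch below is an in-memory illustration only: `appLevelRegister`, `keyedRegister`, and the `Row` shape are hypothetical names, an array stands in for the old table without a unique constraint, and a `Map` stands in for the unique index.

```typescript
// Hypothetical in-memory stand-ins; none of these names come from the BR codebase.
type Row = { tenantId: string; hostname: string; spiffeId: string };

// Yield to the event loop, simulating the gap between SELECT and INSERT.
const tick = (): Promise<void> => Promise.resolve();

// App-level "upsert": read, yield, then write.
// Two concurrent callers can both observe "no row" and both insert.
async function appLevelRegister(table: Row[], row: Row): Promise<void> {
  const existing = table.find(
    (r) => r.tenantId === row.tenantId && r.hostname === row.hostname,
  );
  await tick();
  if (existing) {
    existing.spiffeId = row.spiffeId; // overwrites the stored identity anchor
  } else {
    table.push({ ...row });
  }
}

// Keyed upsert: the store itself allows one row per (tenantId, spiffeId),
// which is the guarantee a UNIQUE INDEX provides at the database level.
async function keyedRegister(index: Map<string, Row>, row: Row): Promise<void> {
  await tick();
  index.set(JSON.stringify([row.tenantId, row.spiffeId]), { ...row });
}

async function demo(): Promise<{ appRows: number; keyedRows: number }> {
  const table: Row[] = [];
  const index = new Map<string, Row>();
  const a: Row = { tenantId: "t1", hostname: "node-1", spiffeId: "spiffe://example/agent-a" };
  const b: Row = { tenantId: "t1", hostname: "node-1", spiffeId: "spiffe://example/agent-a" };

  await Promise.all([appLevelRegister(table, a), appLevelRegister(table, b)]);
  await Promise.all([keyedRegister(index, a), keyedRegister(index, b)]);

  return { appRows: table.length, keyedRows: index.size };
}
```

Both app-level callers read before either writes, so the array ends up with two rows for the same agent, while the keyed store holds one.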
How It Works
BR-1 — god-mode audit:
// src/api/routes/god-mode.ts
const traceId = crypto.randomUUID().slice(0, 16); // generated up-front, not post-hoc
const apiKey = c.get("apiKey");
const sourceIp = c.req.header("cf-connecting-ip") ?? c.req.header("x-forwarded-for") ?? null;
const startedAt = performance.now();
// ... run executor, capture result or error ...
const durationMs = Math.max(0, Math.round(performance.now() - startedAt));
await getDb()
  .insert(godModeExecutions)
  .values({
    traceId,
    tenantId,
    apiKeyId: apiKey?.id ?? null,
    sourceIp,
    tool: toolName,
    params,
    riskLevel: toolDef.risk_level,
    result,
    errorCode,
    error: errorMessage,
    durationMs,
  });
Audit failures are logged but never fail the user call — audit is advisory, not authoritative. api_key_id is text, not uuid, because mTLS produces a synthetic key id like mtls:.
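One way to get "awaited but advisory" behavior is to isolate the audit write in its own try/catch. This is a minimal sketch and not the actual route handler: `executeWithAudit`, `AuditSink`, the reduced `AuditRow` shape, and the `EXECUTION_FAILED` code are all hypothetical.

```typescript
// Hypothetical shapes for illustration; a subset of the audit columns.
interface AuditRow {
  traceId: string;
  tool: string;
  result: unknown;
  errorCode: string | null;
  error: string | null;
  durationMs: number;
}

type AuditSink = (row: AuditRow) => Promise<void>;

interface ExecuteOutcome {
  traceId: string;
  ok: boolean;
  result: unknown;
  auditPersisted: boolean;
}

async function executeWithAudit(
  traceId: string,
  tool: string,
  run: () => Promise<unknown>,
  writeAudit: AuditSink,
  logError: (msg: string) => void,
): Promise<ExecuteOutcome> {
  const startedAt = Date.now();
  let result: unknown = null;
  let errorCode: string | null = null;
  let error: string | null = null;

  // Run the tool and capture either its result or its failure.
  try {
    result = await run();
  } catch (e) {
    errorCode = "EXECUTION_FAILED"; // placeholder, not BR's real error taxonomy
    error = e instanceof Error ? e.message : String(e);
  }

  const durationMs = Math.max(0, Date.now() - startedAt);

  // The audit write is awaited, so the row lands before the response,
  // but a failed write is logged and swallowed rather than rethrown.
  let auditPersisted = true;
  try {
    await writeAudit({ traceId, tool, result, errorCode, error, durationMs });
  } catch (e) {
    auditPersisted = false;
    const reason = e instanceof Error ? e.message : String(e);
    logError(`audit write failed for trace ${traceId}: ${reason}`);
  }

  return { traceId, ok: error === null, result, auditPersisted };
}
```

The two try/catch blocks are kept separate on purpose: a tool failure is recorded in the audit row, while an audit failure never changes what the caller sees.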
BR-2 — CAF upsert:
// src/security/caf/agent-registry.ts — register()
// Lookup by SPIFFE ID, not hostname:
const existing = await tx.select().from(agentRegistry)
  .where(and(eq(agentRegistry.tenantId, opts.tenantId),
             eq(agentRegistry.spiffeId, opts.spiffeId)))
  .limit(1);
// On update: refresh hostname (machine may have been legitimately renamed),
// and compare machine_id_fingerprint for drift detection.
if (existing.length > 0 && opts.machineIdFingerprint &&
    priorFingerprint && priorFingerprint !== opts.machineIdFingerprint) {
  emitIdentityEvent({ eventType: "identity.fingerprint_drift", severity: "warn", ... });
}
Events route through the existing security_events store (same bus as auth.mtls_success / auth.mtls_failure), matching the RFC §5.5 cross-product taxonomy so SIEM queries can join across products.
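The upsert-plus-drift logic can be modeled end to end without a database. The sketch below is a simplified in-memory version, not the real register(): the `InMemoryAgentRegistry` class, the `"info"` severity for enrollment events, and the choice to keep the prior fingerprint when a new one is absent are all assumptions made for illustration.

```typescript
// In-memory sketch; the real code runs inside a Postgres transaction.
interface AgentRow {
  tenantId: string;
  spiffeId: string;
  hostname: string;
  machineIdFingerprint: string | null;
}

interface IdentityEvent {
  eventType: "identity.enrollment_success" | "identity.fingerprint_drift";
  severity: "info" | "warn"; // "info" is an assumed severity
  tenantId: string;
  spiffeId: string;
}

class InMemoryAgentRegistry {
  private rows = new Map<string, AgentRow>();
  readonly events: IdentityEvent[] = [];

  // One entry per (tenantId, spiffeId), mirroring the unique index.
  private key(tenantId: string, spiffeId: string): string {
    return JSON.stringify([tenantId, spiffeId]);
  }

  register(opts: AgentRow): AgentRow {
    const k = this.key(opts.tenantId, opts.spiffeId);
    const prior = this.rows.get(k);
    const priorFingerprint = prior?.machineIdFingerprint ?? null;

    // Drift only fires when both fingerprints are known and they differ.
    if (prior && opts.machineIdFingerprint && priorFingerprint &&
        priorFingerprint !== opts.machineIdFingerprint) {
      this.events.push({
        eventType: "identity.fingerprint_drift",
        severity: "warn",
        tenantId: opts.tenantId,
        spiffeId: opts.spiffeId,
      });
    }

    // Hostname is refreshed on update; identity stays keyed on the SPIFFE ID.
    // Assumption: a missing new fingerprint does not erase the stored one.
    const row: AgentRow = {
      ...opts,
      machineIdFingerprint: opts.machineIdFingerprint ?? priorFingerprint,
    };
    this.rows.set(k, row);

    this.events.push({
      eventType: "identity.enrollment_success",
      severity: "info",
      tenantId: opts.tenantId,
      spiffeId: opts.spiffeId,
    });
    return row;
  }

  get size(): number {
    return this.rows.size;
  }
}
```

Registering the same SPIFFE ID twice with a new hostname updates the single row in place, and a changed fingerprint adds one drift event ahead of the second enrollment event.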
BR-2 — heartbeat DB-fallback:
// Before: cache miss → silent `return false`
// After: cache miss → DB lookup by (tenantId, hostname) → hydrate cache → complete heartbeat
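The before/after behavior can be sketched as a small cache-aside lookup. This is an illustration under assumptions, not the real heartbeat(): `HeartbeatTracker`, `DbLookup`, and `AgentRecord` are hypothetical names, the database is an injected function, and the sketch only updates the cached timestamp.

```typescript
interface AgentRecord {
  tenantId: string;
  hostname: string;
  lastSeenAt: number;
}

// Stand-in for the Postgres query; returns null for an unknown agent.
type DbLookup = (tenantId: string, hostname: string) => Promise<AgentRecord | null>;

class HeartbeatTracker {
  private cache = new Map<string, AgentRecord>();
  private lookupInDb: DbLookup;
  private now: () => number;

  constructor(lookupInDb: DbLookup, now: () => number = () => Date.now()) {
    this.lookupInDb = lookupInDb;
    this.now = now;
  }

  async heartbeat(tenantId: string, hostname: string): Promise<boolean> {
    const key = JSON.stringify([tenantId, hostname]);
    let record = this.cache.get(key);

    if (!record) {
      // Cache miss (for example after a restart): fall back to the database.
      const fromDb = await this.lookupInDb(tenantId, hostname);
      if (!fromDb) return false; // genuinely unknown agent
      record = { ...fromDb };
      this.cache.set(key, record); // hydrate so the next heartbeat is a hit
    }

    record.lastSeenAt = this.now();
    return true;
  }

  lastSeen(tenantId: string, hostname: string): number | undefined {
    return this.cache.get(JSON.stringify([tenantId, hostname]))?.lastSeenAt;
  }
}
```

With this shape, `false` means the agent is unknown to the database and not merely absent from the local cache.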
BR-3 — doc: added an "API Key Revocation — Known Propagation Lag" section under Secret Rotation in ops-runbooks.md, with the manual Redis invalidation recipe.
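To see why a manual invalidation shortens the cutoff, consider a toy model of the lag. It assumes the delay comes from a cached key status with a 30-second lifetime, which the text implies but does not spell out. `ApiKeyStatusCache` is a hypothetical in-process stand-in for the Redis cache, and the runbook's actual recipe is not reproduced here.

```typescript
// Toy model: a cached "is this key valid?" answer with a fixed lifetime.
class ApiKeyStatusCache {
  private entries = new Map<string, { valid: boolean; expiresAt: number }>();
  private loadFromDb: (keyId: string) => boolean;
  private ttlMs: number;
  private now: () => number;

  constructor(
    loadFromDb: (keyId: string) => boolean,
    ttlMs: number = 30000, // assumed to match the documented 30s lag
    now: () => number = () => Date.now(),
  ) {
    this.loadFromDb = loadFromDb;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  isValid(keyId: string): boolean {
    const hit = this.entries.get(keyId);
    if (hit && hit.expiresAt > this.now()) {
      return hit.valid; // may be stale for up to ttlMs after a revocation
    }
    const valid = this.loadFromDb(keyId);
    this.entries.set(keyId, { valid, expiresAt: this.now() + this.ttlMs });
    return valid;
  }

  // Manual cutoff: drop the cached answer so the next check hits the source of truth.
  invalidate(keyId: string): void {
    this.entries.delete(keyId);
  }
}
```

In this model a revoked key keeps passing checks until its cached entry expires, unless an operator calls `invalidate` right after the revocation.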
The Numbers
- 2 new schema versions (v49, v50) — deploy-independent of any other BR work.
- 1 new DB-level unique constraint on agent_registry (was 0).
- 1 new identity event family (identity.*) on the existing security_events store: no new bus, no new wiring.
- Audit write overhead per god-mode call: one synchronous INSERT, measured in microseconds against a warm connection pool. God-mode call throughput is low enough that this is not a hot path.
- /v1/mesh/register traffic today: zero production callers (pre-emptive correctness fix; RFC §6 Phase 2b: BR-4 SDK methods will wire real callers in the next sprint).
Competitive Edge
Among comparable gateways in the ecosystem, BR is now the only one with a cryptographically anchored agent registry: (tenant, SPIFFE ID) is the DB-enforced identity. Portkey, Helicone, Letta, and Vercel AI Gateway all treat agent identity as either a client-asserted hostname/label or a simple api_key. The CAF mesh (mTLS + CA + revocation + registry upsert on crypto identity) is a zero-trust primitive none of them ship natively.
Lockstep Checklist
- [x] API Routes: no new routes, no contract changes (/v1/god-mode/execute, /v1/mesh/register, /v1/mesh/heartbeat JSON contracts unchanged).
- [x] TS SDK: not applicable; no new routes or fields exposed. BR-4 (next sprint) adds mesh.register() / heartbeat() / deregister() as a separate workstream.
- [x] Python SDK: not applicable, for the same reason.
- [x] MCP Schemas: not applicable.
- [x] Ship Log: this file.
Related
- RFC: ~/.brainstorm/rfcs/2026-04-enrollment-idempotence.md (v0.7, cross-ratified)
- Coordinated with: brainstorm CLI peer (6sv3av7w), BrainstormMSP peer (m6uoiyb2), brainstorm-agent peer (a3br7vkd)
- Upcoming: BR-4 (TS + Python SDK mesh methods — prerequisite for any external mesh enrollment client), BR-5 (optional MCP tool wrapper, deferred)