BR Tier 2 — SDK mesh enrollment + hostname-collision handling
2026-04-22
What We Built
Tier 2 completes the BR side of the cross-project enrollment-idempotence RFC by closing a pre-existing lockstep drift and resolving a hostname-collision ambiguity surfaced during Tier 1.
- BR-4 (TS + Python SDK mesh methods): `/v1/mesh/register`, `/v1/mesh/heartbeat`, and `/v1/mesh/deregister` now have first-class SDK methods in both TypeScript (`client.mesh.register(...)`) and Python (sync `Mesh` + async `AsyncMesh`). Before this ship, the endpoints existed but had no structured callers anywhere in the codebase: zero TS methods, zero Python methods, zero CLI tools. BR-4 closes that gap and unblocks any downstream product (brainstorm-agent, platform-gold forks, external integrations) that wants to enroll into the CAF mesh.
- BR-6 (hostname-collision semantics): After Tier 1, `agent_registry` allows two agents with the same hostname but distinct SPIFFE IDs (the unique constraint is `(tenant_id, spiffe_id)`, not hostname). The heartbeat DB-fallback added in Tier 1 used `.limit(1)` on a non-unique hostname query, so on a collision it returned an arbitrary row (without `ORDER BY`, the database guarantees no row order). Tier 2 adds an explicit `ORDER BY last_heartbeat DESC LIMIT 1` to that fallback: on a collision, the agent with the most recent heartbeat wins addressing. This matches cache behavior (last writer wins) and maps to user intent.
- Server schema extension: `MeshRegisterSchema` in `src/api/routes/mesh.ts` now accepts `machine_id_fingerprint`, and the handler threads it through to `AgentServiceRegistry.register()`. This was the missing wire-up: BR-2 added the column + event emission, but no production caller could supply a fingerprint until the route accepted the field.
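The schema extension boils down to "accept an optional field, validate it if present, pass it through unchanged." A minimal sketch of that wire-up, assuming hypothetical `MeshRegisterBody` and `RegisterInput` shapes (the real `MeshRegisterSchema` and `AgentServiceRegistry.register()` signatures are not shown in this log, and where `tenantId`/`spiffeId` come from is also assumed):

```typescript
// Hypothetical shapes for illustration only; the real MeshRegisterSchema and
// AgentServiceRegistry.register() signatures may differ.
interface MeshRegisterBody {
  hostname: string;
  endpoint: string;
  capabilities: string[];
  machine_id_fingerprint?: string; // optional, so pre-Tier-2 callers still validate
}

interface RegisterInput {
  tenantId: string; // assumed: derived from the caller's authenticated identity
  spiffeId: string; // assumed: taken from the caller's cert
  hostname: string;
  endpoint: string;
  capabilities: string[];
  machineIdFingerprint?: string;
}

// Validates an untrusted request body; throws on malformed input.
function parseMeshRegisterBody(raw: unknown): MeshRegisterBody {
  if (typeof raw !== "object" || raw === null) {
    throw new Error("body must be a JSON object");
  }
  const { hostname, endpoint, capabilities, machine_id_fingerprint } =
    raw as Record<string, unknown>;
  if (typeof hostname !== "string" || typeof endpoint !== "string") {
    throw new Error("hostname and endpoint must be strings");
  }
  if (!Array.isArray(capabilities) || !capabilities.every((c) => typeof c === "string")) {
    throw new Error("capabilities must be an array of strings");
  }
  let fingerprint: string | undefined;
  if (machine_id_fingerprint !== undefined) {
    if (typeof machine_id_fingerprint !== "string") {
      throw new Error("machine_id_fingerprint must be a string when present");
    }
    fingerprint = machine_id_fingerprint;
  }
  return { hostname, endpoint, capabilities, machine_id_fingerprint: fingerprint };
}

// Threads the optional fingerprint through unchanged; an absent field stays undefined.
function toRegisterInput(body: MeshRegisterBody, tenantId: string, spiffeId: string): RegisterInput {
  return {
    tenantId,
    spiffeId,
    hostname: body.hostname,
    endpoint: body.endpoint,
    capabilities: body.capabilities,
    machineIdFingerprint: body.machine_id_fingerprint,
  };
}
```

Because the field is optional at both layers, an old client that never sends it produces the same registry input as before, which is what keeps the contract backward-compatible.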
Why It Matters
Tier 1 made the DB correct. Tier 2 makes it reachable:
- A pre-existing lockstep-drift finding ("BR publishes API routes without SDK methods") is now resolved for the mesh family. The convention is honored, so future audits won't flag it.
- The fingerprint-drift anomaly path (BR-2) is live end to end: the SDK can submit a fingerprint, the server stores it, and a second enrollment with a different fingerprint emits `identity.fingerprint_drift` to `security_events`. SIEM queries can then correlate it with MSP's `identity.*` events under the RFC §5.5 taxonomy.
- Hostname-collision semantics are no longer a latent trap for external SDK callers: whichever agent heartbeated most recently is deterministically "the" holder of a given hostname.
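The drift rule above reduces to a comparison against the stored fingerprint. A minimal sketch, assuming a hypothetical `EnrolledAgent` row shape and event payload (neither is BR's actual `security_events` schema); how BR treats a first enrollment or a missing fingerprint is likewise an assumption here:

```typescript
// Illustrative model only: field names are assumptions, not BR's schema.
interface FingerprintDriftEvent {
  type: "identity.fingerprint_drift";
  tenantId: string;
  spiffeId: string;
  previousFingerprint: string;
  newFingerprint: string;
}

interface EnrolledAgent {
  tenantId: string;
  spiffeId: string;
  machineIdFingerprint: string | null; // null until some enrollment supplies one
}

// Returns a drift event when a re-enrollment presents a fingerprint that
// differs from the stored one; returns null in every other case.
function checkFingerprintDrift(
  existing: EnrolledAgent | undefined,
  incoming: string | undefined,
): FingerprintDriftEvent | null {
  if (existing === undefined) return null; // first enrollment: nothing to compare
  if (incoming === undefined) return null; // caller omitted the optional field
  if (existing.machineIdFingerprint === null) return null; // no baseline stored yet
  if (existing.machineIdFingerprint === incoming) return null; // same machine
  return {
    type: "identity.fingerprint_drift",
    tenantId: existing.tenantId,
    spiffeId: existing.spiffeId,
    previousFingerprint: existing.machineIdFingerprint,
    newFingerprint: incoming,
  };
}
```

Keying the comparison on the same `(tenant_id, spiffe_id)` identity the registry enforces means a drift event always names one specific enrolled agent, which is what makes it joinable against MSP's `identity.*` events.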
How It Works
TS SDK, registering with an optional fingerprint:
```ts
import BrainstormRouter from "brainstormrouter";
import { machineIdSync } from "node-machine-id"; // user's choice, not an SDK dep
import crypto from "node:crypto";
import os from "node:os";

const client = new BrainstormRouter({ apiKey: process.env.BR_API_KEY });

// Fingerprint: sha256 over the machine ID and the OS hostname.
const fingerprint = crypto
  .createHash("sha256")
  .update(`${machineIdSync()}|${os.hostname()}`)
  .digest("hex");

const { agent } = await client.mesh.register({
  hostname: "worker-03",
  endpoint: "https://worker-03.internal:8443",
  capabilities: ["chat", "embedding"],
  machine_id_fingerprint: fingerprint,
});

// Heartbeat every ~25s (the registry's stale-timeout is 60s). Catch failures
// so a transient error doesn't surface as an unhandled promise rejection.
setInterval(() => {
  client.mesh.heartbeat({ hostname: agent.hostname }).catch((err) => {
    console.error("mesh heartbeat failed:", err);
  });
}, 25_000);
```
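The loop above never stops and never deregisters. A minimal lifecycle sketch, assuming (as the example suggests) that `heartbeat()` and `deregister()` both take `{ hostname }`; `MeshHeartbeatClient` and `startHeartbeat` are hypothetical names, not SDK exports:

```typescript
// Minimal structural shape of the SDK surface used here (assumption: both
// methods take { hostname }, matching MeshHostnameParams).
interface MeshHeartbeatClient {
  mesh: {
    heartbeat(params: { hostname: string }): Promise<unknown>;
    deregister(params: { hostname: string }): Promise<unknown>;
  };
}

/**
 * Starts a heartbeat loop and returns stop(), which cancels the timer and
 * deregisters the agent. With a 25s interval against a 60s stale-timeout,
 * one missed beat is tolerated; two consecutive misses let the agent go stale.
 */
function startHeartbeat(
  client: MeshHeartbeatClient,
  hostname: string,
  intervalMs = 25_000,
): () => Promise<void> {
  const timer = setInterval(() => {
    // Log failures instead of leaving an unhandled promise rejection.
    client.mesh.heartbeat({ hostname }).catch((err) => {
      console.error(`mesh heartbeat failed for ${hostname}:`, err);
    });
  }, intervalMs);

  let stopped = false;
  return async () => {
    if (stopped) return; // idempotent: repeated stop() calls are no-ops
    stopped = true;
    clearInterval(timer);
    await client.mesh.deregister({ hostname });
  };
}
```

In the register example, `const stop = startHeartbeat(client, agent.hostname);` plus `process.once("SIGTERM", () => void stop());` would take the place of the bare `setInterval`, assuming the SDK client is structurally compatible with `MeshHeartbeatClient`. Deregistering on shutdown frees the hostname immediately instead of waiting out the stale-timeout.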
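Python SDK (sync and async variants are both available):
```python
import os

from brainstormrouter import BrainstormRouter
from brainstormrouter.resources.mesh import AsyncMesh  # awaitable variant with the same methods

client = BrainstormRouter(api_key=os.environ["BR_API_KEY"])
result = client.mesh.register(
    hostname="worker-03",
    endpoint="https://worker-03.internal:8443",
    capabilities=["chat"],
    # compute_fingerprint() is caller-supplied (not an SDK helper), e.g. the
    # sha256(machine-id|hostname) scheme from the TS example.
    machine_id_fingerprint=compute_fingerprint(),
)
```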
Hostname-collision resolution:
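```ts
// src/security/caf/agent-registry.ts: heartbeat DB-fallback (excerpt; the
// preceding select/from chain is omitted)
  .where(and(eq(agentRegistry.tenantId, tenantId), eq(agentRegistry.hostname, hostname)))
  .orderBy(desc(agentRegistry.lastHeartbeat)) // most-recent heartbeat wins on collision
  .limit(1);
```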
Ownership verification still happens upstream in `mesh.ts`: if the most-recent agent for a hostname has a SPIFFE ID that doesn't match the caller's cert, the route returns 403 before calling `heartbeat()`. So collision + wrong caller = clean 403, and collision + right caller = the heartbeat of the most-recently-live agent is honored.
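The two steps (resolve the hostname to its most-recent holder, then check the caller's SPIFFE ID against that holder) can be modeled in memory. A sketch under stated assumptions: `AgentRow` is a hypothetical row shape, and the 404 for an unknown hostname is assumed, not taken from `mesh.ts`:

```typescript
// Hypothetical row shape mirroring the agent_registry columns named above.
interface AgentRow {
  tenantId: string;
  hostname: string;
  spiffeId: string;
  lastHeartbeat: Date;
}

// In-memory equivalent of:
//   WHERE tenant_id = $1 AND hostname = $2 ORDER BY last_heartbeat DESC LIMIT 1
// Exact ties keep the first row seen; SQL leaves tie order unspecified.
function resolveHostname(
  rows: readonly AgentRow[],
  tenantId: string,
  hostname: string,
): AgentRow | undefined {
  let holder: AgentRow | undefined;
  for (const row of rows) {
    if (row.tenantId !== tenantId || row.hostname !== hostname) continue;
    if (holder === undefined || row.lastHeartbeat.getTime() > holder.lastHeartbeat.getTime()) {
      holder = row;
    }
  }
  return holder;
}

type HeartbeatStatus = 200 | 403 | 404;

// The ownership check runs against the resolved holder before any heartbeat
// is recorded: a caller whose SPIFFE ID differs from the holder's gets 403.
function heartbeatStatus(
  rows: readonly AgentRow[],
  tenantId: string,
  hostname: string,
  callerSpiffeId: string,
): HeartbeatStatus {
  const holder = resolveHostname(rows, tenantId, hostname);
  if (holder === undefined) return 404; // assumption: unknown hostname
  return holder.spiffeId === callerSpiffeId ? 200 : 403;
}
```

One consequence worth noting: once a newer agent claims a hostname, the older agent's heartbeats for that hostname are rejected with 403 rather than silently refreshing a row nobody addresses.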
The Numbers
- 3 new TS SDK methods (`mesh.register`, `mesh.heartbeat`, `mesh.deregister`) + 3 new exported types (`MeshAgentRegistration`, `MeshRegisterParams`, `MeshHostnameParams`).
- 6 new Python SDK methods (the 3 methods, each in sync and async form). Matches the TS surface 1:1.
- 11 new SDK tests (4 TS + 7 Python). Coverage includes register with and without a fingerprint, heartbeat, deregister, and the async variants.
- 7345 total unit tests pass (was 7341 after Tier 1; +4 TS mesh tests).
- 0 new database migrations — Tier 2 is purely code-level (server schema + handler threading + SDK wrapping + DB query ordering).
Competitive Edge
Combined with Tier 1, BR now has the only ecosystem-visible zero-trust agent enrollment path that ships with SDKs in both languages. Portkey, Helicone, and Vercel AI Gateway have no equivalent "agent enrolls with a cert and gets a tenant-anchored identity" primitive at all. Letta's agent framework binds identity to a session token, not a cert.
Lockstep Checklist
- [x] API Routes: `src/api/routes/mesh.ts`: `MeshRegisterSchema` accepts `machine_id_fingerprint`. The contract is backward-compatible (the field is optional).
- [x] TS SDK: `packages/sdk-ts/src/resources/mesh.ts`: `register()`, `heartbeat()`, `deregister()` methods + type exports in `domains/agents.ts` and `index.ts`. 4 new tests.
- [x] Python SDK: `packages/sdk-py/src/brainstormrouter/resources/mesh.py`: `Mesh` and `AsyncMesh` each gain register/heartbeat/deregister. 7 new tests.
- [x] MCP Schemas: BR-5 (`br.mesh_register` MCP tool wrapper) remains deferred per RFC §6 Phase 2 (optional, not blocking external adoption).
- [x] Ship Log: this file.
Related
- Tier 1 ship log: `docs/ship-log/2026-04-22-br-tier1-audit-enrollment.md`
- RFC: `~/.brainstorm/rfcs/2026-04-enrollment-idempotence.md` (v0.7, ratified)
- Coordinated with: brainstorm CLI peer (6sv3av7w), BrainstormMSP peer (m6uoiyb2), brainstorm-agent peer (a3br7vkd)
- Remaining BR backlog: BR-5 (optional MCP tool wrapper, deferred until concrete adoption need)