BR Tier 2 — SDK mesh enrollment + hostname-collision handling

2026-04-22

caf, sdk-ts, sdk-py, mesh

What We Built

Tier 2 completes the BR side of the cross-project enrollment-idempotence RFC by closing a pre-existing lockstep drift and resolving a hostname-collision ambiguity surfaced during Tier 1.

  • BR-4 (TS + Python SDK mesh methods): /v1/mesh/register, /v1/mesh/heartbeat, and /v1/mesh/deregister now have first-class SDK methods in both TypeScript (client.mesh.register(...)) and Python (sync Mesh + async AsyncMesh). Before this ship, the endpoints existed but had no structured callers anywhere in the codebase: zero TS methods, zero Python methods, zero CLI tools. BR-4 closes that gap and unblocks any downstream product (brainstorm-agent, platform-gold forks, external integrations) that wants to enroll into the CAF mesh.
  • BR-6 (hostname-collision semantics): Post-Tier 1, agent_registry allows two agents with the same hostname but distinct SPIFFE IDs (the unique constraint is (tenant_id, spiffe_id), not hostname). The heartbeat DB-fallback added in Tier 1 used .limit(1) on a non-unique hostname query, so on a collision it returned an arbitrary row: SQL guarantees no row order without ORDER BY. Tier 2 adds an explicit ORDER BY last_heartbeat DESC LIMIT 1 to that fallback. Collision resolution is now "most recent heartbeat wins addressing," which matches cache behavior (last writer wins) and maps to user intent.
  • Server schema extension: MeshRegisterSchema in src/api/routes/mesh.ts now accepts machine_id_fingerprint, and the handler threads it through to AgentServiceRegistry.register(). This was the missing wire-up: BR-2 added the column and the event emission, but no production caller could supply a fingerprint until the route accepted the field.

Why It Matters

Tier 1 made the DB correct. Tier 2 makes it reachable:

  • A pre-existing lockstep-drift finding ("BR publishes API routes without SDK methods") is now resolved for the mesh family. The convention is honored; future audits won't flag it.
  • The fingerprint-drift anomaly path (BR-2) is live end to end: the SDK can submit a fingerprint, the server stores it, and a second enrollment with a different fingerprint emits identity.fingerprint_drift to security_events. SIEM queries can then correlate those events with MSP's identity.* events under the RFC §5.5 taxonomy.
  • Hostname collision semantics are no longer a latent trap for external SDK callers. Whichever agent most recently heartbeated is "the" holder of a given hostname, deterministically.
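
The fingerprint-drift path described above can be modeled in a few lines. This is a minimal in-memory sketch, not the server implementation: EnrollmentStore and its fields are hypothetical stand-ins for agent_registry and security_events, and it assumes that drift fires only when both enrollments supplied a fingerprint and that the stored fingerprint is overwritten by the newest one.

```python
from dataclasses import dataclass, field


@dataclass
class EnrollmentStore:
    """Hypothetical in-memory stand-in for agent_registry plus security_events."""

    # Keyed by (tenant_id, spiffe_id), mirroring the unique constraint.
    fingerprints: dict = field(default_factory=dict)
    security_events: list = field(default_factory=list)

    def register(self, tenant_id, spiffe_id, fingerprint=None):
        key = (tenant_id, spiffe_id)
        previous = self.fingerprints.get(key)
        # Assumption: drift is only reported when both the stored and the
        # incoming fingerprint are present and they differ.
        if previous is not None and fingerprint is not None and previous != fingerprint:
            self.security_events.append(
                {
                    "type": "identity.fingerprint_drift",
                    "tenant_id": tenant_id,
                    "spiffe_id": spiffe_id,
                }
            )
        if fingerprint is not None:
            self.fingerprints[key] = fingerprint


store = EnrollmentStore()
store.register("tenant-a", "spiffe://example/worker-03", "aaa")
store.register("tenant-a", "spiffe://example/worker-03", "aaa")  # same fingerprint: no event
store.register("tenant-a", "spiffe://example/worker-03", "bbb")  # different fingerprint: drift
print([event["type"] for event in store.security_events])  # ['identity.fingerprint_drift']
```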

How It Works

TS SDK — register with optional fingerprint:

import BrainstormRouter from "brainstormrouter";
import { machineIdSync } from "node-machine-id"; // user's choice, not an SDK dep
import crypto from "node:crypto";
import os from "node:os";

const client = new BrainstormRouter({ apiKey: process.env.BR_API_KEY });

const fingerprint = crypto
  .createHash("sha256")
  .update(`${machineIdSync()}|${os.hostname()}`)
  .digest("hex");

const { agent } = await client.mesh.register({
  hostname: "worker-03",
  endpoint: "https://worker-03.internal:8443",
  capabilities: ["chat", "embedding"],
  machine_id_fingerprint: fingerprint,
});

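// Heartbeat every ~25s; registry stale-timeout is 60s.
// Catch failures so a single failed heartbeat is logged instead of becoming an unhandled rejection.
setInterval(() => {
  client.mesh
    .heartbeat({ hostname: agent.hostname })
    .catch((err) => console.error("mesh heartbeat failed:", err));
}, 25_000);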
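
The choice of a ~25s interval against a 60s stale-timeout leaves room for one lost heartbeat. The sketch below works through that arithmetic. The function name is hypothetical, and it assumes an agent counts as stale once the gap since its last heartbeat reaches the timeout, ignoring network latency.

```python
import math


def tolerated_missed_heartbeats(interval_s: float, stale_timeout_s: float) -> int:
    """Return the largest m such that (m + 1) * interval_s < stale_timeout_s.

    After m consecutive missed heartbeats, the gap between two heartbeats
    that did arrive is (m + 1) * interval_s; the agent stays live only
    while that gap is below the stale timeout.
    """
    if interval_s <= 0 or stale_timeout_s <= interval_s:
        raise ValueError("interval must be positive and shorter than the stale timeout")
    return math.ceil(stale_timeout_s / interval_s) - 2


print(tolerated_missed_heartbeats(25, 60))  # 1
print(tolerated_missed_heartbeats(30, 60))  # 0
```

Under these assumptions a 30s interval would leave no slack at all, since a single missed beat produces a 60s gap.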

Python SDK — sync and async both available:

import os

from brainstormrouter import BrainstormRouter
from brainstormrouter.resources.mesh import AsyncMesh  # async counterpart; same register/heartbeat/deregister methods

client = BrainstormRouter(api_key=os.environ["BR_API_KEY"])

result = client.mesh.register(
    hostname="worker-03",
    endpoint="https://worker-03.internal:8443",
    capabilities=["chat"],
    machine_id_fingerprint=compute_fingerprint(),  # caller-supplied helper, not defined in this snippet
)
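
The Python example calls compute_fingerprint() without defining it. One possible shape, mirroring the TS example's SHA-256 over machine ID and hostname, is sketched below. Both function names and the machine-ID sources are assumptions for illustration: /etc/machine-id exists on most systemd-based Linux hosts, and uuid.getnode() is used as a fallback elsewhere.

```python
import hashlib
import socket
import uuid
from pathlib import Path


def read_machine_id() -> str:
    """Return a per-machine identifier from an assumed source."""
    try:
        value = Path("/etc/machine-id").read_text().strip()
        if value:
            return value
    except OSError:
        pass
    # Fallback: hardware address as a 48-bit integer. Python may substitute
    # a random value when no MAC can be read, so this is weaker than a real ID.
    return f"{uuid.getnode():012x}"


def compute_fingerprint() -> str:
    """SHA-256 hex digest of "<machine id>|<hostname>", as in the TS example."""
    material = f"{read_machine_id()}|{socket.gethostname()}"
    return hashlib.sha256(material.encode("utf-8")).hexdigest()


print(len(compute_fingerprint()))  # 64
```

Because the hostname is part of the hashed material, renaming a host changes its fingerprint; whether that should count as drift is a policy decision for the caller.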

Hostname-collision resolution:

// src/security/caf/agent-registry.ts — heartbeat DB-fallback
.where(and(eq(agentRegistry.tenantId, tenantId), eq(agentRegistry.hostname, hostname)))
.orderBy(desc(agentRegistry.lastHeartbeat))
.limit(1);

Ownership verification still happens upstream in mesh.ts — if the most-recent agent for a hostname has a SPIFFE ID that doesn't match the caller's cert, the route returns 403 before calling heartbeat(). So collision + wrong-caller = clean 403; collision + right-caller = the agent that was most-recently-live gets its heartbeat honored.
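
The interaction between collision resolution and the ownership check can be sketched as a small in-memory model. This is illustrative only and covers just the DB-fallback path: AgentRow and both functions are hypothetical, the real check lives in mesh.ts, and the 404 for an unknown hostname is an assumption not stated in this log.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AgentRow:
    tenant_id: str
    spiffe_id: str
    hostname: str
    last_heartbeat: float  # epoch seconds


def resolve_by_hostname(rows, tenant_id: str, hostname: str) -> Optional[AgentRow]:
    """Equivalent of ORDER BY last_heartbeat DESC LIMIT 1 within one tenant."""
    candidates = [r for r in rows if r.tenant_id == tenant_id and r.hostname == hostname]
    return max(candidates, key=lambda r: r.last_heartbeat, default=None)


def heartbeat_status(rows, tenant_id: str, hostname: str, caller_spiffe_id: str) -> int:
    """Return the HTTP status the route would answer with in this model."""
    agent = resolve_by_hostname(rows, tenant_id, hostname)
    if agent is None:
        return 404  # assumption: unknown hostname is reported as not found
    if agent.spiffe_id != caller_spiffe_id:
        return 403  # caller's cert does not match the most recent holder
    return 200


rows = [
    AgentRow("tenant-a", "spiffe://example/old", "worker-03", last_heartbeat=1_000.0),
    AgentRow("tenant-a", "spiffe://example/new", "worker-03", last_heartbeat=2_000.0),
]
print(heartbeat_status(rows, "tenant-a", "worker-03", "spiffe://example/new"))  # 200
print(heartbeat_status(rows, "tenant-a", "worker-03", "spiffe://example/old"))  # 403
```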

The Numbers

  • 3 new TS SDK methods (mesh.register, mesh.heartbeat, mesh.deregister) + 3 new exported types (MeshAgentRegistration, MeshRegisterParams, MeshHostnameParams).
  • 6 new Python SDK methods (3 sync + 3 async). Matches the TS surface 1:1.
  • 11 new SDK tests (4 TS + 7 Python). Coverage includes register with and without a fingerprint, heartbeat, deregister, and the async variants.
  • 7345 total unit tests pass (was 7341 after Tier 1; +4 TS mesh tests).
  • 0 new database migrations — Tier 2 is purely code-level (server schema + handler threading + SDK wrapping + DB query ordering).

Competitive Edge

Combined with Tier 1, BR now has the only ecosystem-visible zero-trust agent enrollment path that ships with SDKs in both languages. Portkey, Helicone, and Vercel AI Gateway have no equivalent "agent enrolls with a cert and gets a tenant-anchored identity" primitive at all. Letta's agent framework binds identity to a session token, not a cert.

Lockstep Checklist

  • [x] API Routes: in src/api/routes/mesh.ts, MeshRegisterSchema accepts machine_id_fingerprint. The contract is backward-compatible (the field is optional).
  • [x] TS SDK: packages/sdk-ts/src/resources/mesh.ts adds register(), heartbeat(), and deregister() methods, with type exports in domains/agents.ts and index.ts. 4 new tests.
  • [x] Python SDK: in packages/sdk-py/src/brainstormrouter/resources/mesh.py, Mesh and AsyncMesh each gain register/heartbeat/deregister. 7 new tests.
  • [x] MCP Schemas: BR-5 (br.mesh_register MCP tool wrapper) remains deferred per RFC §6 Phase 2 (optional, not blocking external adoption).
  • [x] Ship Log: this file.

Related

  • Tier 1 ship log: docs/ship-log/2026-04-22-br-tier1-audit-enrollment.md
  • RFC: ~/.brainstorm/rfcs/2026-04-enrollment-idempotence.md (v0.7, ratified)
  • Coordinated with: brainstorm CLI peer (6sv3av7w), BrainstormMSP peer (m6uoiyb2), brainstorm-agent peer (a3br7vkd)
  • Remaining BR backlog: BR-5 (optional MCP tool wrapper, deferred until concrete adoption need)