ZTNA verifiability: build attestation, federated identity, reputation-coupled certs
2026-04-18
What We Built
Phase 1A — Build-time attestation via Sigstore. Every production deploy now signs its Docker image via cosign with keyless GitHub OIDC, publishes the signature to the Rekor transparency log, and injects the verification metadata into the ECS task definition. GET /attestation (public, no auth) returns the bundle any customer can use to independently verify the running image matches the exact git commit they audited. Every response also carries an X-BR-Build: header so callers detect deploy changes mid-session.
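A caller-side sketch of how the X-BR-Build header could be used to notice a deploy change mid-session. BuildWatcher is a hypothetical helper written for illustration; it is not part of the BR SDK, and the format of the header value is not assumed beyond "an opaque string".

```typescript
// Hypothetical helper: remembers the last X-BR-Build value seen and
// reports when a later response carries a different one.
class BuildWatcher {
  private lastBuild: string | null = null;

  // Pass the raw header value, e.g. response.headers.get("X-BR-Build").
  // Returns true only when a previously seen build id has changed.
  observe(build: string | null | undefined): boolean {
    if (build === null || build === undefined) {
      return false; // header missing: nothing to compare
    }
    const changed = this.lastBuild !== null && this.lastBuild !== build;
    this.lastBuild = build;
    return changed;
  }
}
```

On a change, a cautious client could re-fetch GET /attestation and re-run verification before continuing the session.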
Phase 1B: Federated agent identity. BR now verifies agent JWTs issued by external IDPs (Okta, Auth0, Microsoft Entra, any OIDC-compliant provider) instead of minting them itself. Every agent profile carries an identity_provenance field ("internal" or a "federated:"-prefixed value) for SOC 2 audit trails. When a tenant sets allow_agent_federation=true on its tenant_idp_configs row, BR's internal /v1/agent/bootstrap minter gates off and external JWTs become the sole identity source. The migration path is incremental: tenants can add federation without forcing a cutover.
Phase 3 — Marketing honesty pass. Landing page "Replaces: Zscaler ZTNA" claim (which was comparing application-layer memory isolation to a network-layer product) replaced with accurate descriptive framing. Added a fourth layer card for Build Attestation. Manifesto gained two new "Live since April 2026" shipped entries.
Phase 2B — Continuous verification (reputation-coupled cert lifetime). CAF certificate TTL is now a function of agent reputation tier: platinum agents get 10-min certs, gold 7 min, silver 5 min (default), bronze 3 min, restricted 1 min. Short cert lifetime IS the revocation mechanism — an agent whose reputation drops mid-session stops getting reissued long-lived certs, and the current one expires. No separate CRL needed. Response shape gained reputation_tier and validity_sec so callers can schedule refresh proactively. The mapping is pinned by tests that enforce strict monotonic ordering (platinum > gold > silver > bronze > restricted) and a 60s floor (no zero-lifetime lockouts).
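The tier-to-lifetime mapping and the two invariants the tests pin can be sketched as follows. The tier names and durations come from the paragraph above; the function names, and the choice to fall back to silver for an unknown tier, are assumptions made for illustration.

```typescript
type ReputationTier = "platinum" | "gold" | "silver" | "bronze" | "restricted";

// Most trusted first, in the order given in the text.
const TIER_ORDER: ReputationTier[] = ["platinum", "gold", "silver", "bronze", "restricted"];

// Certificate validity in seconds (10, 7, 5, 3 and 1 minutes).
const VALIDITY_SEC: Record<ReputationTier, number> = {
  platinum: 600,
  gold: 420,
  silver: 300,
  bronze: 180,
  restricted: 60,
};

const FLOOR_SEC = 60; // no zero-lifetime lockouts

function isTier(x: unknown): x is ReputationTier {
  return typeof x === "string" && (TIER_ORDER as string[]).indexOf(x) !== -1;
}

// Assumption: an unknown or missing tier is treated as silver, the stated default.
function validitySecFor(tier: string | undefined): number {
  return VALIDITY_SEC[isTier(tier) ? tier : "silver"];
}

// Checks strict monotonic ordering and the 60s floor.
function mappingIsValid(): boolean {
  let prev = Infinity;
  for (const tier of TIER_ORDER) {
    const ttl = VALIDITY_SEC[tier];
    if (ttl >= prev || ttl < FLOOR_SEC) {
      return false;
    }
    prev = ttl;
  }
  return true;
}
```

Keeping the mapping in one table means a check like mappingIsValid() fails as soon as someone edits a tier out of order or below the floor.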
Bonus — Test-guarded 2/3 integration gap classes. Three separate classes of "silent 404 / silent skip" bugs surfaced this session during post-deploy verification:
- api-mount allowlist drift: /attestation and /v1/agent/federated-bootstrap each 404'd in prod because the gateway's hardcoded allowlist didn't include them. Now guarded: src/api/route-manifest-coverage.test.ts reads site/routes.json and verifies every route is covered by an api-mount.ts allowlist entry.
- CURRENT_VERSION vs migration blocks drift: migrations v47/v48 silently skipped in prod because the top-level guard constant wasn't bumped. Now guarded: src/db/migrate.test.ts parses migrate.ts, extracts CURRENT_VERSION, and asserts it equals max(if (currentVersion < N)).
- Pre-auth vs post-auth registration drift: /v1/agent/federated-bootstrap was intercepted by the API key middleware chain because I registered it in the wrong function. Still un-guarded; flagged as follow-up (harder to test automatically because it requires deriving "doesn't require API key" from route annotations or middleware introspection).
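The CURRENT_VERSION guard described above amounts to a small text check over the migration source. The sketch below shows one way to express it as a pure function. It is not the contents of src/db/migrate.test.ts; the function name and the regular expressions are assumptions about how the source text looks.

```typescript
interface VersionCheck {
  current: number;  // value of the CURRENT_VERSION constant
  maxBlock: number; // largest N found in any `if (currentVersion < N)` block
  ok: boolean;      // true when the two agree
}

// Hypothetical checker: takes the text of the migration file and compares
// the declared CURRENT_VERSION with the highest migration block it contains.
function checkMigrationVersion(source: string): VersionCheck {
  const decl = /CURRENT_VERSION\s*=\s*(\d+)/.exec(source);
  if (decl === null) {
    throw new Error("CURRENT_VERSION declaration not found");
  }
  const current = Number(decl[1]);

  const blockRe = /if\s*\(\s*currentVersion\s*<\s*(\d+)\s*\)/g;
  let maxBlock = 0;
  let m: RegExpExecArray | null;
  while ((m = blockRe.exec(source)) !== null) {
    maxBlock = Math.max(maxBlock, Number(m[1]));
  }

  return { current, maxBlock, ok: current === maxBlock };
}
```

Run against a file that still declares version 46 but contains blocks for 47 and 48, the check reports ok: false, which is the drift that let the two migrations skip silently.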
Why It Matters
Two of the most common enterprise security objections to proxy-based AI gateways reduce to the same root cause: you're asking customers to trust claims they can't verify.
- "Why route through your proxy if we can't prove you don't see our data?" → Without attestation, every ZDR or E2E claim is trust-us marketing.
- "Why give you the power to mint agent identities for our tenant?" → Without federated identity, BR has a credential that could issue JWTs for any agent in any tenant. That's a SOC 2 finding.
Phase 1A and 1B remove both objections with cryptographic primitives the customer verifies themselves, not contractual assurances. The pair is the verifiability foundation for the entire ZTNA plan — Phases 2 (policy-as-code via Cedar, continuous verification) and 4 (runtime TEE attestation) build on this.
How It Works
Phase 1A: attestation
Deploy-time in .github/workflows/ecs-deploy.yml:
- uses: sigstore/cosign-installer@v3
- name: Sign image via Sigstore (keyless, GitHub OIDC → Rekor)
  run: cosign sign --yes "${IMAGE_URI_DIGEST}"
Cosign exchanges the GitHub Actions OIDC token for a short-lived Fulcio certificate, signs the image digest, and publishes a transparency log entry. No long-lived signing key lives in the repo. The ECS task definition gets BR_BUILD_* env vars injected so the running container can expose its own attestation.
Runtime verification:
curl https://api.brainstormrouter.com/attestation | jq .
# returns {status, commit, image, signature, rekor_search_url, verify.command}
# The verify.command in the response runs independently:
cosign verify 465853823186.dkr.ecr.us-east-1.amazonaws.com/brainstorm-router@sha256:eef8d1b... \
  --certificate-identity-regexp='^https://github\.com/justinjilg/brainstormrouter/\.github/workflows/ecs-deploy\.yml@refs/heads/main' \
  --certificate-oidc-issuer='https://token.actions.githubusercontent.com'
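As a rough illustration of how a running container could turn injected build metadata into the /attestation response, here is a minimal sketch. The text only says that BR_BUILD_* variables are injected, so the specific variable names below are hypothetical, as are the status values and the buildAttestation function. The response field names and the two cosign flags are taken from the example above.

```typescript
// Hypothetical env var names; the source only specifies the BR_BUILD_* prefix.
interface BuildEnv {
  BR_BUILD_COMMIT?: string;
  BR_BUILD_IMAGE?: string; // image reference pinned by digest
  BR_BUILD_SIGNATURE?: string;
  BR_BUILD_REKOR_URL?: string;
}

interface AttestationBundle {
  status: "attested" | "unattested"; // assumed status values
  commit: string | null;
  image: string | null;
  signature: string | null;
  rekor_search_url: string | null;
  verify: { command: string } | null;
}

// Identity and issuer as shown in the cosign verify example above.
const CERT_IDENTITY_REGEXP =
  "^https://github\\.com/justinjilg/brainstormrouter/\\.github/workflows/ecs-deploy\\.yml@refs/heads/main";
const OIDC_ISSUER = "https://token.actions.githubusercontent.com";

function buildAttestation(env: BuildEnv): AttestationBundle {
  const image = env.BR_BUILD_IMAGE ?? null;
  const signature = env.BR_BUILD_SIGNATURE ?? null;
  // Only advertise a verify command when there is something to verify.
  const attested = image !== null && signature !== null;
  const command = [
    "cosign verify " + image,
    "--certificate-identity-regexp='" + CERT_IDENTITY_REGEXP + "'",
    "--certificate-oidc-issuer='" + OIDC_ISSUER + "'",
  ].join(" ");

  return {
    status: attested ? "attested" : "unattested",
    commit: env.BR_BUILD_COMMIT ?? null,
    image,
    signature,
    rekor_search_url: env.BR_BUILD_REKOR_URL ?? null,
    verify: attested ? { command } : null,
  };
}
```

Returning an explicit "unattested" bundle rather than an error keeps the endpoint usable in environments, such as local development, where no signing step ran.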
Phase 1B: federated identity
Flow of POST /v1/agent/federated-bootstrap:
- Extract external JWT from Authorization: Bearer, tenant from X-BR-Tenant-Id header.
- Parse JWT (unverified) to extract iss claim.
- Look up tenant_idp_configs by (tenantId, issuer). Reject if enabled=false or allow_agent_federation=false.
- Cryptographically verify via jose.jwtVerify against the IDP's JWKS.
- Extract agent_id from sub (or configurable claim_mapping.agentId).
- Create/update agent_profiles row with a "federated:"-prefixed identity_provenance.
- Issue BR-signed session JWT. The session JWT is a session artifact, not an identity claim: BR cannot issue new federated identities without an external JWT to anchor them.
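The first three steps of the flow above (extract, parse unverified, look up and reject) can be sketched without any library. This is an illustration under assumptions, not the handler in agent-federated-bootstrap.ts: the function names, the in-memory config list, and the status codes and error strings are all hypothetical. Signature verification is deliberately left out; in the flow above it happens next, via jose.jwtVerify.

```typescript
interface IdpConfig {
  tenantId: string;
  issuer: string;
  enabled: boolean;
  allow_agent_federation: boolean;
}

type Precheck =
  | { ok: true; issuer: string; token: string }
  | { ok: false; status: number; error: string };

// Reads `iss` from the payload WITHOUT verifying the signature.
// Only safe for choosing which IDP config to verify against.
function unverifiedIssuer(token: string): string | null {
  const parts = token.split(".");
  if (parts.length !== 3) return null;
  try {
    const b64 = parts[1].replace(/-/g, "+").replace(/_/g, "/");
    const padded = b64 + "===".slice(0, (4 - (b64.length % 4)) % 4);
    const payload = JSON.parse(atob(padded)); // assumes ASCII claim values
    return typeof payload.iss === "string" ? payload.iss : null;
  } catch {
    return null;
  }
}

function precheckFederatedBootstrap(
  authorization: string | undefined,
  tenantId: string | undefined,
  configs: IdpConfig[],
): Precheck {
  const match = /^Bearer (.+)$/.exec(authorization ?? "");
  if (match === null || !tenantId) {
    return { ok: false, status: 401, error: "missing_credentials" };
  }
  const token = match[1];
  const issuer = unverifiedIssuer(token);
  if (issuer === null) {
    return { ok: false, status: 401, error: "malformed_jwt" };
  }
  let config: IdpConfig | undefined;
  for (const c of configs) {
    if (c.tenantId === tenantId && c.issuer === issuer) config = c;
  }
  if (!config || !config.enabled || !config.allow_agent_federation) {
    return { ok: false, status: 403, error: "federation_not_allowed" };
  }
  // Cryptographic verification against the IDP's JWKS must follow here.
  return { ok: true, issuer, token };
}
```

Doing the config lookup before verification means an unknown issuer is rejected without ever fetching a JWKS from an attacker-chosen URL.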
Tenant gate (1B.4): isAgentFederationEnabled(tenantId) runs at the top of /v1/agent/bootstrap. When any enabled IDP has allow_agent_federation=true, internal minting returns 403 federation_required.
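A minimal sketch of the gate logic, assuming the tenant's IDP rows are already loaded into memory. The real isAgentFederationEnabled(tenantId) takes only the tenant id and presumably queries tenant_idp_configs itself; the extra rows parameter and the gateInternalBootstrap wrapper are hypothetical, introduced to keep the example self-contained.

```typescript
interface IdpConfigRow {
  tenantId: string;
  enabled: boolean;
  allow_agent_federation: boolean;
}

// True when any enabled IDP for the tenant has federation switched on.
function isAgentFederationEnabled(tenantId: string, rows: IdpConfigRow[]): boolean {
  return rows.some(
    (r) => r.tenantId === tenantId && r.enabled && r.allow_agent_federation,
  );
}

// Hypothetical wrapper: decides whether internal minting may proceed.
function gateInternalBootstrap(
  tenantId: string,
  rows: IdpConfigRow[],
): { allowed: true } | { allowed: false; status: 403; error: "federation_required" } {
  if (isAgentFederationEnabled(tenantId, rows)) {
    return { allowed: false, status: 403, error: "federation_required" };
  }
  return { allowed: true };
}
```

A disabled IDP row with allow_agent_federation=true does not trip the gate, which matches the "any enabled IDP" wording above.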
The Numbers
- 2 API endpoints added: GET /attestation, POST /v1/agent/federated-bootstrap
- 5 SDK methods across TS + Python (sync + async variants)
- 2 migration bumps: v47 (agent_profiles.identity_provenance), v48 (tenant_idp_configs.allow_agent_federation)
- 19 tests added: 6 for attestation bundle assembly, 9 for federated bootstrap, 1 for the gate, 3 fixture updates
- 7,321 total tests passing across the repo
- 10 commits shipped in one session, all deployed to production ECS Fargate via the standard deploy pipeline
Competitive Edge
Most AI gateway vendors claim "zero data retention" or "end-to-end encryption" without a way for customers to verify the claim. Portkey, Helicone, LangSmith, OpenRouter — all have ZDR marketing, none offer attestation. Auth0, Okta, and Entra integrate with web apps but not with agent identity lifecycles.
What ships today:
- Sigstore attestation on every deploy: customers run cosign verify against the actual running image and see the exact commit. Not "trust our security page." Cryptographic proof.
- Federated agent identity: "BR cannot mint agent identities for our tenant" is enforceable at the code level when the tenant flips allow_agent_federation=true. The internal JWT minter literally returns 403.
- Identity provenance audit trail: every agent profile carries a queryable identity_provenance field. SOC 2 auditors can run one SQL query and see which agents came from the customer's IDP vs BR's internal flow.
The verifiability primitive is what unlocks the next enterprise conversation. Without it, everything else is promises.
Lessons From Today's Integration Gaps
Four integration gaps caught post-deploy that unit tests couldn't catch:
- api-mount allowlist: /attestation 404'd before Hono saw it because the gateway's api-mount.ts has a hardcoded allowlist of path prefixes.
- Pre-auth vs post-auth registration: /v1/agent/federated-bootstrap was intercepted by apiKeyAuth() middleware because I registered it in registerCommonRoutes instead of registerPreAuthRoutes.
- CURRENT_VERSION guard: v47/v48 migrations never ran because the top-level CURRENT_VERSION = 46 constant early-exited the migration loop before reaching the new blocks.
- docker inspect vs aws ecr describe-images: the deploy IAM role couldn't call DescribeImages (principle of least privilege), so the cosign sign step failed on the first try.
The common shape: six hidden contracts for "what paths are valid" across different middleware layers (api-mount, pre-auth vs post-auth, CURRENT_VERSION, BUDGET_EXEMPT_PREFIXES, auth._authMethod gates, gateway-level health interceptor). None are derived from the canonical route list in site/routes.json; every one is maintained by hand.
Future work flagged (not in this session): derive all six from site/routes.json as a build-time step. Same direction as the Tier 2 contract-diff hook we shipped yesterday, extended to middleware visibility. Would eliminate this entire class of "new route, silent 404" bugs.
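For the api-mount contract, a build-time check against the route manifest could look like the sketch below. The shape of site/routes.json is not shown in this entry, so the flat { method, path } list is an assumption, as are the uncoveredRoutes name and the prefix-matching rule (exact match, or prefix followed by a slash).

```typescript
// Assumed manifest entry shape; the real site/routes.json may differ.
interface RouteEntry {
  method: string;
  path: string;
}

// A prefix covers a path when it equals the path or is a parent segment of it.
function isCovered(path: string, prefix: string): boolean {
  const base = prefix.replace(/\/$/, "");
  return path === base || path.indexOf(base + "/") === 0;
}

// Returns "METHOD /path" for every manifest route no allowlist prefix covers.
function uncoveredRoutes(routes: RouteEntry[], allowlistPrefixes: string[]): string[] {
  return routes
    .filter((r) => !allowlistPrefixes.some((p) => isCovered(r.path, p)))
    .map((r) => r.method + " " + r.path);
}
```

Failing the build when uncoveredRoutes returns a non-empty list would surface a missing allowlist entry before deploy instead of as a 404 in prod. Matching on whole path segments avoids a prefix such as /v1/agent/bootstrap accidentally covering /v1/agent/bootstrap-legacy.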
Lockstep Checklist
- [x] API Routes: src/api/routes/attestation.ts, src/api/routes/agent-federated-bootstrap.ts added. src/api/routes/agent-bootstrap.ts gated on federation.
- [x] TS SDK: packages/sdk-ts/src/resources/attestation.ts + agent-profiles.ts extended.
- [x] Python SDK: packages/sdk-py/src/brainstormrouter/resources/attestation.py + agent_profiles.py extended (sync + async).
packages/sdk-py/src/brainstormrouter/resources/attestation.py+agent_profiles.pyextended (sync + async). - [x] MCP Schemas: N/A — both endpoints are security primitives exposed via SDK, not agent-facing tools.
- [x] Docs: docs/public/security/verify-build.mdx and docs/public/security/federated-identity.mdx published.
- [x] Landing page: four-layer card revised, Layer 04 (attestation) added, Layer 02 tag fixed.
- [x] Manifesto: two "Live since April 2026" entries added under "What We've Built".
- [x] Tests: 19 added, 7,321 passing.
- [x] Contract manifest: site/routes.json regenerated, Tier 2 hook passed.
Related
- Phase 2A (Cedar policy-as-code) — queued, 2-3 week scope
- Phase 2B (continuous verification via reputation → cert lifetime) — queued, 1 week scope
- Phase 4 (runtime TEE attestation) — contract-gated, not scoped
- Prior session (2026-04-17): contract-diff lockstep hook via site/routes.json, the infrastructure this session relied on to enforce surface/SDK alignment