ZTNA verifiability: build attestation, federated identity, reputation-coupled certs

2026-04-18

caf, attestation, agent-identity, continuous-verification, sdk-ts, sdk-py, docs

What We Built

Phase 1A — Build-time attestation via Sigstore. Every production deploy now signs its Docker image via cosign with keyless GitHub OIDC, publishes the signature to the Rekor transparency log, and injects the verification metadata into the ECS task definition. GET /attestation (public, no auth) returns the bundle any customer can use to independently verify the running image matches the exact git commit they audited. Every response also carries an X-BR-Build: header so callers detect deploy changes mid-session.

Phase 1B — Federated agent identity. BR now verifies agent JWTs issued by external IDPs (Okta, Auth0, Microsoft Entra, any OIDC-compliant provider) instead of minting them itself. Every agent profile carries an identity_provenance field ("internal" or "federated:") for SOC 2 audit trails. When a tenant enables allow_agent_federation=true on their tenant_idp_configs row, BR's internal /v1/agent/bootstrap minter gates off — external JWTs become the sole identity source. The migration path is incremental: tenants can add federation without forcing a cutover.

Phase 3 — Marketing honesty pass. Landing page "Replaces: Zscaler ZTNA" claim (which was comparing application-layer memory isolation to a network-layer product) replaced with accurate descriptive framing. Added a fourth layer card for Build Attestation. Manifesto gained two new "Live since April 2026" shipped entries.

Phase 2B — Continuous verification (reputation-coupled cert lifetime). CAF certificate TTL is now a function of agent reputation tier: platinum agents get 10-min certs, gold 7 min, silver 5 min (default), bronze 3 min, restricted 1 min. Short cert lifetime IS the revocation mechanism — an agent whose reputation drops mid-session stops getting reissued long-lived certs, and the current one expires. No separate CRL needed. Response shape gained reputation_tier and validity_sec so callers can schedule refresh proactively. The mapping is pinned by tests that enforce strict monotonic ordering (platinum > gold > silver > bronze > restricted) and a 60s floor (no zero-lifetime lockouts).
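A minimal sketch of the tier-to-lifetime mapping and the two invariants the tests pin, assuming lifetimes are stored in seconds. The names (CERT_VALIDITY_SEC, certValiditySec, mappingIsValid, refreshDelayMs) and the 30-second refresh lead are hypothetical, chosen for illustration; only the tier values, the ordering, and the 60s floor come from the description above.

```typescript
// Hypothetical names; tier values (converted to seconds) come from the text.
type ReputationTier = "platinum" | "gold" | "silver" | "bronze" | "restricted";

// Highest reputation first. The ordering check below walks this list.
const TIER_ORDER: readonly ReputationTier[] = [
  "platinum", "gold", "silver", "bronze", "restricted",
];

const CERT_VALIDITY_SEC: Record<ReputationTier, number> = {
  platinum: 600,
  gold: 420,
  silver: 300,
  bronze: 180,
  restricted: 60,
};

const MIN_VALIDITY_SEC = 60;

// Unknown or missing tiers fall back to silver, the stated default.
function certValiditySec(tier?: string): number {
  const known = TIER_ORDER.find((t) => t === tier);
  return CERT_VALIDITY_SEC[known ?? "silver"];
}

// Invariants: TTL strictly decreases down the tiers, and never drops below 60s.
function mappingIsValid(map: Record<ReputationTier, number>): boolean {
  const ttls = TIER_ORDER.map((t) => map[t]);
  const strictlyDecreasing = ttls.every(
    (v, i) => i === 0 || (ttls[i - 1] as number) > v,
  );
  const aboveFloor = ttls.every((v) => v >= MIN_VALIDITY_SEC);
  return strictlyDecreasing && aboveFloor;
}

// Client side: schedule a refresh shortly before expiry using validity_sec.
// The 30s lead is an illustrative assumption, not a documented BR value.
function refreshDelayMs(validitySec: number, leadSec = 30): number {
  return Math.max(0, validitySec - leadSec) * 1000;
}
```

A caller holding a silver cert (validity_sec = 300) would, under these assumptions, schedule its refresh 270 seconds after issuance.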

Bonus — Test-guarded 2/3 integration gap classes. Three separate classes of "silent 404 / silent skip" bugs surfaced this session during post-deploy verification:

  1. api-mount allowlist drift: /attestation and /v1/agent/federated-bootstrap each 404'd in prod because the gateway's hardcoded allowlist didn't include them. Now guarded: src/api/route-manifest-coverage.test.ts reads site/routes.json and verifies every route is covered by an api-mount.ts allowlist entry.
  2. CURRENT_VERSION vs migration-block drift: migrations v47/v48 were silently skipped in prod because the top-level guard constant wasn't bumped. Now guarded: src/db/migrate.test.ts parses migrate.ts, extracts CURRENT_VERSION, and asserts it equals the highest N found in the if (currentVersion < N) blocks.
  3. Pre-auth vs post-auth registration drift: /v1/agent/federated-bootstrap was intercepted by the API key middleware chain because I registered it in the wrong function. Still unguarded; flagged as follow-up (harder to test automatically, since it requires deriving "doesn't require API key" from route annotations or middleware introspection).
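The migration guard in item 2 can be sketched roughly as follows. This is an assumption about its shape, not the contents of src/db/migrate.test.ts: the helper name checkMigrationVersion and the exact regular expressions are hypothetical, and the real migrate.ts may format its guard differently.

```typescript
// Hypothetical helper: compares the declared CURRENT_VERSION against the
// highest N appearing in `if (currentVersion < N)` blocks of the same source.
interface MigrationCheck {
  declared: number;
  maxBlock: number;
  ok: boolean;
}

function checkMigrationVersion(source: string): MigrationCheck {
  const declaredMatch = /CURRENT_VERSION\s*=\s*(\d+)/.exec(source);
  if (!declaredMatch) throw new Error("CURRENT_VERSION not found");
  const declared = Number(declaredMatch[1]);

  // Walk every migration block and keep the largest version number seen.
  const blockPattern = /if\s*\(\s*currentVersion\s*<\s*(\d+)\s*\)/g;
  let maxBlock = -1;
  let m: RegExpExecArray | null;
  while ((m = blockPattern.exec(source)) !== null) {
    maxBlock = Math.max(maxBlock, Number(m[1]));
  }
  if (maxBlock < 0) throw new Error("no migration blocks found");

  return { declared, maxBlock, ok: declared === maxBlock };
}
```

Fed a source with CURRENT_VERSION = 46 and blocks up to 48, this reports ok: false, which is the drift that let v47/v48 skip silently.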

Why It Matters

Two of the most common enterprise security objections to proxy-based AI gateways reduce to the same root cause: you're asking customers to trust claims they can't verify.

  • "Why route through your proxy if we can't prove you don't see our data?" → Without attestation, every ZDR or E2E claim is trust-us marketing.
  • "Why give you the power to mint agent identities for our tenant?" → Without federated identity, BR has a credential that could issue JWTs for any agent in any tenant. That's a SOC 2 finding.

Phase 1A and 1B remove both objections with cryptographic primitives the customer verifies themselves, not contractual assurances. The pair is the verifiability foundation for the entire ZTNA plan — Phases 2 (policy-as-code via Cedar, continuous verification) and 4 (runtime TEE attestation) build on this.

How It Works

Phase 1A: attestation

Deploy-time in .github/workflows/ecs-deploy.yml:

# Keyless signing needs the job to request an OIDC token (permissions: id-token: write).
- uses: sigstore/cosign-installer@v3

- name: Sign image via Sigstore (keyless, GitHub OIDC → Rekor)
  run: cosign sign --yes "${IMAGE_URI_DIGEST}"

Cosign exchanges the GitHub Actions OIDC token for a short-lived Fulcio certificate, signs the image digest, and publishes a transparency log entry. No long-lived signing key lives in the repo. The ECS task definition gets BR_BUILD_* env vars injected so the running container can expose its own attestation.

Runtime verification:

curl https://api.brainstormrouter.com/attestation | jq .
# returns {status, commit, image, signature, rekor_search_url, verify.command}

# The verify.command in the response runs independently:
cosign verify 465853823186.dkr.ecr.us-east-1.amazonaws.com/brainstorm-router@sha256:eef8d1b... \
  --certificate-identity-regexp='^https://github\.com/justinjilg/brainstormrouter/\.github/workflows/ecs-deploy\.yml@refs/heads/main' \
  --certificate-oidc-issuer='https://token.actions.githubusercontent.com'
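On the caller side, the X-BR-Build header makes mid-session deploy changes detectable. A minimal sketch, assuming the header value is an opaque string that changes whenever the build changes (its exact format is not specified here); the BuildWatcher class is hypothetical and not part of either SDK.

```typescript
// Hypothetical client-side helper. Only the X-BR-Build header name comes from
// the text; the value is treated as an opaque string.
class BuildWatcher {
  private pinned: string | null = null;

  // Feed the X-BR-Build value of every response. Returns true when the value
  // differs from the first build seen in this session.
  observe(buildHeader: string | null): boolean {
    if (buildHeader === null) return false; // header absent: nothing to compare
    if (this.pinned === null) {
      this.pinned = buildHeader; // first response pins the session's build
      return false;
    }
    return buildHeader !== this.pinned;
  }

  get pinnedBuild(): string | null {
    return this.pinned;
  }
}
```

With fetch, the call would look like watcher.observe(response.headers.get("X-BR-Build")); a true result is the cue to re-fetch /attestation and rerun the verify command before trusting the new build.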

Phase 1B: federated identity

Flow of POST /v1/agent/federated-bootstrap:

  1. Extract external JWT from Authorization: Bearer, tenant from X-BR-Tenant-Id header.
  2. Parse JWT (unverified) to extract iss claim.
  3. Look up tenant_idp_configs by (tenantId, issuer). Reject if enabled=false or allow_agent_federation=false.
  4. Cryptographically verify via jose.jwtVerify against the IDP's JWKS.
  5. Extract agent_id from sub (or configurable claim_mapping.agentId).
  6. Create/update agent_profiles row with identity_provenance="federated:".
  7. Issue BR-signed session JWT. The session JWT is a session artifact, not an identity claim: BR cannot issue new federated identities without an external JWT to anchor them.
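Steps 2, 3, and 5 of the flow above can be sketched without any dependencies. This is an illustration under stated assumptions, not BR's handler: the TenantIdpConfig shape and the function names peekClaims, selectIdpConfig, and extractAgentId are hypothetical, and step 4 (signature verification with jose.jwtVerify against the IDP's JWKS) is deliberately left out because it needs the jose package and network access.

```typescript
// Hypothetical shapes; field names mirror the text where it gives them.
interface TenantIdpConfig {
  tenantId: string;
  issuer: string;
  enabled: boolean;
  allow_agent_federation: boolean;
  claim_mapping?: { agentId?: string };
}

// Step 2: read claims WITHOUT verifying. This only selects the IDP config;
// nothing returned here is trusted until the signature check in step 4.
function peekClaims(jwt: string): Record<string, unknown> {
  const parts = jwt.split(".");
  if (parts.length !== 3) throw new Error("malformed JWT");
  const b64 = (parts[1] ?? "").replace(/-/g, "+").replace(/_/g, "/");
  const padded = b64 + "=".repeat((4 - (b64.length % 4)) % 4);
  const bytes = Uint8Array.from(atob(padded), (c) => c.charCodeAt(0));
  return JSON.parse(new TextDecoder().decode(bytes));
}

// Step 3: find the config for (tenantId, issuer) and apply both gates.
function selectIdpConfig(
  configs: TenantIdpConfig[],
  tenantId: string,
  issuer: unknown,
): TenantIdpConfig {
  const cfg = configs.find((c) => c.tenantId === tenantId && c.issuer === issuer);
  if (!cfg) throw new Error("unknown issuer for tenant");
  if (!cfg.enabled || !cfg.allow_agent_federation) {
    throw new Error("federation not enabled");
  }
  return cfg;
}

// Step 5: the agent id comes from the mapped claim if configured, else sub.
// Call this only with claims that have passed verification.
function extractAgentId(
  verifiedClaims: Record<string, unknown>,
  cfg: TenantIdpConfig,
): string {
  const claimName = cfg.claim_mapping?.agentId ?? "sub";
  const value = verifiedClaims[claimName];
  if (typeof value !== "string" || value === "") {
    throw new Error(`missing claim: ${claimName}`);
  }
  return value;
}
```

Doing the config lookup before verification is what lets the handler pick the right JWKS; the cost is that the iss value is attacker-controlled at that point, so it should only ever be used as a lookup key.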

Tenant gate (1B.4): isAgentFederationEnabled(tenantId) runs at the top of /v1/agent/bootstrap. When any enabled IDP has allow_agent_federation=true, internal minting returns 403 federation_required.
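The gate logic reduces to one predicate over the tenant's IDP rows. A minimal in-memory sketch: the real isAgentFederationEnabled(tenantId) presumably queries tenant_idp_configs, so the extra rows parameter, the IdpRow shape, and gateInternalBootstrap are assumptions made to keep the example self-contained.

```typescript
// Hypothetical in-memory stand-in for the tenant_idp_configs lookup.
interface IdpRow {
  tenantId: string;
  enabled: boolean;
  allow_agent_federation: boolean;
}

// True when ANY enabled IDP for the tenant has allow_agent_federation set.
function isAgentFederationEnabled(rows: IdpRow[], tenantId: string): boolean {
  return rows.some(
    (r) => r.tenantId === tenantId && r.enabled && r.allow_agent_federation,
  );
}

type BootstrapDecision =
  | { status: 200 }
  | { status: 403; error: "federation_required" };

// Runs at the top of the internal minter: once federation is on for a tenant,
// internal minting is refused rather than silently allowed alongside it.
function gateInternalBootstrap(rows: IdpRow[], tenantId: string): BootstrapDecision {
  return isAgentFederationEnabled(rows, tenantId)
    ? { status: 403, error: "federation_required" }
    : { status: 200 };
}
```

Note that a disabled IDP row with the flag set does not trip the gate in this sketch, matching the "any enabled IDP" wording above.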

The Numbers

  • 2 API endpoints added: GET /attestation, POST /v1/agent/federated-bootstrap
  • 5 SDK methods across TS + Python (sync + async variants)
  • 2 migration bumps: v47 (agent_profiles.identity_provenance), v48 (tenant_idp_configs.allow_agent_federation)
  • 19 tests added: 6 for attestation bundle assembly, 9 for federated bootstrap, 1 for the gate, 3 fixture updates
  • 7,321 total tests passing across the repo
  • 10 commits shipped in one session, all deployed to production ECS Fargate via the standard deploy pipeline

Competitive Edge

Most AI gateway vendors claim "zero data retention" or "end-to-end encryption" without a way for customers to verify the claim. Portkey, Helicone, LangSmith, OpenRouter — all have ZDR marketing, none offer attestation. Auth0, Okta, and Entra integrate with web apps but not with agent identity lifecycles.

What ships today:

  • Sigstore attestation on every deploy — customers run cosign verify against the actual running image and see the exact commit. Not "trust our security page." Cryptographic proof.
  • Federated agent identity — "BR cannot mint agent identities for our tenant" is enforceable at the code level when the tenant flips allow_agent_federation=true. The internal JWT minter literally returns 403.
  • Identity provenance audit trail — every agent profile carries a queryable identity_provenance field. SOC 2 auditors can run one SQL query and see which agents came from the customer's IDP vs BR's internal flow.

The verifiability primitive is what unlocks the next enterprise conversation. Without it, everything else is promises.

Lessons From Today's Integration Gaps

Four integration gaps caught post-deploy that unit tests couldn't catch:

  1. api-mount allowlist: /attestation 404'd before Hono saw it because the gateway's api-mount.ts has a hardcoded allowlist of path prefixes.
  2. Pre-auth vs post-auth registration: /v1/agent/federated-bootstrap was intercepted by apiKeyAuth() middleware because I registered it in registerCommonRoutes instead of registerPreAuthRoutes.
  3. CURRENT_VERSION guard: v47/v48 migrations never ran because the top-level CURRENT_VERSION = 46 constant early-exited the migration loop before reaching the new blocks.
  4. docker inspect vs aws ecr describe-images: the deploy IAM role couldn't call DescribeImages (principle of least privilege), so the cosign sign step failed on the first try.

The common shape: six hidden contracts for "what paths are valid" across different middleware layers (api-mount, pre-auth vs post-auth, CURRENT_VERSION, BUDGET_EXEMPT_PREFIXES, auth._authMethod gates, gateway-level health interceptor). None are derived from the canonical route list in site/routes.json; every one is maintained by hand.

Future work flagged (not in this session): derive all six from site/routes.json as a build-time step. Same direction as the Tier 2 contract-diff hook we shipped yesterday, extended to middleware visibility. Would eliminate this entire class of "new route, silent 404" bugs.
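As a starting point for that derivation, the coverage check for one layer can be sketched as a pure function. This assumes the manifest reduces to a list of path strings and the allowlist to a list of path prefixes; neither format is confirmed here, and uncoveredRoutes is a hypothetical name.

```typescript
// Hypothetical: returns every route that no allowlist prefix covers.
// A prefix covers a route when it equals the route or matches it up to a
// "/" boundary, so "/v1" covers "/v1/agent" but "/att" does not cover
// "/attestation".
function uncoveredRoutes(routes: string[], allowlistPrefixes: string[]): string[] {
  return routes.filter(
    (route) =>
      !allowlistPrefixes.some((prefix) => {
        const boundary = prefix.endsWith("/") ? prefix : prefix + "/";
        return route === prefix || route.startsWith(boundary);
      }),
  );
}
```

Run at build time against each hand-maintained list, a non-empty result would fail the build instead of surfacing as a 404 in prod.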

Lockstep Checklist

  • [x] API Routes: src/api/routes/attestation.ts, src/api/routes/agent-federated-bootstrap.ts added. src/api/routes/agent-bootstrap.ts gated on federation.
  • [x] TS SDK: packages/sdk-ts/src/resources/attestation.ts + agent-profiles.ts extended.
  • [x] Python SDK: packages/sdk-py/src/brainstormrouter/resources/attestation.py + agent_profiles.py extended (sync + async).
  • [x] MCP Schemas: N/A — both endpoints are security primitives exposed via SDK, not agent-facing tools.
  • [x] Docs: docs/public/security/verify-build.mdx and docs/public/security/federated-identity.mdx published.
  • [x] Landing page: four-layer card revised, Layer 04 (attestation) added, Layer 02 tag fixed.
  • [x] Manifesto: two "Live since April 2026" entries added under "What We've Built".
  • [x] Tests: 19 added, 7,321 passing.
  • [x] Contract manifest: site/routes.json regenerated, Tier 2 hook passed.

Related

  • Phase 2A (Cedar policy-as-code): queued, 2-3 week scope
  • Phase 2B (continuous verification via reputation → cert lifetime): shipped this session (see What We Built)
  • Phase 4 (runtime TEE attestation): contract-gated, not scoped
  • Prior session (2026-04-17): contract-diff lockstep hook via site/routes.json, the infrastructure this session relied on to enforce surface/SDK alignment