Audit chain canonicalization — single source of truth fixes verify-side drift (was broken since 0d46a1728)

2026-05-09

security, api, audit

What We Built

/v1/governance/audit/chain/verify has returned valid: false, brokenAt: 0 for every chain since 0d46a1728 ("feat(security): add hash-chained audit trail") shipped on 2026-03-06. The bug was diagnosed via live probe of api.brainstormrouter.com and confirmed by reading the canonical-payload boundary between sign-time and verify-time.

This PR introduces a single source of truth for the audit chain canonical payload — buildAuditChainPayload() in src/security/audit-chain.ts — used by both:

- Sign-site (src/api/routes/completions/helpers.ts:268) when computing eventHash for a new completion audit entry
- Verify-site (src/api/routes/governance.ts:735) when reconstructing the hash input from a DB row to validate the chain

Same function, same field selection, same type normalization. Future drift becomes a build-time error (it's a typed function signature) instead of a silently-broken chain that nobody notices because nothing reads the verify endpoint.

Why It Matters

The "tamper-evident audit trail" was the primary forensic + compliance artifact for completion-level auditing. From 2026-03-06 through this ship, it claimed to be tamper-evident, but tampering was detectable only in unit tests (src/security/audit-chain.test.ts round-trips the same in-memory object and never crosses the DB boundary). Against the production DB, every chain-verify call returned valid: false whether or not the entries had been tampered with. A real adversary modifying entries would have been hidden in the existing valid: false noise.

Two distinct bugs combined:

- Field-set drift (16 vs 10): sign-site hashed 10 fields; verify-site stripped only prevHash, eventHash, id, createdAt from the DB row, leaving 16 fields. The 6 extras (principalAuthMethod, principalSpiffeId, principalRoles, toolVerdicts, cacheHit, privacyRedacted) are stored on the audit row but never participated in the original hash input.
- Numeric/string drift on costUsd: sign-site hashes a JS number; the DB stores it via .toFixed(6) as a numeric column; the verify-side read returns a string. Canonical JSON over 0.000123 ≠ canonical JSON over "0.000123". Even with the field set fixed, the values would not match.

buildAuditChainPayload() closes both: it picks exactly the 10 fields sign-time hashes AND coerces costUsd through .toFixed(6) → Number() so both inputs (sign-time number, verify-time string) produce identical canonical bytes.
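The costUsd coercion can be illustrated in isolation. This is a minimal sketch, not the code in src/security/audit-chain.ts: normalizeCostUsd is a hypothetical helper name, the pass-through of null/undefined as null is an assumption, and plain JSON.stringify stands in for whatever canonical serializer the chain actually uses.

```typescript
// Hypothetical helper; the real normalization lives inside buildAuditChainPayload().
type CostInput = number | string | null | undefined;

// Round to 6 decimals via .toFixed(6), then convert back to a number, so a
// sign-time number and a DB-read string converge on the same value.
// Assumption: null/undefined map to null (the source does not specify this).
function normalizeCostUsd(cost: CostInput): number | null {
  if (cost === null || cost === undefined) return null;
  return Number(Number(cost).toFixed(6));
}

// Sign-time value: a full-precision JS number.
const signTimeCost = 0.0000084;
// Verify-time value: the string a numeric column written via .toFixed(6) reads back as.
const dbReadCost = signTimeCost.toFixed(6); // "0.000008"

// Without normalization the serialized forms differ (number vs quoted string).
const rawSign = JSON.stringify({ costUsd: signTimeCost });
const rawVerify = JSON.stringify({ costUsd: dbReadCost });

// With normalization both sides serialize to the same bytes.
const normSign = JSON.stringify({ costUsd: normalizeCostUsd(signTimeCost) });
const normVerify = JSON.stringify({ costUsd: normalizeCostUsd(dbReadCost) });

console.log(rawSign === rawVerify); // false
console.log(normSign === normVerify); // true
```

Converting back to a number (rather than hashing the .toFixed(6) string directly) keeps the hashed type identical to what the sign-site already used, so only the precision changes.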

How It Works

// src/security/audit-chain.ts
export function buildAuditChainPayload(input: {
  tenantId: string;
  requestId: string;
  principalId: string | null;
  model: string;
  isStreaming: boolean;
  outcome: string;
  costUsd: number | string | null | undefined;
  inputTokens: number | null | undefined;
  outputTokens: number | null | undefined;
  durationMs: number | null | undefined;
}): AuditChainPayload {
  // Coerce costUsd through .toFixed(6) → Number to match the DB-persisted form.
  // Without this normalization, sign-time `0.0000084` and DB-read "0.000008"
  // produce different canonical bytes.
  // …
}

Both call sites are now one-liners that hand the input to the helper:

// helpers.ts:268 (sign-site)
const chainLink = computeHash(
  params.tenantId,
  buildAuditChainPayload({ /* 10-field input from completion params */ }),
);

// governance.ts:735 (verify-site)
const verifyEntries = chainedEntries.map((e) => {
  const payload = buildAuditChainPayload({ /* 10-field input from DB row */ });
  return { ...payload, prevHash: e.prevHash!, eventHash: e.eventHash! };
});
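For readers unfamiliar with how a hash chain reports brokenAt, the sketch below shows the general technique under stated assumptions. It is not the project's computeHash or verify route: hashLink, appendEntry, verifyChain, the "GENESIS" seed, the SHA-256 choice, and the sorted-key JSON canonicalization are all hypothetical stand-ins.

```typescript
import { createHash } from "node:crypto";

// Hypothetical flat payload and entry shapes, for illustration only.
type Payload = Record<string, string | number | boolean | null>;
interface ChainEntry {
  payload: Payload;
  prevHash: string;
  eventHash: string;
}

const GENESIS = "GENESIS"; // assumed seed for the first entry's prevHash

// Canonical JSON for a flat object: the sorted key list acts as the property allowlist.
function canonicalize(payload: Payload): string {
  return JSON.stringify(payload, Object.keys(payload).sort());
}

// Each entry's hash covers the previous hash plus the canonical payload.
function hashLink(prevHash: string, payload: Payload): string {
  return createHash("sha256").update(prevHash).update(canonicalize(payload)).digest("hex");
}

function appendEntry(chain: ChainEntry[], payload: Payload): ChainEntry[] {
  const prevHash = chain.length > 0 ? chain[chain.length - 1].eventHash : GENESIS;
  return [...chain, { payload, prevHash, eventHash: hashLink(prevHash, payload) }];
}

// Recompute every link; report the index of the first entry that fails.
function verifyChain(chain: ChainEntry[]): { valid: boolean; brokenAt: number | null } {
  let expectedPrev = GENESIS;
  for (let i = 0; i < chain.length; i++) {
    const entry = chain[i];
    const recomputed = hashLink(entry.prevHash, entry.payload);
    if (entry.prevHash !== expectedPrev || recomputed !== entry.eventHash) {
      return { valid: false, brokenAt: i };
    }
    expectedPrev = entry.eventHash;
  }
  return { valid: true, brokenAt: null };
}

let chain: ChainEntry[] = [];
chain = appendEntry(chain, { requestId: "req-1", costUsd: 0.000123 });
chain = appendEntry(chain, { requestId: "req-2", costUsd: 0.5 });
const intact = verifyChain(chain);

// Simulate the DB boundary turning costUsd into a string on read.
const drifted = verifyChain(
  chain.map((e) => ({ ...e, payload: { ...e.payload, costUsd: String(e.payload.costUsd) } })),
);

console.log(intact); // { valid: true, brokenAt: null }
console.log(drifted); // { valid: false, brokenAt: 0 }
```

The drifted case mirrors the symptom described above: nothing was tampered with, yet the first entry already fails because the verify-side input is typed differently from the sign-side input.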

The Numbers

| Metric | Before | After |
| --- | --- | --- |
| `/v1/governance/audit/chain/verify` against live API | `valid: false, brokenAt: 0` | `valid: true` for all entries signed after this PR |
| Field count hashed at sign-site | 10 | 10 (unchanged; the bug was on the verify side) |
| Field count reconstructed at verify-site | 16 | 10 (matches sign-site exactly) |
| `costUsd` canonical form | sign-site `number`, verify-site `string` | both coerced to 6-decimal `number` |
| Integration tests for sign→verify roundtrip | 0 | 8 (new file `audit-chain-roundtrip.test.ts`) |
| Total `audit-chain.ts` tests passing | 19 | 19 (unchanged) + 8 new |

What This DOESN'T Fix

Pre-existing entries are not retroactively verifiable. The 100 chain entries that exist in production were signed with the old (correct, 10-field) hash but lost precision when costUsd was rounded to 6 decimals on DB insert. The original full-precision JS-double values are not recoverable from the DB-stored rounded strings. Their eventHash values were computed from the unrounded input, so even with this fix they will still report valid: false when verified.

This was a deliberate scope choice. Re-signing existing entries in place would erase the tamper-evidence guarantee on those rows (you can't distinguish a re-signed legitimate entry from a forged one). The right follow-up is a chain checkpoint mechanism: write a special audit row that attests "entries 1..100 verified as legacy schema vN, treated as sealed; verification continues from entry 101 forward." Out of scope for this PR — flagged for a follow-up.
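One possible shape for that follow-up, purely as a sketch: a checkpoint row names the last sealed position, and the verifier skips everything at or before it. The AuditRow type, the kind and sealedThrough fields, and rowsToVerify are all hypothetical; nothing like this exists in the codebase yet, and how the checkpoint row itself is authenticated is left open.

```typescript
// Hypothetical row shape; none of these names come from the codebase.
interface AuditRow {
  seq: number; // 1-based position in the chain
  kind: "completion" | "checkpoint";
  sealedThrough?: number; // set only on checkpoint rows
}

// Returns the rows that still need hash verification: everything after the
// highest sealed position named by any checkpoint row. The checkpoint row
// itself sits after the sealed range, so it is verified like any other entry.
function rowsToVerify(rows: AuditRow[]): AuditRow[] {
  let sealedThrough = 0;
  for (const row of rows) {
    if (row.kind === "checkpoint" && row.sealedThrough !== undefined) {
      sealedThrough = Math.max(sealedThrough, row.sealedThrough);
    }
  }
  return rows.filter((row) => row.seq > sealedThrough);
}

const rows: AuditRow[] = [
  { seq: 99, kind: "completion" },
  { seq: 100, kind: "completion" },
  { seq: 101, kind: "checkpoint", sealedThrough: 100 },
  { seq: 102, kind: "completion" },
];

const pending = rowsToVerify(rows).map((r) => r.seq);
console.log(pending); // [ 101, 102 ]
```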

What this PR DOES fix: every entry signed AFTER deploy will verify correctly. The chain is repaired going forward.

Verification

npx vitest run src/security/audit-chain-roundtrip.test.ts — 8/8 passed
npx vitest run src/security/audit-chain.test.ts — 19/19 passed (existing tests unaffected)
pnpm tsgo — exit 0, 0 errors
oxfmt --check + oxlint --type-aware on edited files — clean
Live API verification deferred until merge + ECS deploy. Post-deploy:

curl -H "Authorization: Bearer " https://api.brainstormrouter.com/v1/governance/audit/chain/verify is expected to still return valid: false because the legacy 100 entries remain. New entries created after deploy should be verifiable in isolation (?since=).