Audit chain canonicalization — single source of truth fixes verify-side drift (was broken since 0d46a1728)
2026-05-09
What We Built
/v1/governance/audit/chain/verify has returned valid: false, brokenAt: 0 for every chain since 0d46a1728 ("feat(security): add hash-chained audit trail") shipped on 2026-03-06. The bug was diagnosed with a live probe of api.brainstormrouter.com and confirmed by reading the canonical-payload boundary between sign time and verify time.
This PR introduces a single source of truth for the audit chain canonical payload, buildAuditChainPayload() in src/security/audit-chain.ts, used by both:
- Sign-site (src/api/routes/completions/helpers.ts:268), when computing eventHash for a new completion audit entry
- Verify-site (src/api/routes/governance.ts:735), when reconstructing the hash input from a DB row to validate the chain
Same function, same field selection, same type normalization. Because both sites call one typed function, changing the payload shape means changing that signature, so the two sites cannot diverge without a build-time error. Previously, drift produced a silently broken chain that nobody noticed because nothing reads the verify endpoint.
Why It Matters
The "tamper-evident audit trail" was the primary forensic and compliance artifact for completion-level auditing. From 2026-03-06 until this ship, it claimed to be tamper-evident but could detect tampering only in unit tests: src/security/audit-chain.test.ts round-trips the same in-memory object and never crosses the DB boundary. Against the production DB, every chain-verify call returned valid: false whether or not the entries had been tampered with. A real adversary modifying entries would have been hidden in the existing valid: false noise.
Two distinct bugs combined:
- Field-set drift (16 vs 10): the sign-site hashed 10 fields; the verify-site stripped only prevHash, eventHash, id, and createdAt from the DB row, leaving 16 fields. The 6 extras (principalAuthMethod, principalSpiffeId, principalRoles, toolVerdicts, cacheHit, privacyRedacted) are stored on the audit row but never took part in the original hash input.
- Numeric/string drift on costUsd: the sign-site hashes a JS number; the DB stores it via .toFixed(6) in a numeric column, and the verify-side read returns it as a string. Canonical JSON over 0.000123 ≠ canonical JSON over "0.000123". Even with the field set fixed, the values would not match.
buildAuditChainPayload() closes both gaps: it picks exactly the 10 fields the sign-site hashes AND coerces costUsd through .toFixed(6) → Number(), so both inputs (sign-time number, verify-time string) produce identical canonical bytes.
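Both drifts, and the fix, can be reproduced in a few lines. This is a minimal sketch, not the code in src/security/audit-chain.ts: it assumes the canonical form is JSON with recursively sorted keys hashed with SHA-256, and canonicalJson, sha256Hex, normalizeCostUsd and pickChainFields are hypothetical helpers. The field names and the .toFixed(6) → Number() coercion come from this PR; the sample values are made up.

```typescript
import { createHash } from "node:crypto";

// Assumed canonical form: JSON with object keys sorted recursively.
function canonicalJson(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map((v) => canonicalJson(v)).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const body = Object.keys(obj)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${canonicalJson(obj[k])}`)
      .join(",");
    return `{${body}}`;
  }
  return JSON.stringify(value); // numbers, strings, booleans, null
}

const sha256Hex = (s: string): string => createHash("sha256").update(s).digest("hex");

// The 10 fields the sign-site hashes (names from this PR).
const CHAIN_FIELDS = [
  "tenantId", "requestId", "principalId", "model", "isStreaming",
  "outcome", "costUsd", "inputTokens", "outputTokens", "durationMs",
] as const;

// .toFixed(6) then Number(), so a JS number and its DB string form converge.
function normalizeCostUsd(v: unknown): number | null {
  if (v === null || v === undefined) return null; // assumption: a missing cost hashes as null
  return Number(Number(v).toFixed(6));
}

// Keep exactly the chain fields (extra stored columns are dropped) and normalize costUsd.
function pickChainFields(row: Record<string, unknown>): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const f of CHAIN_FIELDS) out[f] = row[f] ?? null;
  out["costUsd"] = normalizeCostUsd(row["costUsd"]);
  return out;
}

// Sign-time input: costUsd is a JS number with more than 6 decimals (sample values).
const signed: Record<string, unknown> = {
  tenantId: "t1", requestId: "r1", principalId: null, model: "m", isStreaming: false,
  outcome: "ok", costUsd: 0.0000084, inputTokens: 10, outputTokens: 5, durationMs: 120,
};

// Verify-time row: costUsd read back as a string, plus the 6 stored columns
// that never took part in the hash (placeholder values).
const dbRow: Record<string, unknown> = {
  ...signed,
  costUsd: "0.000008",
  principalAuthMethod: null, principalSpiffeId: null, principalRoles: [],
  toolVerdicts: [], cacheHit: false, privacyRedacted: false,
};

// Number vs numeric string alone already changes the canonical bytes.
const costOnlyDrift = canonicalJson({ costUsd: 0.000123 }) !== canonicalJson({ costUsd: "0.000123" }); // true

// Before: each side hashes what it holds. After: both go through pickChainFields().
const before = sha256Hex(canonicalJson(signed)) === sha256Hex(canonicalJson(dbRow)); // false
const after =
  sha256Hex(canonicalJson(pickChainFields(signed))) ===
  sha256Hex(canonicalJson(pickChainFields(dbRow))); // true
```

The toFixed(6) step is what makes the sign-time number 0.0000084 and the DB string "0.000008" both land on 0.000008; the field picker is what drops the six extra columns.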
How It Works
```typescript
// src/security/audit-chain.ts
export function buildAuditChainPayload(input: {
  tenantId: string;
  requestId: string;
  principalId: string | null;
  model: string;
  isStreaming: boolean;
  outcome: string;
  costUsd: number | string | null | undefined;
  inputTokens: number | null | undefined;
  outputTokens: number | null | undefined;
  durationMs: number | null | undefined;
}): AuditChainPayload {
  // Coerce costUsd through .toFixed(6) → Number to match the DB-persisted form.
  // Without this normalization, sign-time `0.0000084` and DB-read "0.000008"
  // produce different canonical bytes.
  // …
}
```
Both call sites now hand their input to the helper instead of assembling the payload themselves:
```typescript
// helpers.ts:268 (sign-site)
const chainLink = computeHash(
  params.tenantId,
  buildAuditChainPayload({ /* 10-field input from completion params */ }),
);

// governance.ts:735 (verify-site)
const verifyEntries = chainedEntries.map((e) => {
  const payload = buildAuditChainPayload({ /* 10-field input from DB row */ });
  return { ...payload, prevHash: e.prevHash!, eventHash: e.eventHash! };
});
```
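The same mismatch explains why the endpoint reported brokenAt: 0: the very first row already fails to re-hash, and verification stops there. The sketch below is a toy hash chain under stated assumptions, not the production scheme: each eventHash is assumed to be SHA-256 over prevHash followed by the JSON payload, the first row chains from a hypothetical all-zero GENESIS value, the payload is cut down to three fields, and the DB round-trip is simulated by storing costUsd as a .toFixed(6) string. buildPayload, linkHash, appendEntry and verifyChain are hypothetical names; the real computeHash(tenantId, payload) call is not reproduced.

```typescript
import { createHash } from "node:crypto";

// Toy 3-field payload; the real payload has the 10 fields listed above.
type Payload = { requestId: string; model: string; costUsd: number | null };
// Stored row: costUsd comes back from the DB as a 6-decimal string.
type Row = { requestId: string; model: string; costUsd: string | null; prevHash: string; eventHash: string };

const GENESIS = "0".repeat(64); // hypothetical prevHash for the first entry

// Reduced stand-in for buildAuditChainPayload(): fixed field order, costUsd normalized.
function buildPayload(input: { requestId: string; model: string; costUsd: number | string | null }): Payload {
  return {
    requestId: input.requestId,
    model: input.model,
    costUsd: input.costUsd === null ? null : Number(Number(input.costUsd).toFixed(6)),
  };
}

// Assumed link hash: SHA-256 over prevHash followed by the payload JSON.
// JSON.stringify is stable here because buildPayload always emits keys in the same order.
function linkHash(prevHash: string, payload: unknown): string {
  return createHash("sha256").update(prevHash).update(JSON.stringify(payload)).digest("hex");
}

// Sign-site: hash the normalized payload, then persist costUsd the way the DB does.
function appendEntry(chain: Row[], input: { requestId: string; model: string; costUsd: number | null }): void {
  const prevHash = chain.length === 0 ? GENESIS : chain[chain.length - 1].eventHash;
  const eventHash = linkHash(prevHash, buildPayload(input));
  const costUsd = input.costUsd === null ? null : input.costUsd.toFixed(6);
  chain.push({ requestId: input.requestId, model: input.model, costUsd, prevHash, eventHash });
}

// Verify-site: rebuild each payload from the stored row with the same builder.
function verifyChain(rows: Row[]): { valid: boolean; brokenAt: number | null } {
  let expectedPrev = GENESIS;
  for (let i = 0; i < rows.length; i++) {
    const row = rows[i];
    if (row.prevHash !== expectedPrev || linkHash(row.prevHash, buildPayload(row)) !== row.eventHash) {
      return { valid: false, brokenAt: i };
    }
    expectedPrev = row.eventHash;
  }
  return { valid: true, brokenAt: null };
}

const chain: Row[] = [];
appendEntry(chain, { requestId: "r1", model: "m", costUsd: 0.0000084 });
appendEntry(chain, { requestId: "r2", model: "m", costUsd: 0.000123 });

const clean = verifyChain(chain); // { valid: true, brokenAt: null }

// Editing a stored field breaks the chain at that row.
const tampered = chain.map((r) => ({ ...r }));
tampered[1].model = "other";
const afterTamper = verifyChain(tampered); // { valid: false, brokenAt: 1 }

// Pre-fix-style check: re-hash the raw row (costUsd still a string) minus the hash columns.
const naiveBroken = chain.findIndex(({ prevHash, eventHash, ...rest }) => linkHash(prevHash, rest) !== eventHash);
// 0: the first row already fails, mirroring brokenAt: 0
```

The naive check fails at index 0 for the number-versus-string reason described earlier: the re-hashed row carries "0.000008" where the signed payload carried a number. With one shared builder, only an actual edit to a stored field breaks the chain, and brokenAt points at the edited row.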
The Numbers
| Metric | Before | After |
|---|---|---|
| /v1/governance/audit/chain/verify against live API | valid: false, brokenAt: 0 | expected valid: true for entries signed after deploy, checked in isolation via ?since= (pending deploy) |
| Field count hashed at sign-site | 10 | 10 (unchanged; the field-set bug was on the verify side) |
| Field count reconstructed at verify-site | 16 | 10 (matches sign-site exactly) |
| costUsd canonical form | number at sign-site, string at verify-site | both coerced to a 6-decimal number |
| Integration tests for sign→verify round-trip | 0 | 8 (new file audit-chain-roundtrip.test.ts) |
| Total audit-chain tests passing | 19 | 27 (19 existing, unchanged, plus 8 new) |
What This DOESN'T Fix
Pre-existing entries are not retroactively verifiable. The 100 chain entries in production were signed with the old (correct, 10-field) hash, but their costUsd values lost precision when rounded to 6 decimals on DB insert. The original full-precision JS double values cannot be recovered from the rounded strings stored in the DB. Their eventHash values were computed from the unrounded input, so even with this fix they will still report valid: false when verified.
This was a deliberate scope choice. Re-signing existing entries in place would erase the tamper-evidence guarantee on those rows (you can't distinguish a re-signed legitimate entry from a forged one). The right follow-up is a chain checkpoint mechanism: write a special audit row that attests "entries 1..100 verified as legacy schema vN, treated as sealed; verification continues from entry 101 forward." Out of scope for this PR — flagged for a follow-up.
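The proposed checkpoint could be verified roughly as follows. This is a speculative sketch of the follow-up, not part of this PR, and it simplifies the design: the checkpoint is passed in as a plain object rather than read from a special audit row. Checkpoint, its fields (sealedThrough, sealedHeadHash, schemaVersion), ChainRow, verifyFromCheckpoint and the toy hash are all hypothetical. Rows up to sealedThrough are skipped, and the walk starts at the next row with the sealed head's stored eventHash as the expected prevHash, so legacy hashes never need to be recomputed.

```typescript
import { createHash } from "node:crypto";

// Hypothetical checkpoint: attests that rows 1..sealedThrough are sealed legacy entries.
interface Checkpoint {
  sealedThrough: number; // last legacy sequence number, e.g. 100
  sealedHeadHash: string; // stored eventHash of that last legacy row
  schemaVersion: string; // label for the legacy schema, e.g. "legacy-vN"
}

interface ChainRow {
  seq: number;
  prevHash: string;
  eventHash: string;
}

// Verify only rows after the checkpoint. `recompute` rebuilds a row's eventHash from
// its stored fields (in practice via the shared payload builder). brokenAt is a seq number.
function verifyFromCheckpoint(
  rows: ChainRow[],
  cp: Checkpoint,
  recompute: (row: ChainRow) => string,
): { valid: boolean; brokenAt: number | null } {
  const live = rows.filter((r) => r.seq > cp.sealedThrough).sort((a, b) => a.seq - b.seq);
  let expectedPrev = cp.sealedHeadHash;
  for (const row of live) {
    if (row.prevHash !== expectedPrev || recompute(row) !== row.eventHash) {
      return { valid: false, brokenAt: row.seq };
    }
    expectedPrev = row.eventHash;
  }
  return { valid: true, brokenAt: null };
}

// Toy stand-in for the payload hash: SHA-256 over prevHash and seq only.
const toyHash = (r: ChainRow): string =>
  createHash("sha256").update(`${r.prevHash}:${r.seq}`).digest("hex");

// Row 100 carries an unverifiable legacy hash; rows 101-102 are chained to it.
const legacyHead = "stored-legacy-hash-100";
const rows: ChainRow[] = [{ seq: 100, prevHash: "unknown", eventHash: legacyHead }];
let prev = legacyHead;
for (const seq of [101, 102]) {
  const row: ChainRow = { seq, prevHash: prev, eventHash: "" };
  row.eventHash = toyHash(row);
  rows.push(row);
  prev = row.eventHash;
}

const checkpoint: Checkpoint = { sealedThrough: 100, sealedHeadHash: legacyHead, schemaVersion: "legacy-vN" };
const result = verifyFromCheckpoint(rows, checkpoint, toyHash); // { valid: true, brokenAt: null }
const tampered = rows.map((r) => (r.seq === 102 ? { ...r, prevHash: "forged" } : r));
const tamperResult = verifyFromCheckpoint(tampered, checkpoint, toyHash); // { valid: false, brokenAt: 102 }
```

Assuming the first post-deploy entry was chained to entry 100's stored eventHash, anchoring on that stored value lets the new segment verify without re-signing anything, which preserves the no-rewrite property argued for above.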
What this PR DOES fix: every entry signed AFTER deploy will verify correctly. The chain is repaired going forward.
Verification
- npx vitest run src/security/audit-chain-roundtrip.test.ts: 8/8 passed
- npx vitest run src/security/audit-chain.test.ts: 19/19 passed (existing tests unaffected)
- pnpm tsgo: exit 0, 0 errors
- oxfmt --check + oxlint --type-aware on edited files: clean
- Live API verification deferred until merge + ECS deploy. Post-deploy, the full-chain check (curl -H "Authorization: Bearer …) is expected to still return valid: false because the 100 legacy entries remain. New entries created after deploy should be verifiable in isolation (?since=).