Forensics panel 500 — listForTenant unwrapped pg.QueryResult.rows

2026-05-06

api, caf, observability, dashboard

LOCKSTEP TRACEABILITY MATRIX
---
api_endpoints: ["GET /auth/mesh/forensics", "GET /v1/mesh/forensics"]
sdk_methods_updated: ["none — internal store fix, route surface unchanged"]
mcp_tools_updated: ["security.getForensicSnapshots (call path unchanged)"]
---

What We Built

Fixed ForensicSnapshotStore.listForTenant so it unwraps the pg.QueryResult shape returned by drizzle's tx.execute(). Every dashboard load of the EVIDENCE → Forensic Snapshots panel was 500'ing with TypeError: (intermediate value).map is not a function. The function was treating the QueryResult wrapper ({ rows, rowCount, command, oid, fields }) as if it were the rows array itself.
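A minimal sketch of the failure mode, using a hand-built object with the same top-level keys as a pg.QueryResult (the row values are illustrative, not captured from production). Calling .map on the wrapper throws a TypeError; reading .rows first works.

```typescript
// Illustrative object with the same top-level keys as a pg.QueryResult.
// Row contents are made up for this example.
interface QueryResultLike {
  rows: Array<Record<string, unknown>>;
  rowCount: number | null;
  command: string;
  oid: number;
  fields: unknown[];
}

const result: QueryResultLike = {
  rows: [{ id: "snap-1" }, { id: "snap-2" }],
  rowCount: 2,
  command: "SELECT",
  oid: 0,
  fields: [],
};

// Buggy pattern: treat the wrapper itself as the rows array.
function idsFromWrapper(r: unknown): unknown[] {
  const rawRows = r as Array<Record<string, unknown>>;
  return rawRows.map((row) => row["id"]); // throws: the wrapper has no .map method
}

// Fixed pattern: read .rows, defaulting to an empty array.
function idsFromRows(r: unknown): unknown[] {
  const rawRows = ((r as { rows?: unknown }).rows ?? []) as Array<Record<string, unknown>>;
  return rawRows.map((row) => row["id"]);
}

let threw = false;
try {
  idsFromWrapper(result);
} catch (err) {
  threw = err instanceof TypeError;
}
console.log(threw); // true
console.log(idsFromRows(result)); // [ 'snap-1', 'snap-2' ]
```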

Why It Matters

This panel is the operator's "Proof of Death" view — the immutable record of what an agent was doing the moment the anomaly engine terminated it. With the panel returning 500, that audit surface was effectively offline: no security operator could see kill events through the dashboard at all.

How It Works

drizzle-orm/node-postgres tx.execute(sql) resolves to a raw pg.QueryResult, not the rows array. The codebase's other tx.execute callsites (src/db/stores/memory-store.ts, src/infra/jobs/content-cleanup.ts) all read .rows off the result. forensic-store.ts was the outlier. When the DISTINCT-ON refactor switched the query from tx.select() (which returns an array) to tx.execute() (which returns a wrapper), the existing cast rows as unknown as Array<...> stayed in place. A double cast through unknown tells the compiler to accept any shape, so type-checking passed and the mismatch only surfaced at runtime.

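// before: tx.execute() resolves to a pg.QueryResult wrapper, not an array
const rows = await withTenant(...);
const rawRows = rows as unknown as Array<Record<string, unknown>>; // double cast hides the mismatch
rawRows.map(...) // TypeError: the QueryResult object has no .map

// after: read .rows off the QueryResult, falling back to an empty array
const result = await withTenant(...);
const rawRows = ((result as { rows?: unknown }).rows ?? []) as Array<Record<string, unknown>>;
rawRows.map(...)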
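The fix above reads .rows with an empty-array fallback. A stricter variant, sketched here under the assumption that a store might see either shape (arrays from tx.select(), QueryResult wrappers from tx.execute()), normalizes both and throws on anything else. `unwrapRows` is a hypothetical helper name, not part of drizzle-orm or the codebase.

```typescript
// Hypothetical helper: normalize either an array of rows (tx.select() style)
// or a QueryResult-like wrapper (tx.execute() on node-postgres) into an array.
type Row = Record<string, unknown>;

function unwrapRows(result: unknown): Row[] {
  // tx.select() style: already an array of rows.
  if (Array.isArray(result)) {
    return result as Row[];
  }
  // tx.execute() style: an object carrying a `rows` array.
  if (result !== null && typeof result === "object" && "rows" in result) {
    const rows = (result as { rows: unknown }).rows;
    if (Array.isArray(rows)) {
      return rows as Row[];
    }
  }
  // Unknown shape: fail with a clear message instead of a downstream
  // ".map is not a function".
  throw new TypeError("unwrapRows: expected a row array or an object with a rows array");
}

// Usage with both shapes.
const fromSelect = unwrapRows([{ id: 1 }]);
const fromExecute = unwrapRows({ rows: [{ id: 2 }], rowCount: 1, command: "SELECT" });
console.log(fromSelect.length, fromExecute.length); // 1 1
```

Unlike the shipped fix, this version throws when .rows is missing instead of returning an empty list; whether a missing .rows should read as "no snapshots" or as a bug is a judgment call for the store's owners.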

Added three regression tests that use a stub db mimicking the real pg.QueryResult shape; these would have caught the bug before deploy.
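A sketch of what such a stub and the three cases could look like, assuming a simplified store function that takes an async `execute` callback and returns snapshots newest-first. `listSnapshots`, `stubExecutor`, and the `agent_id`/`captured_at` columns are hypothetical; the real store's signature and schema may differ, and plain checks stand in for the repo's test runner. The re-sort matters because PostgreSQL requires DISTINCT ON expressions to lead the ORDER BY, so rows presumably arrive grouped by the distinct key rather than by time.

```typescript
type Row = Record<string, unknown>;

interface Snapshot {
  agentId: string;
  capturedAt: Date;
}

// Hypothetical, simplified store: unwrap .rows exactly as the fix does, map, then re-sort newest-first.
async function listSnapshots(execute: () => Promise<unknown>): Promise<Snapshot[]> {
  const result = await execute();
  const rawRows = ((result as { rows?: unknown }).rows ?? []) as Row[];
  return rawRows
    .map((r) => ({
      agentId: String(r["agent_id"]),
      capturedAt: new Date(String(r["captured_at"])),
    }))
    .sort((a, b) => b.capturedAt.getTime() - a.capturedAt.getTime());
}

// Stub db: always resolves to a QueryResult-shaped wrapper, never a bare array.
function stubExecutor(rows: Row[]): () => Promise<unknown> {
  return async () => ({ rows, rowCount: rows.length, command: "SELECT", oid: 0, fields: [] });
}

function check(cond: boolean, msg: string): void {
  if (!cond) throw new Error(msg);
}

async function main(): Promise<void> {
  // 1. Rows are unwrapped from the wrapper.
  const one = await listSnapshots(stubExecutor([{ agent_id: "a1", captured_at: "2026-05-06T10:00:00Z" }]));
  check(one.length === 1 && one[0]?.agentId === "a1", "rows unwrapping");

  // 2. An empty result yields an empty list, not an error.
  const none = await listSnapshots(stubExecutor([]));
  check(none.length === 0, "empty result");

  // 3. Rows grouped by agent (DISTINCT ON order) come back newest-first.
  const sorted = await listSnapshots(
    stubExecutor([
      { agent_id: "a1", captured_at: "2026-05-06T09:00:00Z" },
      { agent_id: "a2", captured_at: "2026-05-06T11:00:00Z" },
    ]),
  );
  check(sorted.map((s) => s.agentId).join(",") === "a2,a1", "post-DISTINCT-ON re-sort");
}

// A failed check rejects the promise; Node reports the unhandled rejection and exits non-zero.
main().then(() => console.log("all 3 cases passed"));
```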

The Numbers

100% of /auth/mesh/forensics calls were 500'ing pre-fix (verified via a CloudWatch Logs filter, pattern=forensics, over a 30-min window)
Failure latency: 8ms (synchronous TypeError, not a hung query)
Tests added: 3 (rows-array unwrapping, empty result, post-DISTINCT-ON re-sort)

Lockstep Checklist

[x] API Routes: No route surface change — internal store fix.
[x] TS SDK: No SDK surface change.
[x] Python SDK: No SDK surface change.
[x] MCP Schemas: No tool surface change (security.getForensicSnapshots call path unchanged).
[x] Master Record: No capability change.