2026-05-08-e2e-suite-green

2026-05-08 — E2e suite green: hermetic test env (closes 23 e2e failures)

Summary

R20 stochastic assessment flagged "23 e2e tests failing across 16 files" as a 10/10 universal risk. R19 had the same flag with the same count. The next-action recommendation was unanimous across both rounds. This PR closes the class.

Before: 16 test files / 23 tests failed. pnpm test:e2e exited 1. After: 260 test files passed / 1843 tests passed / 0 failed. pnpm test:e2e exits 0.

Two distinct root causes (one shared fix)

Both came from the user shell leaking production-shaped env vars into the test runner.

1. SENTRY_DSN triggered an OTEL instrumentation crash on Node 22

The "Cannot read properties of undefined (reading 'close')" cascade that R20 cited as evidence, spread across the src/gateway/server.*.e2e.test.ts files (16+ failures), was a downstream effect, not the cause. The actual error was:

TypeError: The argument 'filename' must be a file URL object,
file URL string, or absolute path string. Received undefined
    at new Hook (require-in-the-middle/index.js:116:40)
    at RequireInTheMiddleSingleton._initialize
    at SentryHttpInstrumentation.<instance_members_initializer>
    at new InstrumentationBase

Importing @sentry/node constructs SentryHttpInstrumentation as a class field initializer. That initializer calls RequireInTheMiddleSingleton.getInstance() which calls new Hook(null, { internals: true }, ...). On Node 22 inside vitest's vmForks pool, this combination crashes with a cryptic "filename must be a URL" error from a Node internal.
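To illustrate why a crash in a class field initializer can show up in test output as a 'close' error, here is a minimal sketch. FakeHook, FakeInstrumentation, and the setup/teardown shape are hypothetical stand-ins, not the real Sentry or require-in-the-middle classes, and the lifecycle is an assumption about how the gateway tests are structured rather than something taken from them.

```typescript
// Stand-in for require-in-the-middle's Hook; hypothetical, not the real class.
class FakeHook {
  constructor(filename: string | undefined) {
    if (filename === undefined) {
      throw new TypeError("The argument 'filename' must be a file URL object");
    }
  }
}

const events: string[] = [];

// Stand-in for SentryHttpInstrumentation: the field initializer runs as part
// of construction, before the constructor body gets a chance to execute.
class FakeInstrumentation {
  readonly hook = new FakeHook(undefined);

  constructor() {
    events.push("constructor body ran");
  }
}

type FakeServer = { close(): void };

// Hypothetical test lifecycle: setup throws, so `server` is never assigned.
function runLifecycle(): string[] {
  let server: FakeServer | undefined;
  try {
    new FakeInstrumentation();
    server = { close: () => events.push("closed") };
  } catch (err) {
    events.push(`setup: ${(err as Error).name}`);
  }
  try {
    // Teardown assumes setup succeeded, which produces the secondary error.
    (server as FakeServer).close();
  } catch (err) {
    events.push(`teardown: ${(err as Error).name}`);
  }
  return events;
}
```

Under these assumptions the teardown error is the one that dominates the report, even though the setup error is the one worth fixing.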

Fix: lazy-load @sentry/node in src/infra/sentry.ts. The module's top-level import is now type-only (typeof import("@sentry/node")), and the actual runtime await import("@sentry/node") only fires inside initSentry() — which short-circuits if SENTRY_DSN is unset.

This makes src/gateway/server.impl.ts safe to import without dragging in the OTEL instrumentation chain. Since tests never need real Sentry, they never load it.
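The lazy-load pattern described above can be sketched roughly as follows. This is not the contents of src/infra/sentry.ts: SentryLike, EnvVars, loadSentry, the injectable load parameter, and the boolean return value are assumptions added so the sketch compiles and runs without the SDK installed. The real module would take its types from typeof import("@sentry/node") and read process.env directly.

```typescript
// Minimal structural type for the part of the SDK this sketch touches.
// Assumption: the real code uses `typeof import("@sentry/node")` instead.
type SentryLike = { init(options: { dsn: string }): void };
type EnvVars = Record<string, string | undefined>;

// Keeping the specifier in a variable stops the type checker from resolving
// the package, so this file compiles even when @sentry/node is not installed.
const SENTRY_SPECIFIER: string = "@sentry/node";
const loadSentry = (): Promise<SentryLike> => import(SENTRY_SPECIFIER);

let cachedSentry: SentryLike | undefined;

// Resolves to true when Sentry was initialized, false when it was skipped.
async function initSentry(
  env: EnvVars,
  load: () => Promise<SentryLike> = loadSentry,
): Promise<boolean> {
  const dsn = env["SENTRY_DSN"];
  if (!dsn) {
    return false; // short-circuit: the SDK is never imported
  }
  if (!cachedSentry) {
    cachedSentry = await load(); // the only place the runtime import fires
  }
  cachedSentry.init({ dsn });
  return true;
}
```

A caller would write something like await initSentry(process.env). The point of the design is that importing this module has no side effects, so the instrumentation constructor only runs when a DSN is actually configured.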

2. DATABASE_URL from user shell pointed at a stale production host

src/db/pgbouncer-compat.e2e.test.ts reads process.env.DATABASE_URL and runs against it (skipping only when unset). The user's shell exports DATABASE_URL=postgres://...db-postgresql-nyc3-brainstorm-do-user-18397709-0.l.db.ondigitalocean.com/... from a prior cluster that has since been deleted, so the e2e test runs and fails with ENOTFOUND on the missing DO host.

Fix: test/setup.ts now strips DATABASE_URL (and SENTRY_DSN) from local test runs. Tests that need a real DB opt in via E2E_DATABASE_URL. CI runs (CI=true) skip the strip — the workflow's env: block already provides a scoped CI DB.
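A minimal sketch of the stripping rule, written as a function over a plain object so it can be exercised in isolation. The name makeHermetic and the TestEnv type are hypothetical, and reading "repoint" as "copy E2E_DATABASE_URL into DATABASE_URL" is an assumption based on the description here, not the actual code in test/setup.ts. SENTRY_DSN is removed unconditionally, following the Changes list.

```typescript
type TestEnv = Record<string, string | undefined>;

// Mutates `env` in place, the way a setup file would mutate process.env.
function makeHermetic(env: TestEnv): TestEnv {
  // Tests never need real Sentry, so the DSN is always removed.
  delete env["SENTRY_DSN"];

  // CI provides its own scoped database, so DATABASE_URL is left alone there.
  if (env["CI"] === "true") {
    return env;
  }

  const optIn = env["E2E_DATABASE_URL"];
  if (optIn) {
    env["DATABASE_URL"] = optIn; // repoint at the explicitly requested DB
  } else {
    delete env["DATABASE_URL"]; // DB-backed tests skip when this is unset
  }
  return env;
}
```

In a setup file this would presumably be invoked once as makeHermetic(process.env) before any test module is imported, so a stale value exported by the shell never reaches the tests.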

Changes

  • src/infra/sentry.ts: convert @sentry/node to a lazy dynamic import; initSentry() is now async.
  • src/gateway/server.impl.ts: await initSentry() instead of initSentry().
  • test/setup.ts: hermetic test env. Delete SENTRY_DSN; in non-CI runs, repoint or delete DATABASE_URL.

Verification

$ pnpm test:e2e
Test Files  260 passed | 38 skipped (298)
     Tests  1843 passed | 226 skipped (2069)
Duration  53.09s

$ pnpm test:fast    # unit suite still green
Test Files  819 passed | 8 skipped (827)
     Tests  7488 passed | 191 skipped (7679)

$ pnpm check         # types + lint + format clean

CI gate

pnpm test:e2e is already wired into the test-e2e job in .github/workflows/ci.yml (against an ephemeral postgres + redis). With the suite passing, the gate is now load-bearing. Future regressions will fail CI before merge.

Lockstep checklist

  • [x] Source: sentry.ts lazy-load, server.impl.ts await, test/setup.ts hermetic env
  • [x] Test/gate: pnpm test:e2e green (260 files / 1843 tests / 0 failed)
  • [x] Ship log: this file
  • [x] R19 + R20 risk register: 10/10 universal flag retired
  • [x] No SDK/MCP impact: internal infrastructure only