2026-05-08-fix-job-queue-redis-duplicate-error-handlers

2026-05-08 — Fix: job-queue duplicate connections need 'error' listeners

Summary

Drill 02 (2026-05-08, 11:57:58Z) caught an unhandled promise rejection on a task booting with a bogus Redis URL:

[brainstormrouter] Unhandled promise rejection: ReplyError: WRONGPASS invalid username-password pair
    at parseError (/app/node_modules/.pnpm/redis-parser@3.0.0/node_modules/redis-parser/lib/parser.js:179:12)

Root cause: src/infra/job-queue.ts calls connection.duplicate() in two places (once for the Queue, once per Worker) to satisfy BullMQ's maxRetriesPerRequest: null requirement. Each duplicate is a new ioredis client, a fresh EventEmitter that inherits the original's options but none of its event listeners. With no 'error' listener attached, a connection-level error such as WRONGPASS surfaces as an unhandled rejection. The existing queue.on('error') and worker.on('error') handlers in job-queue.ts catch BullMQ-layer errors, but they do not intercept the connection-level errors that the duplicated ioredis clients raise when the Redis reply parser reports a failure.
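A minimal sketch of the listener gap, using a hypothetical FakeRedis class built on Node's EventEmitter rather than ioredis itself: duplicate() builds a new instance from merged options, so a listener registered on the original never reaches the copy. It illustrates only the inheritance gap, not how ioredis reports an error that nobody is listening for.

```typescript
import { EventEmitter } from "node:events";

// Hypothetical stand-in for an ioredis client. duplicate() constructs a brand-new
// emitter from merged options; listeners on the original are not carried over.
class FakeRedis extends EventEmitter {
  readonly options: Record<string, unknown>;

  constructor(options: Record<string, unknown>) {
    super();
    this.options = options;
  }

  duplicate(override: Record<string, unknown> = {}): FakeRedis {
    return new FakeRedis({ ...this.options, ...override });
  }
}

const connection = new FakeRedis({ host: "redis.invalid" });
connection.on("error", (err: Error) => console.warn("base connection:", err.message));

// One duplicate per BullMQ consumer, carrying the override BullMQ requires.
const queueConnection = connection.duplicate({ maxRetriesPerRequest: null });

console.log(connection.listenerCount("error")); // 1
console.log(queueConnection.listenerCount("error")); // 0: nothing is listening on the copy
console.log(queueConnection.options.maxRetriesPerRequest); // null: options were inherited
```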

Fix: attach a console.warn-only 'error' listener to both duplicates immediately after creation. The Queue/Worker error handlers continue to handle higher-level error semantics (debouncing, pause-on-quota); the new listeners exist solely to absorb connection-level 'error' events and prevent unhandled rejections.
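The fix can be sketched as a helper that duplicates and attaches the listener in one step, so no code path can use a duplicate before it has a handler. The helper name duplicateWithErrorListener, the DuplicableConnection interface, the role argument, and the log format are all hypothetical; the change described here adds the two .on('error', ...) calls directly in job-queue.ts, and FakeRedis again stands in for ioredis.

```typescript
import { EventEmitter } from "node:events";

// Assumed minimal shape of a client that can be duplicated (ioredis's Redis has a
// duplicate() method; this interface exists only for the sketch).
interface DuplicableConnection extends EventEmitter {
  duplicate(override?: Record<string, unknown>): DuplicableConnection;
}

// Hypothetical fake client: duplicate() returns a fresh emitter with no listeners
// (it ignores the override, which a real client would merge into its options).
class FakeRedis extends EventEmitter implements DuplicableConnection {
  duplicate(): FakeRedis {
    return new FakeRedis();
  }
}

// Duplicate the shared connection and attach a warn-only 'error' listener before
// returning it, so a connection-level failure (e.g. WRONGPASS) always has a handler.
// Higher-level handling stays in queue.on('error') / worker.on('error').
function duplicateWithErrorListener(
  connection: DuplicableConnection,
  queueName: string,
  role: "queue" | "worker",
): DuplicableConnection {
  const dup = connection.duplicate({ maxRetriesPerRequest: null });
  dup.on("error", (err: Error) => {
    console.warn(`[job-queue] ${queueName} ${role} connection error: ${err.message}`);
  });
  return dup;
}

const workerConnection = duplicateWithErrorListener(new FakeRedis(), "emails", "worker");
// Absorbed by the listener; console.warn prints:
// [job-queue] emails worker connection error: WRONGPASS invalid username-password pair
workerConnection.emit("error", new Error("WRONGPASS invalid username-password pair"));
```

Including the queue name in the message is what makes a reconnect storm across several queues attributable from logs alone.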

Changes

  • src/infra/job-queue.ts — added bullMqConnection.on('error', ...) (line ~67) and workerConnection.on('error', ...) (line ~125). Both listeners log via console.warn with the queue name for diagnosability.
  • src/infra/job-queue.test.ts — added describe("createJobQueueManager — duplicate connection error handling (regression: drill 02)") block. Mocks bullmq's Queue/Worker and a fake Redis whose duplicate() returns fresh EventEmitters. Asserts:
      1. Every duplicate has at least one 'error' listener after queue init + process()
      2. Emitting an 'error' on each duplicate does not produce any 'unhandledRejection'
      3. console.warn is called when the listener fires
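The three assertions can be approximated framework-free, as a sketch rather than the actual vitest test. FakeRedis and initQueueAndWorker are hypothetical stand-ins for the mocked Redis and for createJobQueueManager(...) plus process(). Assertion 2 is simplified here: instead of watching process 'unhandledRejection', the sketch relies on the fact that a plain Node EventEmitter throws when 'error' is emitted with no listener, so a clean emit shows the event was absorbed.

```typescript
import { EventEmitter } from "node:events";

// Fake Redis: duplicate() returns a fresh emitter with no listeners and records it.
class FakeRedis extends EventEmitter {
  readonly duplicates: FakeRedis[] = [];

  duplicate(): FakeRedis {
    const dup = new FakeRedis();
    this.duplicates.push(dup);
    return dup;
  }
}

// Hypothetical stand-in for queue init + process(): one duplicate for the Queue,
// one for the Worker, each given a warn-only 'error' listener right away.
function initQueueAndWorker(base: FakeRedis, queueName: string): void {
  for (const role of ["queue", "worker"]) {
    const dup = base.duplicate();
    dup.on("error", (err: Error) => {
      console.warn(`[job-queue] ${queueName} ${role} connection error: ${err.message}`);
    });
  }
}

function checkDuplicateErrorHandling(): { duplicates: number; warnings: number } {
  const base = new FakeRedis();
  const warnCalls: unknown[][] = [];
  const originalWarn = console.warn;
  console.warn = (...args: unknown[]) => {
    warnCalls.push(args);
  };
  try {
    initQueueAndWorker(base, "drill-02");
    for (const dup of base.duplicates) {
      // 1. Every duplicate has at least one 'error' listener.
      if (dup.listenerCount("error") < 1) {
        throw new Error("duplicate connection must have an 'error' listener");
      }
      // 2. The emit is absorbed; without a listener, EventEmitter would throw here.
      dup.emit("error", new Error("WRONGPASS invalid username-password pair"));
    }
  } finally {
    console.warn = originalWarn;
  }
  // 3. The listener actually fired, once per duplicate.
  if (warnCalls.length !== base.duplicates.length) {
    throw new Error("expected console.warn once per duplicate");
  }
  return { duplicates: base.duplicates.length, warnings: warnCalls.length };
}

console.log(checkDuplicateErrorHandling()); // { duplicates: 2, warnings: 2 }
```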

Verification

Test-first discipline:

# Without fix (source reverted, test in place):
FAIL src/infra/job-queue.test.ts
  AssertionError: duplicate connection must have an 'error' listener:
  expected 0 to be greater than or equal to 1
Test Files  1 failed | 818 passed
     Tests  1 failed | 7487 passed

# With fix:
Test Files  819 passed
     Tests  7488 passed

pnpm check passes (format clean, types 0 errors, lint 0/0).

Lockstep checklist

  • [x] Source: src/infra/job-queue.ts (2 minimal additions, ~14 lines including comments)
  • [x] Test: src/infra/job-queue.test.ts (1 new describe block with a paired regression test that fails without the fix)
  • [x] Ship log: this file
  • [x] No SDK/MCP changes: internal infrastructure, no API surface change
  • [x] Drill 02 retire: this PR closes the action item from drill 02 result docs/chaos-drills/results/02-redis-pull-2026-05-08T11-48-21Z.md (item: "Fix the unhandled promise rejection")

Out of scope (deferred)

  • Debouncing the [job-queue] Queue error spam during reconnect loops — separate concern, separate PR
  • Adjusting boot-time Redis fail-closed vs degraded-boot policy — design decision, separate discussion
  • Drill 02 Variant B (Upstash console reset for live-connection-drop test) — genuinely blocked on Upstash console access