2026-05-08-fix-job-queue-redis-duplicate-error-handlers
2026-05-08 — Fix: job-queue duplicate connections need 'error' listeners
Summary
Drill 02 (2026-05-08, 11:57:58Z) caught an unhandled promise rejection on a task booting with a bogus Redis URL:
[brainstormrouter] Unhandled promise rejection: ReplyError: WRONGPASS invalid username-password pair
at parseError (/app/node_modules/.pnpm/redis-parser@3.0.0/node_modules/redis-parser/lib/parser.js:179:12)
Root cause: `src/infra/job-queue.ts` calls `connection.duplicate()` in two places (once for the Queue and once per Worker) to satisfy BullMQ's `maxRetriesPerRequest: null` requirement. Each duplicate is a fresh ioredis `EventEmitter`: it inherits the original connection's options but not its event listeners. With no `'error'` listener attached, ioredis surfaces a connection-level error (e.g. `WRONGPASS`) as an unhandled rejection. The existing `queue.on('error')` and `worker.on('error')` handlers catch BullMQ-layer errors, but they do not catch connection-level errors on the underlying duplicates before the parser surfaces them.
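To make the listener-inheritance gap concrete, here is a minimal sketch that uses Node's built-in `EventEmitter` as a stand-in for an ioredis connection. `FakeRedis` and its `duplicate()` are hypothetical, written only to mirror the behavior described in the root cause (options copied, listeners not); real ioredis clients add their own connection and error-reporting logic on top. One difference to keep in mind: on a plain `EventEmitter`, an `'error'` emitted with no listener is thrown synchronously rather than appearing as the unhandled rejection seen in the drill, but the missing listener is the same underlying gap.

```typescript
import { EventEmitter } from "node:events";

// Hypothetical stand-in for an ioredis client: duplicate() copies the
// options but returns a brand-new emitter with no listeners attached.
class FakeRedis extends EventEmitter {
  readonly options: Record<string, unknown>;

  constructor(options: Record<string, unknown>) {
    super();
    this.options = options;
  }

  duplicate(overrides: Record<string, unknown> = {}): FakeRedis {
    return new FakeRedis({ ...this.options, ...overrides });
  }
}

const base = new FakeRedis({ url: "redis://bogus" });
base.on("error", (err: Error) => console.warn("[base] redis error:", err.message));

// BullMQ needs maxRetriesPerRequest: null, hence the duplicate.
const dup = base.duplicate({ maxRetriesPerRequest: null });

// Options carry over; listeners do not.
console.log(base.listenerCount("error")); // 1
console.log(dup.listenerCount("error")); // 0
console.log(dup.options.maxRetriesPerRequest); // null

// With no 'error' listener, a plain EventEmitter rethrows the emitted error.
let escaped: unknown;
try {
  dup.emit("error", new Error("WRONGPASS invalid username-password pair"));
} catch (err) {
  escaped = err;
}
console.log(escaped instanceof Error); // true
```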
Fix: attach a `console.warn`-only `'error'` listener to each duplicate immediately after it is created. The Queue and Worker error handlers continue to own higher-level error semantics (debouncing, pause-on-quota); the new listeners exist solely to absorb connection-level error events and prevent unhandled rejections.
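A minimal sketch of the fix pattern, under stated assumptions. `FakeRedis` is a hypothetical stand-in for an ioredis client (options copied on `duplicate()`, listeners not); `attachConnectionErrorListener`, `createQueueConnection`, `createWorkerConnection`, and the `"emails"` queue name are made up for illustration, and the log message format is an assumption rather than the exact text in `src/infra/job-queue.ts`. Only the variable names `bullMqConnection` and `workerConnection` come from the change list. The detail that matters is ordering: the listener is attached on the line right after `duplicate()`, before the connection is handed to anything that could make it emit.

```typescript
import { EventEmitter } from "node:events";

// Hypothetical stand-in for an ioredis client.
class FakeRedis extends EventEmitter {
  readonly options: Record<string, unknown>;

  constructor(options: Record<string, unknown>) {
    super();
    this.options = options;
  }

  duplicate(overrides: Record<string, unknown> = {}): FakeRedis {
    return new FakeRedis({ ...this.options, ...overrides });
  }
}

// Warn-only listener: logs with the queue name and absorbs the event so it
// cannot escape as an unhandled error. Higher-level handling stays on Queue/Worker.
function attachConnectionErrorListener(conn: FakeRedis, queueName: string, role: string): void {
  conn.on("error", (err: Error) => {
    console.warn(`[job-queue] ${role} connection error (queue=${queueName}):`, err.message);
  });
}

// Hypothetical helpers mirroring the two duplicate() call sites.
function createQueueConnection(base: FakeRedis, queueName: string): FakeRedis {
  const bullMqConnection = base.duplicate({ maxRetriesPerRequest: null });
  attachConnectionErrorListener(bullMqConnection, queueName, "queue"); // immediately after creation
  return bullMqConnection;
}

function createWorkerConnection(base: FakeRedis, queueName: string): FakeRedis {
  const workerConnection = base.duplicate({ maxRetriesPerRequest: null });
  attachConnectionErrorListener(workerConnection, queueName, "worker"); // immediately after creation
  return workerConnection;
}

const base = new FakeRedis({ url: "redis://bogus" });
const queueConn = createQueueConnection(base, "emails");
const workerConn = createWorkerConnection(base, "emails");

// Emitting 'error' no longer throws; each listener logs a warning instead.
queueConn.emit("error", new Error("WRONGPASS invalid username-password pair"));
workerConn.emit("error", new Error("WRONGPASS invalid username-password pair"));
```

The change list describes two inline `.on('error', ...)` additions; factoring them into a shared helper as above is an optional variation that keeps the two call sites from drifting apart.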
Changes
- `src/infra/job-queue.ts`: added `bullMqConnection.on('error', ...)` (line ~67) and `workerConnection.on('error', ...)` (line ~125). Both listeners log via `console.warn` with the queue name for diagnosability.
- `src/infra/job-queue.test.ts`: added a `describe("createJobQueueManager — duplicate connection error handling (regression: drill 02)")` block. It mocks bullmq's `Queue`/`Worker` and a fake Redis whose `duplicate()` returns fresh EventEmitters, and asserts that:
  - every duplicate has at least one `'error'` listener after queue init + `process()`;
  - emitting an `'error'` on each duplicate does not produce any `'unhandledRejection'`;
  - `console.warn` is called when the listener fires.
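A framework-free sketch of the three assertions listed above, using `node:assert` in place of vitest. The bullmq `Queue`/`Worker` mocks are omitted; `initQueue` is a hypothetical stand-in for queue init plus `process()`, `FakeRedis` is a hypothetical fake whose `duplicate()` returns (and records) fresh `EventEmitter`s, and the `"drill-02"` queue name is made up. One substitution to note: instead of watching for `'unhandledRejection'`, this synchronous sketch checks that `emit('error', ...)` does not throw, which is how a plain `EventEmitter` reports a missing `'error'` listener.

```typescript
import { EventEmitter } from "node:events";
import { strict as assert } from "node:assert";

// Hypothetical fake Redis: every duplicate() is a fresh EventEmitter,
// recorded so the checks below can inspect all of them.
class FakeRedis extends EventEmitter {
  readonly duplicates: FakeRedis[] = [];

  duplicate(): FakeRedis {
    const dup = new FakeRedis();
    this.duplicates.push(dup);
    return dup;
  }
}

// Hypothetical stand-in for queue init + process(): one duplicate for the
// Queue, one for the Worker, each with a warn-only 'error' listener.
function initQueue(redis: FakeRedis, queueName: string): void {
  for (const role of ["queue", "worker"]) {
    const conn = redis.duplicate();
    conn.on("error", (err: Error) => console.warn(`[job-queue] ${role} (${queueName}):`, err.message));
  }
}

function checkDuplicateErrorHandling(): number {
  const redis = new FakeRedis();
  const warnings: unknown[][] = [];
  const originalWarn = console.warn;
  console.warn = (...args: unknown[]) => { warnings.push(args); };
  try {
    initQueue(redis, "drill-02");
    assert.ok(redis.duplicates.length > 0, "expected at least one duplicate");
    for (const dup of redis.duplicates) {
      // 1. Every duplicate has at least one 'error' listener.
      assert.ok(dup.listenerCount("error") >= 1, "duplicate connection must have an 'error' listener");
      // 2. Emitting 'error' is absorbed (a listener-less EventEmitter would throw here).
      assert.doesNotThrow(() => dup.emit("error", new Error("WRONGPASS invalid username-password pair")));
    }
    // 3. console.warn fired once per duplicate.
    assert.equal(warnings.length, redis.duplicates.length);
  } finally {
    console.warn = originalWarn;
  }
  return redis.duplicates.length;
}

console.log(checkDuplicateErrorHandling()); // 2
```

Deleting the `conn.on("error", ...)` line in `initQueue` makes the first assertion fail, which is the test-first check this PR relies on.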
Verification
Test-first discipline:
# Without fix (source reverted, test in place):
FAIL src/infra/job-queue.test.ts
AssertionError: duplicate connection must have an 'error' listener:
expected 0 to be greater than or equal to 1
Test Files 1 failed | 818 passed
Tests 1 failed | 7487 passed
# With fix:
Test Files 819 passed
Tests 7488 passed
`pnpm check` passes (format clean, 0 type errors, lint 0/0).
Lockstep checklist
- [x] Source: `src/infra/job-queue.ts` (2 minimal additions, ~14 lines including comments)
- [x] Test: `src/infra/job-queue.test.ts` (1 new `describe` block; paired regression test that fails without the fix)
- [x] Ship log: this file
- [x] No SDK/MCP changes: internal infrastructure, no API surface change
- [x] Drill 02 action item retired: this PR closes the item "Fix the unhandled promise rejection" from the drill 02 result, `docs/chaos-drills/results/02-redis-pull-2026-05-08T11-48-21Z.md`
Out of scope (deferred)
- Debouncing the `[job-queue] Queue error` spam during reconnect loops: separate concern, separate PR
- Adjusting the boot-time Redis policy (fail-closed vs degraded boot): design decision, separate discussion
- Drill 02 Variant B (Upstash console reset for the live-connection-drop test): blocked on Upstash console access