Notification cooldown — move from emit-time bus gate to per-subscriber decoration
2026-05-09
LOCKSTEP TRACEABILITY MATRIX
---
api_endpoints: ["none — internal infra refactor"]
sdk_methods_updated: ["none — no public surface change"]
mcp_tools_updated: ["none — no new tools"]
---
What We Built
PlatformEventBus.emit() previously gated every subscriber behind a 15-minute cooldown held as private state on the bus itself. The cooldown check ran before fan-out, so a single suppression decision applied identically to all registered subscribers (Slack, Email, Webhook). That behavior was correct for user-facing notification channels, since nobody wants a second Slack or Email alert for the same flapping provider, but it baked a delivery concern into the bus and blocked the XDR thread from registering security-event subscribers that need real-time fan-out without channel-level dedup.
This PR extracts the cooldown into a per-subscriber decorator (src/infra/notifications/cooldown-decorator.ts) and strips the bus back to a pure Promise.allSettled fan-out. The three existing subscribers (Slack, Email, Webhook) are wrapped with withCooldown(...) at registration time in src/api/server.ts:initNotifications, which preserves their existing dedup behavior. The default key extractor, defaultCooldownKey, replays the exact keying logic that lived on the bus: provider events are keyed by providerId; intelligence events are keyed by the narrowest subject identifier (tenantId:keyId, tenantId:agentId, or serviceId); tenant.* and model.discovered events return undefined, so they always fire.
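The keying rules described above can be sketched as a small pure function. This is only an illustration: the event shape, the `provider.` prefix check, and the name `sketchCooldownKey` are assumptions made here, and the shipped defaultCooldownKey may use different field names, key formats, or fallbacks.

```typescript
// Hypothetical event shape: only the identifier fields named in this PR are modeled.
type PlatformEvent = {
  type: string;
  providerId?: string;
  tenantId?: string;
  keyId?: string;
  agentId?: string;
  serviceId?: string;
};

// Illustrative stand-in for defaultCooldownKey; not the shipped implementation.
function sketchCooldownKey(event: PlatformEvent): string | undefined {
  // tenant.* and model.discovered are never deduplicated.
  if (event.type.startsWith("tenant.") || event.type === "model.discovered") {
    return undefined;
  }
  // Provider events: one cooldown slot per provider (the prefix match is an assumption).
  if (event.type.startsWith("provider.")) {
    return event.providerId;
  }
  // Intelligence events: prefer the narrowest subject identifier available.
  if (event.tenantId && event.keyId) return `${event.tenantId}:${event.keyId}`;
  if (event.tenantId && event.agentId) return `${event.tenantId}:${event.agentId}`;
  // Falls through to serviceId, or undefined (no cooldown) when nothing identifies the subject.
  return event.serviceId;
}
```

Returning undefined is what makes an event bypass the cooldown entirely, so the "always fire" cases are listed first.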
Why It Matters
The bus-level cooldown was a delivery concern wearing infrastructure clothes. Three downstream consequences of moving it to the right layer:
- XDR thread is unblocked. When X-2 adds 32 new security event types and X-5 registers SIEM/XDR adapters as subscribers, those adapters must NOT be cooldown-gated: security events need real-time delivery to be useful. Per-subscriber decoration lets each new subscriber choose its own dedup posture without forking the bus or special-casing event types in getCooldownKey().
- Subscriber isolation improves. The previous shared Map meant that a slow Slack delivery (say, 30s) still marked the cooldown for Email, which had nothing to do with that delivery. With per-subscriber maps, each channel maintains independent state, so Slack failures don't suppress Email delivery for the same key.
- The bus is now a primitive. Sub-300 LOC, no private state beyond subscribers: Map, no behavior beyond subscribe() and emit(). Future event-bus consumers (internal agents, audit chain) can rely on simple fan-out semantics without reasoning about hidden suppression.
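To make "simple fan-out semantics" concrete, here is a minimal sketch of a bus with only subscribe() and emit(). The class name `MiniEventBus`, the event shape, and the matching rule ("*" plus exact event type) are assumptions for illustration; the real PlatformEventBus is not shown in this PR description.

```typescript
// Hypothetical shapes; the real types are not shown in this PR.
type PlatformEvent = { type: string };
type Subscriber = (event: PlatformEvent) => Promise<void>;

class MiniEventBus {
  // The only state: handlers grouped by the pattern they subscribed to.
  private readonly subscribers = new Map<string, Subscriber[]>();

  subscribe(pattern: string, handler: Subscriber): void {
    const list = this.subscribers.get(pattern) ?? [];
    list.push(handler);
    this.subscribers.set(pattern, list);
  }

  // Fan out to wildcard and exact-type subscribers. No suppression happens here.
  async emit(event: PlatformEvent): Promise<PromiseSettledResult<void>[]> {
    const targets = [
      ...(this.subscribers.get("*") ?? []),
      ...(this.subscribers.get(event.type) ?? []),
    ];
    // The async arrow turns synchronous throws into rejections, so one
    // failing handler cannot prevent its siblings from being invoked.
    return Promise.allSettled(targets.map(async (handler) => handler(event)));
  }
}
```

Because every handler is started before any result is awaited, a rejected Slack delivery shows up as one "rejected" entry in the result array while the other subscribers still receive the event.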
How It Works
withCooldown(handler, opts) is a higher-order function. Each call returns a fresh subscriber with its own Map keyed by event-derived strings:
    import { withCooldown } from "../infra/notifications/cooldown-decorator.js";
    import { createSlackSubscriber } from "../infra/notifications/slack.js";

    bus.subscribe(
      "*",
      withCooldown(createSlackSubscriber({ webhookUrl }), {
        windowMs: 15 * 60 * 1000, // override; defaults to DEFAULT_COOLDOWN_MS
        label: "slack",
      }),
    );
Each invocation:
- Computes the cooldown key via getKey(event) (default: defaultCooldownKey). undefined → no cooldown for this event.
- Looks up lastSeen.get(key). If within windowMs, log + suppress.
- Otherwise records Date.now() and bounds the map at maxKeys (default 1000) by evicting the oldest insertion.
- Awaits the wrapped handler. Errors propagate to the bus, where Promise.allSettled already isolates them from sibling subscribers.
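The four steps above can be sketched roughly as follows. This is not the code in cooldown-decorator.ts: the `PlatformEvent` and `Subscriber` types, the placeholder default key (`event.type`), and the log format are assumptions, and only the option names (windowMs, maxKeys, label, getKey) come from this description.

```typescript
// Hypothetical shapes; the real event and subscriber types are not shown in this PR.
type PlatformEvent = { type: string; [field: string]: unknown };
type Subscriber = (event: PlatformEvent) => Promise<void>;

interface CooldownOptions {
  windowMs?: number; // suppression window in milliseconds
  maxKeys?: number; // upper bound on tracked keys
  label?: string; // used only in log output
  getKey?: (event: PlatformEvent) => string | undefined;
}

const DEFAULT_COOLDOWN_MS = 15 * 60 * 1000;

function withCooldown(handler: Subscriber, opts: CooldownOptions = {}): Subscriber {
  const windowMs = opts.windowMs ?? DEFAULT_COOLDOWN_MS;
  const maxKeys = opts.maxKeys ?? 1000;
  // Placeholder default; the real defaultCooldownKey has per-event-type logic.
  const getKey = opts.getKey ?? ((e: PlatformEvent) => e.type);
  const lastSeen = new Map<string, number>(); // private to this wrapper

  return async (event) => {
    // windowMs <= 0 disables the cooldown without consulting getKey.
    const key = windowMs > 0 ? getKey(event) : undefined;
    if (key !== undefined) {
      const now = Date.now();
      const prev = lastSeen.get(key);
      if (prev !== undefined && now - prev < windowMs) {
        console.debug(`[${opts.label ?? "subscriber"}] suppressed ${key}`);
        return;
      }
      lastSeen.set(key, now);
      if (lastSeen.size > maxKeys) {
        // Map iterates in insertion order, so the first key is the oldest insertion.
        const oldest = lastSeen.keys().next().value;
        if (oldest !== undefined) lastSeen.delete(oldest);
      }
    }
    await handler(event); // rejections propagate to the caller (the bus)
  };
}
```

Note that the timestamp is recorded before the handler runs, so in this sketch a failed delivery still starts the cooldown for that key. Whether the shipped decorator behaves the same way on failure is not stated here.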
The bus itself no longer carries cooldown state, no longer takes a cooldownMs constructor argument, and no longer reads BRAINSTORMROUTER_NOTIFICATION_COOLDOWN_MS. That env var is now read once in initNotifications() and passed as windowMs to each withCooldown call — a single env-var read instead of N constructor reads.
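The "read once, pass to each wrapper" pattern might look like the sketch below. The helper name `readCooldownMs`, its parsing rules, and the fallback on malformed input are assumptions; only the env var name and the idea of passing a single windowMs to every withCooldown call come from this PR.

```typescript
const DEFAULT_COOLDOWN_MS = 15 * 60 * 1000;

// Hypothetical helper: parse the cooldown window from an env-like record.
function readCooldownMs(env: Record<string, string | undefined>): number {
  const raw = env["BRAINSTORMROUTER_NOTIFICATION_COOLDOWN_MS"];
  if (raw === undefined || raw.trim() === "") return DEFAULT_COOLDOWN_MS;
  const parsed = Number(raw);
  // Non-numeric input falls back to the default. Values <= 0 pass through
  // so the decorator can treat them as "cooldown disabled".
  return Number.isFinite(parsed) ? parsed : DEFAULT_COOLDOWN_MS;
}

// Read once, then reuse the same value for every wrapped subscriber.
const windowMs = readCooldownMs({ BRAINSTORMROUTER_NOTIFICATION_COOLDOWN_MS: "60000" });
const labels = ["slack", "email", "webhook"];
const optionsPerSubscriber = labels.map((label) => ({ label, windowMs }));
```

In initNotifications() the argument would presumably be process.env rather than a literal object; taking the record as a parameter just keeps the parsing easy to unit-test.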
What This Doesn't Fix
- Fairness across subscribers. If a slow subscriber holds up Promise.allSettled, emit() does not resolve until it settles, so callers that await emit() delay later events for fast subscribers as well. That's a bus-level concern (queue depth / per-subscriber concurrency budget) for a separate PR.
- Cross-subscriber dedup. Two distinct Slack workspace destinations registered as separate subscribers will each maintain independent cooldowns. If you want a single cooldown across both, share state externally (getKey can read from a shared store).
Test Plan
- New: src/infra/notifications/__tests__/cooldown-decorator.test.ts, with 15 tests covering same-key suppression, different-key passthrough, window expiry (via Date.now() mock), getKey() === undefined bypass, windowMs ≤ 0 shortcut, custom getKey override, bounded map under uniqueness pressure (20 unique keys with maxKeys=5), per-subscriber state isolation, and error propagation to the caller.
- Updated: src/infra/notifications/__tests__/intelligence-events.test.ts. The existing bus-level cooldown test was rewritten as an integration test that wraps a subscriber with withCooldown, validating that the refactor doesn't regress real-world dedup semantics.
- Full unit suite: 7594 passed / 0 failed (832 test files).
Lockstep
- TypeScript SDK: no change (internal infra)
- Python SDK: no change
- MCP tools: no change (no agent-facing surface)
- API surface: no change (no routes affected)
- site/public/routes.json: unchanged
- OpenAPI: unchanged
This is a pure infrastructure refactor — by design it doesn't move any public surface. The lockstep matrix is empty by intent, not by oversight.