Cache Hygiene
O(1) LRU eviction and bounded-Map invariants for in-memory caches that survive long-running ECS tasks.
Why cache hygiene matters
In-memory caches are the difference between a sub-millisecond hot path and a Redis round-trip. But every cache is a memory leak waiting to happen if it isn't bounded, and every eviction strategy is a latency spike waiting to happen if it isn't O(1).
BrainstormRouter's /quality-fleet R1 (2026-05-07) found three sites where the in-memory cache had unbounded growth or O(N) eviction:
- Guardian intelligence (
src/api/middleware/guardian.ts): three module-level Maps keyed bytenantId—ewmaRatios,tenantDetectors,tenantAlerts— with no eviction. - Credential stuffing detector (
src/security/credential-stuffing.ts):ipFailures,keyFailures,keyIpsMaps where pruned timestamps left empty Map entries forever. - Semantic cache (
src/router/model-semantic-cache.ts):evictLru()iterated all entries to find the minimumlastAccessedAt— O(N) per eviction, O(N²) when the while loop evicted multiple.
PR #215 fixed all three sites with one shared utility.
The LruMap utility
// src/utils/lru-map.ts (excerpt)
export class LruMap<K, V> {
private inner = new Map<K, V>();
constructor(private max: number) {}
get(key: K): V | undefined {
const v = this.inner.get(key);
if (v !== undefined) {
// Touch: re-insert at tail
this.inner.delete(key);
this.inner.set(key, v);
}
return v;
}
set(key: K, value: V): void {
if (this.inner.has(key)) this.inner.delete(key);
this.inner.set(key, value);
while (this.inner.size > this.max) {
// Evict from head
const oldest = this.inner.keys().next().value;
if (oldest === undefined) break;
this.inner.delete(oldest);
}
}
delete(key: K): boolean {
return this.inner.delete(key);
}
get size(): number {
return this.inner.size;
}
}
The trick: JavaScript's Map iteration order is _insertion order_. By calling delete and re-set on every access, we move recently-used entries to the tail. Eviction always takes from the head — keys().next().value is O(1).
This is strictly faster than:
- A separate doubly-linked-list LRU (extra allocation + pointer maintenance)
- A timestamp-sorted scan (O(N) per eviction)
- A bare
Mapwith periodic full eviction (latency spikes)
Three sites that adopted it
Guardian intelligence (per-tenant maps)
// src/api/middleware/guardian.ts (post-fix)
const ewmaRatios = new LruMap<string, number>(500);
const tenantDetectors = new LruMap<string, AnomalyDetector>(500);
const tenantAlerts = new LruMap<string, AlertState>(500);
Cap is 500 tenants. At one entry per active tenant, this comfortably holds the working set even for production deployments with thousands of tenants on the books — only _active_ tenants stay resident.
Credential stuffing (sliding-window counters)
The fix here had two parts:
- Cap the Map at 10,000 entries via
LruMap. - Delete on prune-to-zero: when
WindowCounter.prune()removes the last timestamp from an entry, delete the Map entry too. Without this, every unique IP that hit the in-memory fallback path left a permanent empty entry.
// pseudo-code for the prune fix
function prune(counter: WindowCounter): void {
counter.timestamps = counter.timestamps.filter((t) => t > cutoff);
if (counter.timestamps.length === 0) {
ipFailures.delete(counter.key); // <-- the new line
}
}
Semantic cache (LRU + tenant partition)
The semantic cache adopted the same insertion-order pattern _within each tenant partition_. lookup() calls touchLru(entry), which deletes the entry from its partition Map and re-inserts it at the tail. evictLru() iterates partition heads (one per partition) to find the global oldest — O(num_partitions) instead of O(total_entries). When a partition empties, it's deleted from the outer Map.
// src/router/model-semantic-cache.ts (post-fix)
private evictLru(): void {
let oldest: CacheEntry | null = null;
let oldestBucket: Map<string, CacheEntry> | null = null;
let oldestPartKey: string | null = null;
for (const [partKey, bucket] of this.partitions) {
const headKey = bucket.keys().next().value;
if (headKey === undefined) continue;
const entry = bucket.get(headKey);
if (!entry) continue;
if (!oldest || entry.lastAccessedAt < oldest.lastAccessedAt) {
oldest = entry;
oldestBucket = bucket;
oldestPartKey = partKey;
}
}
if (oldest && oldestBucket && oldestPartKey) {
oldestBucket.delete(oldest.key);
this.entryCount--;
if (oldestBucket.size === 0) {
this.partitions.delete(oldestPartKey);
}
}
}
Reviewer rules going forward
Any new in-memory Map keyed by tenant, user, or IP should:
- Use
LruMapfromsrc/utils/lru-map.ts(or aWeakMapif appropriate) - Pick a cap that bounds memory at the working-set size, not the lifetime-unique-keys size
- If it's a sliding-window counter (timestamps inside the value),
deletethe Map entry when timestamps prune to zero - Never iterate
entries.values()to find the oldest — that's the O(N) eviction antipattern. Use insertion order.
Code review checklist for caches: the file src/utils/lru-map.test.ts tests the contract. Any new cache should reuse it.
What this prevents
- Slow memory growth on long-running ECS tasks — a task running for weeks no longer accumulates entries for every tenant/IP/key it has ever seen
- Latency spikes when caches hit capacity — eviction is O(1) per entry instead of O(N²) in the worst case
- Hidden compounding cost — the semantic cache went from ~15M float multiplications per request (10K entries × 1536-dim cosine) to ~150K (typical tenant partition is 100 entries)
Lockstep references
- PR #215 — implementation
.quality/findings.jsonlIDsb9c4a7d12e63,d4a8c3f7b291,a6d3f8c29b41- Ship log:
docs/ship-log/2026-05-07-cache-hygiene-class.md