Cache Hygiene

O(1) LRU eviction and bounded-Map invariants for in-memory caches that survive long-running ECS tasks.

Why cache hygiene matters

In-memory caches are the difference between a sub-millisecond hot path and a Redis round-trip. But every cache is a memory leak waiting to happen if it isn't bounded, and every eviction strategy is a latency spike waiting to happen if it isn't O(1).

BrainstormRouter's /quality-fleet R1 (2026-05-07) found three sites where the in-memory cache had unbounded growth or O(N) eviction:

  1. Guardian intelligence (src/api/middleware/guardian.ts): three module-level Maps keyed by tenantIdewmaRatios, tenantDetectors, tenantAlerts — with no eviction.
  2. Credential stuffing detector (src/security/credential-stuffing.ts): ipFailures, keyFailures, keyIps Maps where pruned timestamps left empty Map entries forever.
  3. Semantic cache (src/router/model-semantic-cache.ts): evictLru() iterated all entries to find the minimum lastAccessedAt — O(N) per eviction, O(N²) when the while loop evicted multiple.

PR #215 fixed all three sites with one shared utility.

The LruMap utility

// src/utils/lru-map.ts (excerpt)
export class LruMap<K, V> {
  private inner = new Map<K, V>();
  constructor(private max: number) {}

  get(key: K): V | undefined {
    const v = this.inner.get(key);
    if (v !== undefined) {
      // Touch: re-insert at tail
      this.inner.delete(key);
      this.inner.set(key, v);
    }
    return v;
  }

  set(key: K, value: V): void {
    if (this.inner.has(key)) this.inner.delete(key);
    this.inner.set(key, value);
    while (this.inner.size > this.max) {
      // Evict from head
      const oldest = this.inner.keys().next().value;
      if (oldest === undefined) break;
      this.inner.delete(oldest);
    }
  }

  delete(key: K): boolean {
    return this.inner.delete(key);
  }
  get size(): number {
    return this.inner.size;
  }
}

The trick: JavaScript's Map iteration order is _insertion order_. By calling delete and re-set on every access, we move recently-used entries to the tail. Eviction always takes from the head — keys().next().value is O(1).

This is strictly faster than:

  • A separate doubly-linked-list LRU (extra allocation + pointer maintenance)
  • A timestamp-sorted scan (O(N) per eviction)
  • A bare Map with periodic full eviction (latency spikes)

Three sites that adopted it

Guardian intelligence (per-tenant maps)

// src/api/middleware/guardian.ts (post-fix)
const ewmaRatios = new LruMap<string, number>(500);
const tenantDetectors = new LruMap<string, AnomalyDetector>(500);
const tenantAlerts = new LruMap<string, AlertState>(500);

Cap is 500 tenants. At one entry per active tenant, this comfortably holds the working set even for production deployments with thousands of tenants on the books — only _active_ tenants stay resident.

Credential stuffing (sliding-window counters)

The fix here had two parts:

  1. Cap the Map at 10,000 entries via LruMap.
  2. Delete on prune-to-zero: when WindowCounter.prune() removes the last timestamp from an entry, delete the Map entry too. Without this, every unique IP that hit the in-memory fallback path left a permanent empty entry.
// pseudo-code for the prune fix
function prune(counter: WindowCounter): void {
  counter.timestamps = counter.timestamps.filter((t) => t > cutoff);
  if (counter.timestamps.length === 0) {
    ipFailures.delete(counter.key); // <-- the new line
  }
}

Semantic cache (LRU + tenant partition)

The semantic cache adopted the same insertion-order pattern _within each tenant partition_. lookup() calls touchLru(entry), which deletes the entry from its partition Map and re-inserts it at the tail. evictLru() iterates partition heads (one per partition) to find the global oldest — O(num_partitions) instead of O(total_entries). When a partition empties, it's deleted from the outer Map.

// src/router/model-semantic-cache.ts (post-fix)
private evictLru(): void {
  let oldest: CacheEntry | null = null;
  let oldestBucket: Map<string, CacheEntry> | null = null;
  let oldestPartKey: string | null = null;

  for (const [partKey, bucket] of this.partitions) {
    const headKey = bucket.keys().next().value;
    if (headKey === undefined) continue;
    const entry = bucket.get(headKey);
    if (!entry) continue;
    if (!oldest || entry.lastAccessedAt < oldest.lastAccessedAt) {
      oldest = entry;
      oldestBucket = bucket;
      oldestPartKey = partKey;
    }
  }
  if (oldest && oldestBucket && oldestPartKey) {
    oldestBucket.delete(oldest.key);
    this.entryCount--;
    if (oldestBucket.size === 0) {
      this.partitions.delete(oldestPartKey);
    }
  }
}

Reviewer rules going forward

Any new in-memory Map keyed by tenant, user, or IP should:

  1. Use LruMap from src/utils/lru-map.ts (or a WeakMap if appropriate)
  2. Pick a cap that bounds memory at the working-set size, not the lifetime-unique-keys size
  3. If it's a sliding-window counter (timestamps inside the value), delete the Map entry when timestamps prune to zero
  4. Never iterate entries.values() to find the oldest — that's the O(N) eviction antipattern. Use insertion order.

Code review checklist for caches: the file src/utils/lru-map.test.ts tests the contract. Any new cache should reuse it.

What this prevents

  • Slow memory growth on long-running ECS tasks — a task running for weeks no longer accumulates entries for every tenant/IP/key it has ever seen
  • Latency spikes when caches hit capacity — eviction is O(1) per entry instead of O(N²) in the worst case
  • Hidden compounding cost — the semantic cache went from ~15M float multiplications per request (10K entries × 1536-dim cosine) to ~150K (typical tenant partition is 100 entries)

Lockstep references