Cache hygiene: bound 3 unbounded caches + O(1) LRU eviction
2026-05-07
What We Built
Three sites with unbounded Map growth or O(N) eviction, fixed together as a class-of-bugs PR with a new shared utility:
(1) guardian.ts: three module-level Maps (ewmaRatios, tenantDetectors, tenantAlerts) grew indefinitely. Bounded with an LRU cap of 500.
(2) credential-stuffing.ts: Maps grew without bound; timestamps were pruned, but a Map entry was never deleted once its timestamp list emptied. Added prune-to-zero deletion plus a 10K LRU cap.
(3) model-semantic-cache.ts: evictLru() iterated all entries to find the minimum lastAccessedAt. Replaced with O(1) eviction via Map insertion order: delete + set on access moves the entry to the tail, and eviction takes from the head.
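The insertion-order technique in (3) can be sketched as follows. This is a minimal illustration, not the contents of src/utils/lru-map.ts: the class name LruMap, its method set, and the maxSize parameter are assumptions made for the example. It relies only on the guarantee that a JavaScript Map iterates keys in insertion order.

```typescript
// Hypothetical sketch of an insertion-order LRU; the real utility may differ.
class LruMap<K, V> {
  private readonly entries = new Map<K, V>();
  private readonly maxSize: number;

  constructor(maxSize: number) {
    if (maxSize < 1) throw new RangeError("maxSize must be >= 1");
    this.maxSize = maxSize;
  }

  get(key: K): V | undefined {
    if (!this.entries.has(key)) return undefined;
    const value = this.entries.get(key) as V;
    // delete + set re-inserts the key at the tail (most recently used).
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key: K, value: V): void {
    // Deleting first means an update also moves the key to the tail.
    this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.maxSize) {
      // The first key in iteration order is the least recently used.
      const oldest = this.entries.keys().next();
      if (!oldest.done) this.entries.delete(oldest.value);
    }
  }

  has(key: K): boolean {
    return this.entries.has(key);
  }

  delete(key: K): boolean {
    return this.entries.delete(key);
  }

  get size(): number {
    return this.entries.size;
  }

  // Keys ordered from least to most recently used (for inspection only).
  keys(): K[] {
    return Array.from(this.entries.keys());
  }
}

const cache = new LruMap<string, number>(2);
cache.set("a", 1);
cache.set("b", 2);
cache.get("a");    // "a" becomes the most recently used entry
cache.set("c", 3); // over capacity: evicts "b", the head of the Map
```

Eviction reads only the first key from the iterator, so its cost does not depend on how many entries the cache holds.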
Why It Matters
The unbounded Maps would have leaked memory in long-running ECS tasks. The O(N) eviction scan would have surfaced as latency spikes once the semantic cache hit capacity. None had manifested in production yet; all were caught proactively by the /quality-fleet scanner. The new src/utils/lru-map.ts utility gives future caches O(1) eviction by default.
How It Works
5 new regression tests:
- guardian Maps remain ≤500 entries under 10K unique-tenant insertions
- credential-stuffing entries are deleted when the window prunes to zero
- semantic-cache eviction completes in roughly constant time at max capacity
- Map.values() is NOT called during evictLru
- the O(1) LRU utility is unit-tested in isolation
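As an illustration of the prune-to-zero invariant in the second test, the sketch below shows one way the behavior and its check could look. The names pruneAttempts, WINDOW_MS, and the Map<string, number[]> shape are assumptions for the example; the actual structures in credential-stuffing.ts are not shown in this write-up.

```typescript
// Hypothetical sliding window length; the real value is not given here.
const WINDOW_MS = 60000;

// Drops timestamps older than the window and deletes the Map entry
// entirely when nothing is left, so idle keys do not accumulate.
function pruneAttempts(
  attempts: Map<string, number[]>,
  key: string,
  now: number,
): number {
  const timestamps = attempts.get(key);
  if (timestamps === undefined) return 0;
  const live = timestamps.filter((t) => now - t < WINDOW_MS);
  if (live.length === 0) {
    attempts.delete(key); // prune-to-zero: remove the key, not just its contents
  } else {
    attempts.set(key, live);
  }
  return live.length;
}

// Regression-style check: after the window passes, the key must be gone.
function entryIsDeletedAfterWindow(): boolean {
  const attempts = new Map<string, number[]>();
  attempts.set("203.0.113.7", [1000, 2000]);
  const remaining = pruneAttempts(attempts, "203.0.113.7", 2000 + WINDOW_MS);
  return remaining === 0 && !attempts.has("203.0.113.7") && attempts.size === 0;
}
```

Asserting on attempts.size, rather than on the length of the timestamp array, is what distinguishes this check from the earlier behavior, where the array emptied but the key stayed in the Map.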
Lockstep Checklist
- [x] No API route changes (middleware/internal — lockstep N/A)
- [x] No SDK changes
- [x] No MCP tool changes
- [x] Regression test included (test-first invariant per /quality-fleet protocol)
- [x] Linked to /quality-fleet R1 dashboard at .quality/dashboard.md
Provenance
Auto-found by the /quality-fleet R1 scanner round (2026-05-07) and fixed in a fix-agent batch under the "go for all of it" autonomy grant. PR #215 merged to main as commit fa573ddf6. Findings tracked at .quality/findings.jsonl (entries: "b9c4a7d12e63", "d4a8c3f7b291", "a6d3f8c29b41"). Production-deployed via ECS task-def revision 732 series.