Redis CAS: Cross-Process Audit Chain Atomicity

2026-03-17

audit-chain, anomaly-engine, mcp-governance

What We Built

Replaced per-process in-memory Maps with Redis-backed distributed state for three critical subsystems: audit chain heads, anomaly engine scoring caches, and MCP approval queues. The audit chain uses a Lua CAS (Compare-And-Set) script for atomic chain head advancement across ECS tasks. All subsystems fall back gracefully to in-memory Maps when Redis is unavailable.

Boot seeding now queries the database for the latest event_hash per tenant and pre-warms both Redis and in-memory chain heads, eliminating the chain-restart-from-genesis problem on container replacement.
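
A minimal sketch of the boot-seeding step, under stated assumptions: the row shape (tenantId, eventHash), the seedChainHeads name, and the injected warmRedis callback are all hypothetical, and the database query that produces the rows is left to the caller. It only illustrates the ordering described above: seed the in-memory head first, then warm Redis, and tolerate Redis being down.

```typescript
// Hypothetical row shape: one row per tenant carrying its latest event_hash.
interface ChainHeadRow {
  tenantId: string;
  eventHash: string;
}

// Hypothetical callback that writes one chain head to Redis.
type WarmRedis = (key: string, value: string) => Promise<unknown>;

// Seeds the in-memory heads synchronously, then warms Redis best-effort.
// The returned promise resolves once every Redis write has settled; it never
// rejects, so a Redis outage cannot block boot.
function seedChainHeads(
  rows: ChainHeadRow[],
  heads: Map<string, string>,
  warmRedis?: WarmRedis,
): Promise<void> {
  const pending: Promise<unknown>[] = [];
  for (const row of rows) {
    // In-memory first: the process can chain from this head immediately.
    heads.set(row.tenantId, row.eventHash);
    if (warmRedis) {
      // Swallow Redis errors so the in-memory fallback stays usable.
      pending.push(
        warmRedis(`audit:chain:${row.tenantId}`, row.eventHash).catch(() => undefined),
      );
    }
  }
  return Promise.all(pending).then(() => undefined);
}
```

Whether the Redis write should overwrite an existing head or only fill a missing one is a design decision this sketch leaves open.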

Why It Matters

In a multi-task ECS deployment, each task maintained its own chain head, which forked the audit chain and left anomaly severity invisible across instances. As a result, the tamper-evident audit trail was unreliable whenever production scaled horizontally.

How It Works

The audit chain head is stored in Redis at audit:chain:{tenantId}. On each audit entry, computeHash() reads the in-memory chain head (no network round trip), computes the hash synchronously, advances the in-memory head, then issues a fire-and-forget Redis Lua CAS to advance the distributed head. Redis executes a Lua script as a single atomic unit, so the script's read, compare, and set cannot be interleaved with any other Redis command.
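
The flow above could be sketched roughly as follows. This is an illustration, not the shipped script: the Lua body, the empty-string convention for "no head yet", the createAuditChain factory, and the injected hash function are assumptions. The client interface is shaped like ioredis's eval(script, numKeys, ...keysAndArgs); other clients use a different signature.

```typescript
// Hypothetical Lua CAS script. KEYS[1] is audit:chain:{tenantId};
// ARGV[1] is the head this task expects ("" when no head exists yet);
// ARGV[2] is the new head. Returns 1 on success, 0 on a mismatch.
const ADVANCE_HEAD_LUA = `
local current = redis.call('GET', KEYS[1])
if current == false then current = '' end
if current == ARGV[1] then
  redis.call('SET', KEYS[1], ARGV[2])
  return 1
end
return 0
`;

// Assumed minimal client surface (ioredis-style eval).
interface LuaClient {
  eval(script: string, numKeys: number, ...keysAndArgs: string[]): Promise<unknown>;
}

// The hash function is injected so the sketch stays dependency-free.
type HashFn = (prevHash: string, payload: string) => string;

function createAuditChain(hash: HashFn, redis?: LuaClient) {
  const heads = new Map<string, string>();
  return {
    computeHash(tenantId: string, payload: string): string {
      // 1. Read the in-memory head (no network round trip).
      const prev = heads.get(tenantId) ?? "";
      // 2. Compute the next hash synchronously.
      const next = hash(prev, payload);
      // 3. Advance the in-memory head.
      heads.set(tenantId, next);
      // 4. Fire-and-forget the CAS; a rejected promise is swallowed so the
      //    in-memory head keeps working when Redis is unavailable.
      if (redis) {
        redis
          .eval(ADVANCE_HEAD_LUA, 1, `audit:chain:${tenantId}`, prev, next)
          .catch(() => undefined);
      }
      return next;
    },
  };
}
```

Because the call is fire-and-forget, this sketch ignores the script's return value. A return of 0 would mean another task advanced the head first; how to react to that is not covered here.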

Anomaly engine state (pendingSlmScores, slmRateCounts, lastSeverity) uses write-through: each update is applied to the in-memory Map first, so the hot path stays synchronous, and is then mirrored to Redis with an appropriate TTL for cross-instance visibility.
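
One way to picture the write-through pattern, as a sketch only: the createWriteThroughCache helper, the key prefix, and the 300-second TTL below are hypothetical values chosen for illustration. The client interface is shaped like ioredis's set(key, value, "EX", seconds).

```typescript
// Assumed minimal client surface (ioredis-style SET with an EX expiry).
interface TtlClient {
  set(key: string, value: string, mode: "EX", seconds: number): Promise<unknown>;
}

function createWriteThroughCache<V>(prefix: string, ttlSeconds: number, redis?: TtlClient) {
  const local = new Map<string, V>();
  return {
    set(key: string, value: V): void {
      // In-memory first, so callers on the hot path never await Redis.
      local.set(key, value);
      if (redis) {
        // Mirror to Redis with a TTL; rejections are swallowed so a Redis
        // outage degrades to per-process state instead of failing the write.
        redis
          .set(`${prefix}:${key}`, JSON.stringify(value), "EX", ttlSeconds)
          .catch(() => undefined);
      }
    },
    get(key: string): V | undefined {
      // Reads are served from the local Map in this sketch.
      return local.get(key);
    },
  };
}

// Usage without a Redis client: behaves like a plain in-memory Map.
const lastSeverity = createWriteThroughCache<string>("anomaly:lastSeverity", 300);
lastSeverity.set("tenant-42", "high");
```

How other instances read the mirrored values back from Redis is outside the scope of this sketch.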

Lockstep Checklist

  • [x] API Routes: No changes (internal state only).
  • [x] TS SDK: No changes needed.
  • [x] Python SDK: No changes needed.
  • [x] MCP Schemas: No changes needed.
  • [x] Master Record: N/A — infrastructure change.