Why BrainstormRouter Exists
The hard-fork origin story — from consumer messaging app to autonomous AI API engine.
We built another AI wrapper. Then we ripped it apart.
BrainstormRouter started as OpenClaw — a consumer-facing AI messaging app with Telegram bots, Slack integrations, a mobile app, and a local dashboard. It had 13 intelligence systems under the hood: Thompson sampling for model selection, a persistent memory layer inspired by Letta, circuit breakers, cost optimization, quality scoring.
Nobody used any of them.
Users sent messages through Telegram. The router picked the cheapest model. The memory system collected dust. The Thompson sampling weights never converged because traffic was too thin and too random.
The intelligence was real. The product surface was wrong.
What we removed
In February 2025, we hard-forked OpenClaw and deleted everything that wasn't infrastructure:
| Removed | What it was | LOC |
|---|---|---|
| ui/ | Local gateway dashboard (Lit web components) | ~8,000 |
| apps/ | Mobile apps (iOS, Android, macOS) | ~12,000 |
| src/channels/ | Unified messaging abstraction | ~3,000 |
| src/slack/, src/web/ | Channel implementations | ~2,000 |
| src/telegram, src/discord, src/signal, src/imessage | Already removed | — |
Total: roughly 25,000 lines of consumer-facing code, deleted permanently.
See The Hard Fork for the full technical details.
What we kept
Everything that makes an API gateway intelligent rather than passive:
**The Soul** (src/agents/pi-embedded-runner/): The central intelligence engine. Multi-turn agent execution with tool use, session management, streaming, and autonomous memory extraction. This is the "Pi Runner" — it doesn't just proxy requests, it understands them.
**Thompson Sampling** (src/router/model-bandit.ts): UCB1 for cold-start exploration, Gaussian Thompson sampling for steady-state exploitation. The router learns which model performs best for your workload over a 7-day sliding window. Not rules-based. Not price-weighted random. Actual Bayesian optimization.
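The hybrid selection policy can be sketched as follows. This is a minimal illustration, not the real model-bandit code: the 10-pull cold-start threshold, the `ArmStats` shape, and the Welford-style reward tracking are all assumptions made for the example.

```typescript
interface ArmStats {
  model: string;
  pulls: number;
  mean: number; // running mean reward in [0, 1]
  m2: number;   // sum of squared deviations (Welford accumulator)
}

const COLD_START_PULLS = 10; // assumed threshold for switching UCB1 -> Thompson

function gaussianSample(mean: number, std: number): number {
  // Box-Muller transform for a normal draw
  const u1 = Math.random();
  const u2 = Math.random();
  return mean + std * Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
}

function selectModel(arms: ArmStats[]): string {
  const totalPulls = arms.reduce((s, a) => s + a.pulls, 0);
  let best = arms[0];
  let bestScore = -Infinity;
  for (const arm of arms) {
    let score: number;
    if (arm.pulls < COLD_START_PULLS) {
      // UCB1: unpulled arms score infinity; the optimism bonus shrinks with pulls
      score = arm.pulls === 0
        ? Infinity
        : arm.mean + Math.sqrt((2 * Math.log(totalPulls)) / arm.pulls);
    } else {
      // Gaussian Thompson: draw from the posterior over the arm's mean reward
      const variance = arm.m2 / (arm.pulls - 1);
      score = gaussianSample(arm.mean, Math.sqrt(variance / arm.pulls));
    }
    if (score > bestScore) {
      bestScore = score;
      best = arm;
    }
  }
  return best.model;
}

function recordReward(arm: ArmStats, reward: number): void {
  // Welford's online update keeps mean and variance without storing history
  arm.pulls += 1;
  const delta = reward - arm.mean;
  arm.mean += delta / arm.pulls;
  arm.m2 += delta * (reward - arm.mean);
}
```

The key property: an arm with no traffic is always probed first (UCB1's infinite optimism), while well-observed arms compete through posterior sampling, so exploration decays naturally as evidence accumulates.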
**Persistent Memory** (src/db/stores/memory-extraction-store.ts): Postgres-backed durable extraction queue with FOR UPDATE SKIP LOCKED for concurrent worker processing. Memories are extracted asynchronously from completions and stored in pgvector for semantic retrieval. Your API calls build institutional knowledge.
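The SKIP LOCKED pattern is what lets multiple workers drain the same queue without double-claiming rows. A sketch of the claim step, with hypothetical table and column names (the real store's schema is not shown here):

```typescript
// Hypothetical claim query: table/column names are illustrative assumptions.
// SKIP LOCKED makes concurrent workers jump over rows another worker holds,
// instead of blocking on them.
const CLAIM_SQL = `
  UPDATE memory_extractions
     SET status = 'processing', claimed_at = now()
   WHERE id = (
     SELECT id FROM memory_extractions
      WHERE status = 'pending'
      ORDER BY created_at
      FOR UPDATE SKIP LOCKED
      LIMIT 1
   )
   RETURNING id, completion_id, payload
`;

// Minimal pg-style client shape; only `query` is needed for this sketch.
interface PgClient {
  query(sql: string): Promise<{ rows: any[] }>;
}

async function claimNextExtraction(client: PgClient): Promise<any | null> {
  const { rows } = await client.query(CLAIM_SQL);
  return rows[0] ?? null; // null when the queue is empty or every row is locked
}
```

Because the lock is taken inside the subselect and released on commit, a crashed worker's row simply becomes claimable again once its transaction rolls back — the queue is durable without a separate broker.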
**Guardian Intelligence** (src/api/middleware/guardian.ts): Pre-request cost prediction, post-response efficiency scoring, EWMA output/input ratio tracking, velocity monitoring. Fail-open design with a <5ms p95 latency budget. Every response includes X-BR-* headers with cost, efficiency, and cache data.
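The EWMA ratio is what makes pre-request cost prediction possible: if you know a workload's historical output/input token ratio, you can price a request before the provider returns anything. A minimal sketch, assuming a smoothing factor of 0.2 and per-million-token pricing (both illustrative, not Guardian's actual parameters):

```typescript
const ALPHA = 0.2; // assumed EWMA smoothing factor

// Blend the latest output/input ratio into the running EWMA.
// The first observation seeds the average directly.
function updateEwmaRatio(
  prev: number | null,
  inputTokens: number,
  outputTokens: number,
): number {
  const ratio = outputTokens / Math.max(inputTokens, 1);
  return prev === null ? ratio : ALPHA * ratio + (1 - ALPHA) * prev;
}

// Pre-request prediction: expected output = EWMA ratio * prompt tokens,
// so a cost estimate exists before the provider call is made.
function predictCost(
  inputTokens: number,
  ewmaRatio: number,
  inPricePerMTok: number,
  outPricePerMTok: number,
): number {
  const predictedOutput = inputTokens * ewmaRatio;
  return (inputTokens * inPricePerMTok + predictedOutput * outPricePerMTok) / 1e6;
}
```

For example, a workload that historically emits 3 output tokens per input token lets the gateway reject or throttle a 1,000-token prompt whose predicted spend would breach a budget — before a single token is generated.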
**Circuit Breakers** (src/router/circuit-breaker.ts): Dual-trigger state machine: consecutive failures (5 default) OR sliding-window error rate (50% over 60s). Exponential backoff on failed probes. Automatic recovery.
The new problem: shadow AI
In 2024, the conversation was about cost and latency. In 2025, the conversation shifted.
Enterprise teams aren't exhausted by provider edge cases. They're exhausted by shadow AI — developers passing API keys through personal accounts, teams spinning up AI endpoints without security review, prompts containing customer data flowing through unmonitored pipes, and model outputs reaching production users without PII scanning.
Every AI gateway on the market gives you a proxy. Some add logging. Some add cost tracking. None of them sit on the hot path of the streaming response and enforce security policy token-by-token.
An AI gateway that logs a prompt after it's been delivered to the provider is not a security tool. It's a compliance liability with a dashboard.
Our thesis
An AI gateway is a security appliance, not an infrastructure tool.
Every request to an AI provider carries risk: PII in the prompt, secrets in the context, sensitive data in the model's response. The gateway is the only choke point where you can enforce policy on all three — before, during, and after the provider call.
BrainstormRouter treats the gateway as an identity-aware firewall:
- Before the call: Guardian scans for PII, predicts cost, enforces budgets. Egress allowlists restrict which domains the request can reach.
- During the stream: StreamingGuardrailEvaluator buffers tokens, runs governance rules and PII detection, and severs the stream if the model outputs sensitive data. (src/security/streaming-guardrails.ts)
- After the response: Structured security events export to your SIEM in CEF or ECS JSON. Every verdict is auditable.
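Token-by-token enforcement hinges on buffering: you cannot safely forward a chunk until you know a sensitive pattern doesn't straddle the chunk boundary. A minimal sketch of the idea — the regexes, 64-character flush threshold, and 32-character hold-back are illustrative assumptions, standing in for the real governance rules and detectors:

```typescript
// Stand-in PII patterns; the real evaluator's detectors are not shown here.
const PII_PATTERNS: RegExp[] = [
  /\b\d{3}-\d{2}-\d{4}\b/,       // US SSN shape
  /\b[\w.+-]+@[\w-]+\.[\w.]+\b/, // email address
];

class StreamingGuardrailSketch {
  private buffer = "";

  constructor(private readonly flushThreshold = 64) {}

  // Returns text safe to forward downstream, "" while still buffering,
  // or null to signal the stream must be severed before delivery.
  push(token: string): string | null {
    this.buffer += token;
    if (PII_PATTERNS.some((p) => p.test(this.buffer))) {
      return null; // sensitive data detected: sever, nothing reaches the client
    }
    if (this.buffer.length >= this.flushThreshold) {
      // Hold back a tail so a match split across chunks is still caught.
      const safe = this.buffer.slice(0, -32);
      this.buffer = this.buffer.slice(-32);
      return safe;
    }
    return ""; // keep buffering
  }

  // Flush whatever remains once the provider stream ends cleanly.
  end(): string {
    const rest = this.buffer;
    this.buffer = "";
    return rest;
  }
}
```

The hold-back window is the design trade-off: larger windows catch longer split patterns but add latency to the visible stream, which is why enforcement on the hot path needs a tight token budget.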
Observability without enforcement is a dashboard nobody reads. Intelligence without security is a liability. BrainstormRouter does both:
- If model A starts failing, don't alert — reroute. (Circuit breakers + Thompson sampling.)
- If a tenant is burning budget, don't report — throttle. (Guardian + cost seatbelts.)
- If a conversation reveals useful context, don't log it — remember it. (Memory extraction queue.)
- If a model outputs PII, don't log it — sever the stream. (Streaming guardrails.)
- If a model violates tenant governance, don't flag it — replace the output. (Governance validator.)
The intelligence systems aren't features you enable. They're the default behavior of the gateway. The security systems aren't optional add-ons. They're the architecture.