BrainstormRouter vs OpenRouter

Commodity pipe vs learning system: price-weighted random routing vs Bayesian optimization.

What OpenRouter does well

OpenRouter is the largest AI model aggregator:

  • 400+ models from dozens of providers
  • Simple pricing: 5% markup on provider costs (or $0 for some models)
  • Unified API: one endpoint, any model
  • Large community: widely adopted, good documentation
  • Prompt caching: transparent caching for supported providers

OpenRouter's strength is accessibility. If you want one API key for every model, OpenRouter makes it trivial.

Where the architectures diverge

OpenRouter is a commodity pipe — it normalizes provider APIs and adds a markup. The routing is price-weighted: cheapest available endpoint wins. BrainstormRouter is a learning system — it uses the data flowing through it to optimize for quality, cost, and reliability simultaneously.

| Dimension | OpenRouter | BrainstormRouter |
|---|---|---|
| Routing algorithm | Price-weighted random / manual | Thompson Sampling (Bayesian learning) |
| Quality signal | None (pass-through) | Validity scoring + quality EWMA per model |
| Cost control | After-the-fact billing | Pre-request cost prediction + seatbelts |
| Memory | None | Persistent pgvector memory (tenant-wide) |
| Agents | None | Built-in Pi Runner with tool use + streaming |
| Governance | None | Guardian middleware with privacy modes |
| Response metadata | Model used, token counts | Cost, efficiency, cache hit, Guardian status |
| Circuit breakers | Provider-level failover | Per-endpoint dual-trigger state machine |
| Multi-tenancy | None (single API key) | Full RLS, AsyncLocalStorage, ltree hierarchy |
| Semantic cache | Prompt caching (provider) | Vector similarity cache (request-level) |
| Self-hosted | No | Open source |

The fundamental gap: feedback loops

OpenRouter processes its 10,000th request exactly the way it processed its first. The 10,001st request gains no advantage from the previous 10,000.

BrainstormRouter accumulates three kinds of intelligence from every request:

1. Routing intelligence (Thompson Sampling)

Every response updates the posterior distribution for the selected model. From src/router/model-bandit.ts:60-71:

```typescript
const bandit = new ModelBandit({
  minValidity: 0.5, // Never route below 50% success rate
  minQuality: 0.3, // Never route below 30% quality
  initialC: 1.5, // Explore aggressively early
  minC: 0.5, // Mild exploration at steady state
  thompsonThreshold: 500, // Switch from UCB1 to Thompson at 500 samples
});
```

After 500 requests, the router knows which model performs best for your specific workload — not generically, not by price, but by measured reward.
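For intuition, here is a minimal sketch of the Beta-posterior sampling that Thompson Sampling performs. It is illustrative only: `ArmStats`, `sampleBeta`, and `pickModel` are hypothetical names, not the actual `ModelBandit` internals, which also blend in UCB1 and the quality EWMA.

```typescript
// Illustrative Thompson Sampling over models: each arm keeps a Beta
// posterior over its success probability, and routing draws one sample
// per arm and picks the winner.
interface ArmStats {
  successes: number;
  failures: number;
}

// Sample Gamma(shape, 1) for integer shapes via a sum of exponentials.
// (A production implementation would use Marsaglia-Tsang for real shapes.)
function sampleGammaInt(shape: number): number {
  let sum = 0;
  for (let i = 0; i < shape; i++) sum -= Math.log(Math.random());
  return sum;
}

// Beta(a, b) as a ratio of two Gamma draws.
function sampleBeta(a: number, b: number): number {
  const x = sampleGammaInt(a);
  const y = sampleGammaInt(b);
  return x / (x + y);
}

// Route to the model whose sampled success probability is highest.
// Uncertain arms occasionally win a draw, so exploration is built in.
function pickModel(arms: Map<string, ArmStats>): string {
  let best = "";
  let bestDraw = -1;
  for (const [model, s] of arms) {
    const draw = sampleBeta(s.successes + 1, s.failures + 1); // Beta(1,1) prior
    if (draw > bestDraw) {
      bestDraw = draw;
      best = model;
    }
  }
  return best;
}

// Every response sharpens the posterior for the selected model.
function recordOutcome(arms: Map<string, ArmStats>, model: string, ok: boolean): void {
  const s = arms.get(model)!;
  if (ok) s.successes++;
  else s.failures++;
}
```

The key property: a model with a narrow, high posterior wins almost every draw, while an undersampled model still gets occasional traffic until its posterior narrows too.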

2. Cost intelligence (Guardian EWMA)

Guardian tracks output/input token ratios per tenant. From src/api/middleware/guardian.ts:59-85:

```typescript
const EWMA_ALPHA = 0.1;
existing.ratio = EWMA_ALPHA * observed + (1 - EWMA_ALPHA) * existing.ratio;
```

After ~20 requests, Guardian can predict your cost before the provider call. This powers the cost seatbelt — reject a $5 request before it happens, not after.
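A minimal sketch of how that EWMA ratio becomes a pre-request prediction. The `EWMA_ALPHA` update mirrors the snippet above; the price constants and the `predictCost` / `checkSeatbelt` names are hypothetical, not Guardian's actual API.

```typescript
// Illustrative EWMA-based cost prediction, assuming per-token pricing.
const EWMA_ALPHA = 0.1;

interface TenantStats {
  ratio: number; // EWMA of outputTokens / inputTokens for this tenant
}

// Update the learned ratio after each completed request.
function observe(stats: TenantStats, inputTokens: number, outputTokens: number): void {
  const observed = outputTokens / inputTokens;
  stats.ratio = EWMA_ALPHA * observed + (1 - EWMA_ALPHA) * stats.ratio;
}

// Predict total cost BEFORE the provider call: input tokens are known,
// output tokens are estimated from the tenant's learned ratio.
function predictCost(
  stats: TenantStats,
  inputTokens: number,
  inputPricePerToken: number,
  outputPricePerToken: number,
): number {
  const predictedOutput = inputTokens * stats.ratio;
  return inputTokens * inputPricePerToken + predictedOutput * outputPricePerToken;
}

// Cost seatbelt: reject before spending, not after.
function checkSeatbelt(predictedCostUsd: number, limitUsd: number): boolean {
  return predictedCostUsd <= limitUsd;
}
```

With α = 0.1, each new observation carries 10% weight, so roughly 20 requests are enough for the ratio to converge on a tenant's typical workload shape.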

3. Knowledge intelligence (Memory extraction)

Useful context from completions is extracted asynchronously and stored in pgvector. From src/db/stores/memory-extraction-store.ts:69-84:

```sql
SELECT id FROM memory_extraction_queue
WHERE status = 'pending' AND next_run_at <= $1
ORDER BY next_run_at ASC, created_at ASC
LIMIT $2
FOR UPDATE SKIP LOCKED
```

OpenRouter doesn't know what you talked about yesterday. BrainstormRouter remembers.
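A worker draining that queue might follow the pattern below. `Executor` and `claimBatch` are hypothetical names, and the transaction wrapper is an assumption about how the store uses the query; the point is the `FOR UPDATE SKIP LOCKED` semantics, which let multiple workers poll concurrently without double-claiming rows.

```typescript
// Abstracts the DB client (e.g. node-postgres) so the claim pattern is
// visible without a live Postgres. Names are illustrative.
interface Executor {
  query(text: string, values: unknown[]): Promise<{ rows: { id: string }[] }>;
}

const CLAIM_SQL = `
  SELECT id FROM memory_extraction_queue
  WHERE status = 'pending' AND next_run_at <= $1
  ORDER BY next_run_at ASC, created_at ASC
  LIMIT $2
  FOR UPDATE SKIP LOCKED`;

// Claim a batch inside a transaction. Rows locked by a concurrent
// worker are skipped, not waited on, so no row is processed twice and
// no worker blocks on another.
async function claimBatch(db: Executor, batchSize: number): Promise<string[]> {
  await db.query("BEGIN", []);
  try {
    const { rows } = await db.query(CLAIM_SQL, [new Date(), batchSize]);
    const ids = rows.map((r) => r.id);
    if (ids.length > 0) {
      await db.query(
        "UPDATE memory_extraction_queue SET status = 'processing' WHERE id = ANY($1)",
        [ids],
      );
    }
    await db.query("COMMIT", []);
    return ids;
  } catch (err) {
    await db.query("ROLLBACK", []);
    throw err;
  }
}
```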

Code proof: response headers

OpenRouter response

```
x-request-id: gen-abc123
```

BrainstormRouter response

```
X-BR-Guardian-Status: on
X-BR-Estimated-Cost: $0.02
X-BR-Actual-Cost: $0.018
X-BR-Efficiency: 0.91
X-BR-Guardian-Overhead-Ms: 0.8
```

Every response tells you what it cost, how efficient it was, and how much overhead the intelligence layer added. No separate dashboard or API call needed.
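Reading that telemetry client-side takes a few lines. This helper is hypothetical: the header names come from the example above, but the parsing choices (stripping the `$` prefix, numeric coercion, the `off` default) are assumptions.

```typescript
// Hypothetical client-side parser for the X-BR-* response headers.
interface RouterTelemetry {
  guardianStatus: string;
  estimatedCost: number;
  actualCost: number;
  efficiency: number;
  guardianOverheadMs: number;
}

function parseTelemetry(headers: Record<string, string>): RouterTelemetry {
  // Cost headers carry a "$" prefix in the example above; strip it.
  const num = (name: string) => Number((headers[name] ?? "0").replace("$", ""));
  return {
    guardianStatus: headers["X-BR-Guardian-Status"] ?? "off",
    estimatedCost: num("X-BR-Estimated-Cost"),
    actualCost: num("X-BR-Actual-Cost"),
    efficiency: num("X-BR-Efficiency"),
    guardianOverheadMs: num("X-BR-Guardian-Overhead-Ms"),
  };
}
```

This makes per-request cost attribution a side effect of normal API usage: log `actualCost` alongside your own request IDs and you have a billing breakdown without touching a dashboard.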

The governance gap

OpenRouter is a shared multi-tenant pipe. Every user's traffic flows through the same infrastructure with the same (lack of) security controls:

| Security capability | OpenRouter | BrainstormRouter |
|---|---|---|
| PII scanning | None | Built-in regex (email, phone, SSN, CC, IP) + pluggable backends |
| Outbound stream scanning | None | Token-by-token StreamingGuardrailEvaluator |
| Stream severing | None | `truncated = true` kills the stream before PII reaches the client |
| Tenant isolation | None | PostgreSQL RLS with SET LOCAL transaction scoping |
| Egress allowlist | None | Per-service domain restrictions with wildcard support |
| Governance enforcement | None | Deterministic keyword matching on streaming chunks |
| SIEM export | None | CEF + ECS JSON structured security events |
| PII air gap | None | scrubAndTokenize() / rehydrate() with in-memory token map |
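The PII air gap deserves a closer look. Here is a minimal sketch of the idea, assuming a single email regex and a `<PII_n>` token format; the real `scrubAndTokenize()` / `rehydrate()` cover more PII classes (phone, SSN, CC, IP), and this simplified version is not their actual implementation.

```typescript
// Illustrative PII air gap: scrub before the provider call, rehydrate
// after. The original values live only in an in-memory map, so they
// never leave the process or reach the model provider.
const EMAIL_RE = /[\w.+-]+@[\w-]+\.[\w.]+/g;

function scrubAndTokenize(text: string): {
  scrubbed: string;
  tokens: Map<string, string>;
} {
  const tokens = new Map<string, string>();
  let i = 0;
  const scrubbed = text.replace(EMAIL_RE, (match) => {
    const token = `<PII_${i++}>`;
    tokens.set(token, match); // PII stays in-memory only
    return token;
  });
  return { scrubbed, tokens };
}

// Swap tokens in the model's response back for the original values.
function rehydrate(text: string, tokens: Map<string, string>): string {
  let out = text;
  for (const [token, original] of tokens) out = out.split(token).join(original);
  return out;
}
```

The provider only ever sees placeholder tokens; the client still gets a response containing the real values.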

OpenRouter's architecture cannot take on these features without a redesign. It's a shared pipe: there's no tenant context to scope policies against, no streaming interception layer to attach guardrails to, and no structured event pipeline to export violations from.

For developer prototyping, this doesn't matter. For enterprise production workloads with compliance requirements, it's disqualifying.

When to choose OpenRouter

  • You need access to 400+ models from one endpoint
  • Your routing needs are simple (cheapest or specific model)
  • You don't need memory, governance, or quality scoring
  • You want the largest community and ecosystem
  • You don't need multi-tenancy

When to choose BrainstormRouter

  • You want routing that improves with usage
  • You need per-response cost and efficiency data
  • You need persistent memory across requests
  • You need multi-tenant isolation for your customers
  • You want pre-request cost prediction
  • You need circuit breakers with exponential backoff
  • You prefer open source with self-hosting