Routing

Thompson sampling, model variants, presets, and fallback chains.

How routing works

Every /v1/chat/completions request passes through BrainstormRouter's model router, which selects the best endpoint based on your routing strategy.

Request → Route Resolution → Strategy Selection → Endpoint → Provider
                ↓
        @preset/slug → Preset config
        provider/model → Direct routing
        provider/model:variant → Variant suffix

Strategies

Set a strategy per-request or save it in a preset:

StrategyOptimizes forWhen to use
priceLowest costBatch jobs, embeddings
latencyFastest responseReal-time chat, autocomplete
throughputHighest tokens/secLong-form generation
priorityPreferred model firstProduction with specific model preference
qualityBest reward scoreHigh-stakes tasks
cascadeTry models in orderBudget-constrained with fallbacks
curl https://api.brainstormrouter.com/v1/chat/completions \
  -H "Authorization: Bearer br_live_..." \
  -H "X-Model-Router-Strategy: price" \
  -d '{"model": "anthropic/claude-sonnet-4", "messages": [...]}'

Thompson sampling

BrainstormRouter uses Thompson sampling (a multi-armed bandit algorithm) to learn which model+provider endpoints perform best (src/router/model-bandit.ts). The algorithm uses UCB1 for cold-start exploration (< 500 samples) and Gaussian Thompson Sampling for steady-state exploitation (≥ 500 samples). The reward signal is a composite of:

  • Validity — Did the response complete successfully? (hard floor: ≥ 0.5)
  • Quality — Tool-call accuracy (EWMA, hard floor: ≥ 0.3)
  • Cost — Relative to the cheapest option
  • Latency — Relative to the fastest option

Over time, the router automatically shifts traffic toward endpoints that deliver the best results for your workload. No manual weight configuration needed.

See Thompson Sampling deep dive for the full algorithm with code.

Model variants

Append a variant suffix to any model for quick strategy overrides:

VariantEffect
:floorCheapest available endpoint for this model
:fastLowest latency endpoint
:bestHighest quality score endpoint
:smartThompson sampling auto-select
await client.chat.completions.create({
  model: "anthropic/claude-sonnet-4:floor",
  messages: [...],
});

Presets

Save a full routing configuration as a reusable slug:

await client.presets.create({
  slug: "production",
  name: "Production Config",
  model: "anthropic/claude-sonnet-4",
  strategy: "quality",
  fallbacks: ["openai/gpt-4o", "google/gemini-2.0-flash"],
  max_cost_usd: 0.50,
});

// Reference in any completion
await client.chat.completions.create({
  model: "@preset/production",
  messages: [...],
});

See Presets API for full CRUD operations.

Fallback chains

When the primary model fails (rate limit, timeout, error), the router automatically tries fallback models in order:

{
  "model": "anthropic/claude-sonnet-4",
  "route": {
    "fallbacks": ["openai/gpt-4o", "google/gemini-2.0-flash"],
    "strategy": "priority"
  }
}

Circuit breakers (src/router/circuit-breaker.ts) prevent repeated calls to failing endpoints. When an endpoint trips its circuit breaker (5 consecutive failures or 50% error rate over 60s), it's removed from rotation until it recovers via half-open probing with exponential backoff.

See Circuit Breakers for the state machine.

Routing replays

Record routing decisions and simulate alternatives:

# What would this week have cost with round-robin vs Thompson sampling?
curl https://api.brainstormrouter.com/v1/replays/simulate \
  -H "Authorization: Bearer br_live_..." \
  -d '{"strategy": "price", "since": "2025-02-20T00:00:00Z"}'

See Replays API for more.