Routing

Thompson sampling, model variants, presets, and fallback chains.

How routing works

Every /v1/chat/completions request passes through BrainstormRouter's model router, which selects the best endpoint based on your routing strategy.

Request → Route Resolution → Strategy Selection → Endpoint → Provider
                ↓
        @preset/slug → Preset config
        provider/model → Direct routing
        provider/model:variant → Variant suffix
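
The three model-string forms in the diagram can be told apart with a small parser. The following is a minimal sketch, not BrainstormRouter's actual resolver: `ResolvedRoute` and `resolveRoute` are hypothetical names, and treating everything after the last colon as the variant is an assumption.

```typescript
// Hypothetical types and names, for illustration only; not part of the BrainstormRouter SDK.
type ResolvedRoute =
  | { kind: "preset"; slug: string }
  | { kind: "direct"; provider: string; model: string }
  | { kind: "variant"; provider: string; model: string; variant: string };

function resolveRoute(input: string): ResolvedRoute {
  // "@preset/slug" refers to a saved preset.
  const presetPrefix = "@preset/";
  if (input.startsWith(presetPrefix)) {
    const slug = input.slice(presetPrefix.length);
    if (slug.length === 0) throw new Error("empty preset slug");
    return { kind: "preset", slug };
  }

  // Everything else is "provider/model", optionally followed by ":variant".
  const slash = input.indexOf("/");
  if (slash <= 0 || slash === input.length - 1) {
    throw new Error(`expected provider/model, got "${input}"`);
  }
  const provider = input.slice(0, slash);
  const rest = input.slice(slash + 1);

  // Assumption: the variant is the text after the last colon.
  const colon = rest.lastIndexOf(":");
  if (colon === -1) {
    return { kind: "direct", provider, model: rest };
  }
  const model = rest.slice(0, colon);
  const variant = rest.slice(colon + 1);
  if (model.length === 0 || variant.length === 0) {
    throw new Error(`malformed variant suffix in "${input}"`);
  }
  return { kind: "variant", provider, model, variant };
}
```

For example, `resolveRoute("anthropic/claude-sonnet-4:floor")` yields a `variant` route with provider `anthropic`, model `claude-sonnet-4`, and variant `floor`.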

Strategies

Set a strategy per-request or save it in a preset:

Strategy	Optimizes for	When to use
`price`	Lowest cost	Batch jobs, embeddings
`latency`	Fastest response	Real-time chat, autocomplete
`throughput`	Highest tokens/sec	Long-form generation
`priority`	Preferred model first	Production with specific model preference
`quality`	Best reward score	High-stakes tasks
`cascade`	Try models in order	Budget-constrained with fallbacks

curl https://api.brainstormrouter.com/v1/chat/completions \
  -H "Authorization: Bearer br_live_..." \
  -H "Content-Type: application/json" \
  -H "X-Model-Router-Strategy: price" \
  -d '{"model": "anthropic/claude-sonnet-4", "messages": [...]}'
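
To make the table concrete, here is a rough sketch of how the metric-based strategies could rank candidate endpoints. It is an illustration under assumptions, not the router's implementation: the `EndpointStats` fields, `scoreBy`, and `pickEndpoint` are hypothetical names. `priority` and `cascade` are left out because they depend on model order rather than on a metric.

```typescript
// Hypothetical per-endpoint statistics; field names are illustrative assumptions.
interface EndpointStats {
  id: string;
  costPerMTokUsd: number; // lower is better
  p50LatencyMs: number;   // lower is better
  tokensPerSec: number;   // higher is better
  rewardScore: number;    // higher is better
}

type MetricStrategy = "price" | "latency" | "throughput" | "quality";

// Each strategy maps to a score where a larger value means "more preferred".
const scoreBy: Record<MetricStrategy, (e: EndpointStats) => number> = {
  price: (e) => -e.costPerMTokUsd,
  latency: (e) => -e.p50LatencyMs,
  throughput: (e) => e.tokensPerSec,
  quality: (e) => e.rewardScore,
};

function pickEndpoint(strategy: MetricStrategy, candidates: EndpointStats[]): EndpointStats {
  if (candidates.length === 0) throw new Error("no candidate endpoints");
  const score = scoreBy[strategy];
  // Keep the candidate with the highest score; ties keep the earlier one.
  return candidates.reduce((best, e) => (score(e) > score(best) ? e : best));
}
```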

Thompson sampling

BrainstormRouter uses a multi-armed bandit to learn which model+provider endpoints perform best (src/router/model-bandit.ts). The bandit uses UCB1 for cold-start exploration (fewer than 500 samples) and switches to Gaussian Thompson sampling for steady-state exploitation (500 samples or more). The reward signal is a composite of:

Validity — Did the response complete successfully? (hard floor: ≥ 0.5)
Quality — Tool-call accuracy (EWMA, hard floor: ≥ 0.3)
Cost — Relative to the cheapest option
Latency — Relative to the fastest option
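
One way to combine these four signals is sketched below. The weights, the `Observation` shape, and the function names are hypothetical, and the sketch assumes that missing either hard floor zeroes the reward; the source does not spell out how the floors are applied.

```typescript
// Hypothetical shape of one observed request outcome.
interface Observation {
  validity: number;  // 0..1, did the response complete successfully
  quality: number;   // 0..1, EWMA of tool-call accuracy
  costUsd: number;
  latencyMs: number;
}

// Hypothetical weights; the real values are not given in this document.
const WEIGHTS = { validity: 0.4, quality: 0.3, cost: 0.15, latency: 0.15 };
const VALIDITY_FLOOR = 0.5;
const QUALITY_FLOOR = 0.3;

// Exponentially weighted moving average, as used for the quality signal.
function ewma(previous: number, sample: number, alpha = 0.1): number {
  return alpha * sample + (1 - alpha) * previous;
}

// 1.0 when this endpoint matches the best option, smaller as it gets worse.
function relativeScore(best: number, actual: number): number {
  if (actual <= 0) return 1;
  return Math.min(1, best / actual);
}

function compositeReward(
  obs: Observation,
  cheapestCostUsd: number,
  fastestLatencyMs: number,
): number {
  // Assumption: failing either hard floor zeroes the reward.
  if (obs.validity < VALIDITY_FLOOR || obs.quality < QUALITY_FLOOR) return 0;
  return (
    WEIGHTS.validity * obs.validity +
    WEIGHTS.quality * obs.quality +
    WEIGHTS.cost * relativeScore(cheapestCostUsd, obs.costUsd) +
    WEIGHTS.latency * relativeScore(fastestLatencyMs, obs.latencyMs)
  );
}
```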

Over time, the router automatically shifts traffic toward endpoints that deliver the best results for your workload. No manual weight configuration is needed.
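
The two-phase selection can be sketched as follows. This is a simplified illustration, not the code in src/router/model-bandit.ts: the `Arm` shape and function names are hypothetical, and the sketch assumes the 500-sample threshold applies to the total number of samples across all arms.

```typescript
// Hypothetical per-endpoint bandit state.
interface Arm {
  id: string;
  n: number;    // number of observed rewards
  mean: number; // running mean reward
  m2: number;   // sum of squared deviations from the mean (Welford's method)
}

const COLD_START_SAMPLES = 500;

// Fold one observed reward into the arm's running mean and variance.
function updateArm(arm: Arm, reward: number): void {
  arm.n += 1;
  const delta = reward - arm.mean;
  arm.mean += delta / arm.n;
  arm.m2 += delta * (reward - arm.mean);
}

// Standard normal draw via Box-Muller; rng is injectable for deterministic tests.
function gaussian(rng: () => number): number {
  const u1 = 1 - rng(); // in (0, 1], avoids log(0)
  const u2 = rng();
  return Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
}

function selectArm(arms: Arm[], rng: () => number = Math.random): Arm {
  if (arms.length === 0) throw new Error("no arms");
  // Try every arm at least once before scoring.
  const untried = arms.find((a) => a.n === 0);
  if (untried) return untried;

  // Assumption: the cold-start threshold counts samples across all arms.
  const total = arms.reduce((sum, a) => sum + a.n, 0);
  const score = (a: Arm): number => {
    if (total < COLD_START_SAMPLES) {
      // UCB1: mean plus an exploration bonus that shrinks as the arm is sampled.
      return a.mean + Math.sqrt((2 * Math.log(total)) / a.n);
    }
    // Gaussian Thompson sampling: draw from N(mean, variance / n).
    const variance = a.n > 1 ? a.m2 / (a.n - 1) : 1;
    return a.mean + Math.sqrt(variance / a.n) * gaussian(rng);
  };

  // Score each arm exactly once, since Thompson scores are random draws.
  const scored = arms.map((arm) => ({ arm, s: score(arm) }));
  return scored.reduce((best, cur) => (cur.s > best.s ? cur : best)).arm;
}
```

During cold start the UCB1 bonus favors rarely sampled endpoints. Once enough samples accumulate, the Thompson draws concentrate around each endpoint's mean reward, so traffic shifts toward the better performers while weaker endpoints still receive occasional samples.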

See Thompson Sampling deep dive for the full algorithm with code.

Model variants

Append a variant suffix to any model for quick strategy overrides:

Variant	Effect
`:floor`	Cheapest available endpoint for this model
`:fast`	Lowest latency endpoint
`:best`	Highest quality score endpoint
`:smart`	Thompson sampling auto-select

await client.chat.completions.create({
  model: "anthropic/claude-sonnet-4:floor",
  messages: [...],
});

Presets

Save a full routing configuration as a reusable slug:

await client.presets.create({
  slug: "production",
  name: "Production Config",
  model: "anthropic/claude-sonnet-4",
  strategy: "quality",
  fallbacks: ["openai/gpt-4o", "google/gemini-2.0-flash"],
  max_cost_usd: 0.50,
});

// Reference in any completion
await client.chat.completions.create({
  model: "@preset/production",
  messages: [...],
});

See Presets API for full CRUD operations.

Fallback chains

When the primary model fails (rate limit, timeout, error), the router automatically tries fallback models in order:

{
  "model": "anthropic/claude-sonnet-4",
  "route": {
    "fallbacks": ["openai/gpt-4o", "google/gemini-2.0-flash"],
    "strategy": "priority"
  }
}
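
The router applies this chain server-side. The ordering semantics amount to the loop sketched below, where `withFallbacks` and the injected `callModel` function are hypothetical stand-ins rather than SDK functions.

```typescript
// Hypothetical: any function that sends a request to one model and resolves with its result.
type CallModel<T> = (model: string) => Promise<T>;

async function withFallbacks<T>(
  primary: string,
  fallbacks: string[],
  callModel: CallModel<T>,
): Promise<{ model: string; result: T }> {
  const errors: string[] = [];
  // Try the primary first, then each fallback in the order given.
  for (const model of [primary, ...fallbacks]) {
    try {
      return { model, result: await callModel(model) };
    } catch (err) {
      // Record the failure and move on to the next model in the chain.
      errors.push(`${model}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`all models failed: ${errors.join("; ")}`);
}
```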

Circuit breakers (src/router/circuit-breaker.ts) prevent repeated calls to failing endpoints. An endpoint trips its circuit breaker after 5 consecutive failures or a 50% error rate over 60 seconds. A tripped endpoint is removed from rotation until it recovers through half-open probing with exponential backoff.
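
The trip and recovery behavior can be sketched as a small state machine. This is an illustration, not the code in src/router/circuit-breaker.ts: the class shape, the 1-second base backoff, and the minimum of 10 samples before the error-rate rule applies are all assumptions.

```typescript
type BreakerState = "closed" | "open" | "half-open";

class CircuitBreaker {
  private state: BreakerState = "closed";
  private consecutiveFailures = 0;
  private events: { at: number; ok: boolean }[] = [];
  private retryAt = 0;
  private backoffMs: number;
  private readonly now: () => number;
  private readonly baseBackoffMs: number;
  private readonly minSamples: number;

  // baseBackoffMs and minSamples are hypothetical defaults; now() is injectable for tests.
  constructor(now: () => number = Date.now, baseBackoffMs = 1000, minSamples = 10) {
    this.now = now;
    this.baseBackoffMs = baseBackoffMs;
    this.minSamples = minSamples;
    this.backoffMs = baseBackoffMs;
  }

  getState(): BreakerState {
    return this.state;
  }

  // Whether a request may be sent to this endpoint right now.
  allow(): boolean {
    if (this.state === "open" && this.now() >= this.retryAt) {
      this.state = "half-open"; // let a single probe through
      return true;
    }
    return this.state === "closed";
  }

  // Report the outcome of a request that was allowed through.
  record(ok: boolean): void {
    const t = this.now();
    // Keep only the last 60 seconds of outcomes for the error-rate rule.
    this.events = this.events.filter((e) => t - e.at < 60_000);
    this.events.push({ at: t, ok });

    if (ok) {
      this.consecutiveFailures = 0;
      if (this.state === "half-open") {
        // Probe succeeded: close the breaker and reset the backoff.
        this.state = "closed";
        this.backoffMs = this.baseBackoffMs;
        this.events = [];
      }
      return;
    }

    this.consecutiveFailures += 1;
    if (this.state === "half-open") {
      // Probe failed: reopen and wait twice as long before the next probe.
      this.backoffMs *= 2;
      this.trip(t);
      return;
    }
    const failures = this.events.filter((e) => !e.ok).length;
    const errorRate = failures / this.events.length;
    const rateTripped = this.events.length >= this.minSamples && errorRate >= 0.5;
    if (this.consecutiveFailures >= 5 || rateTripped) this.trip(t);
  }

  private trip(t: number): void {
    this.state = "open";
    this.retryAt = t + this.backoffMs;
  }
}
```

A minimum sample count is included so that a single early failure (a 100% error rate over one request) does not open the breaker on its own.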

See Circuit Breakers for the state machine.

Routing replays

Record routing decisions and simulate alternatives:

# What would traffic since 2025-02-20 have cost under the price strategy?
curl https://api.brainstormrouter.com/v1/replays/simulate \
  -H "Authorization: Bearer br_live_..." \
  -H "Content-Type: application/json" \
  -d '{"strategy": "price", "since": "2025-02-20T00:00:00Z"}'

See Replays API for more.