# Auto Mode

How BrainstormRouter intelligently selects models for each request.

## Overview

Set `"model": "auto"` and BrainstormRouter automatically selects the best model for each request based on complexity, cost, and learned quality scores.
```bash
curl https://api.brainstormrouter.com/v1/chat/completions \
  -H "Authorization: Bearer br_live_..." \
  -H "Content-Type: application/json" \
  -d '{"model": "auto", "messages": [{"role": "user", "content": "What is 2+2?"}]}'
```
## How It Works

Auto mode combines several intelligence systems:

### 1. Complexity Assessment

The auto-selector analyzes each request to estimate complexity:
- Token count — longer prompts suggest more complex tasks
- Tool presence — requests with tools need models that support function calling
- System prompt complexity — detailed instructions suggest nuanced tasks
- Message history depth — multi-turn conversations benefit from stronger models
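BrainstormRouter does not publish its scoring function, but the signals above can be combined into a single score. The following is a minimal illustrative sketch — the weights, the characters-per-token ratio, and the function name are all assumptions, not the service's actual values:

```python
def estimate_complexity(messages, tools=None, system=None):
    """Hypothetical complexity score in [0, 1] combining the four signals.
    All weights and thresholds here are illustrative assumptions."""
    # Token count: rough estimate at ~4 characters per token
    tokens = sum(len(m["content"]) for m in messages) / 4
    score = min(tokens / 2000, 1.0) * 0.4                # longer prompts -> harder
    score += 0.2 if tools else 0.0                       # tools need function calling
    if system:
        score += min(len(system) / 4000, 1.0) * 0.2      # detailed system prompts
    score += min(len(messages) / 10, 1.0) * 0.2          # deep multi-turn history
    return round(score, 3)

simple = estimate_complexity([{"role": "user", "content": "What is 2+2?"}])
hard = estimate_complexity(
    [{"role": "user", "content": "x" * 8000}] * 6,
    tools=[{"name": "search"}],
    system="Detailed instructions. " * 100,
)
```

A short factual question scores near zero, while a long multi-turn request with tools and a detailed system prompt scores near the top of the range — which is what pushes the router toward cheaper or stronger models, respectively.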
### 2. Thompson Sampling (Bandit)

BrainstormRouter learns from every request. The Thompson sampling bandit tracks per-model success rates and explores new models intelligently:
- Models that produce high-quality responses get selected more often
- Models that fail or produce low-quality outputs are deprioritized
- New models get exploration budget to gather initial data
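The internal bandit is not public, but the behavior described above matches classic Beta-Bernoulli Thompson sampling. A self-contained sketch (class name and prior are assumptions):

```python
import random

class ThompsonSelector:
    """Beta-Bernoulli Thompson sampling over candidate models (illustrative
    sketch, not BrainstormRouter's actual implementation)."""

    def __init__(self, models):
        # Beta(1, 1) prior = uniform: a new model's wide posterior gives it
        # a natural exploration budget until real data accumulates
        self.stats = {m: {"successes": 1, "failures": 1} for m in models}

    def select(self):
        # Draw a plausible success rate from each model's posterior,
        # then route to the model with the highest draw
        draws = {m: random.betavariate(s["successes"], s["failures"])
                 for m, s in self.stats.items()}
        return max(draws, key=draws.get)

    def record(self, model, ok):
        key = "successes" if ok else "failures"
        self.stats[model][key] += 1
```

Over many requests, models with high observed quality win most draws (selected more often), models that fail are deprioritized, and rarely-tried models still get occasional traffic because their posteriors remain wide.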
### 3. Cost-Quality Frontier

The cost optimizer finds the Pareto-optimal tradeoff between price and quality. For simple prompts, it picks cheap models. For complex prompts, it picks capable ones.
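A Pareto frontier keeps only the models no competitor beats on both axes at once. This sketch shows the idea with hypothetical prices and quality scores (the model data and function names are illustrative, not real pricing):

```python
def pareto_frontier(models):
    """Keep models that are not dominated on (cost, quality): a model is
    dominated if some other model is no more expensive and at least as good."""
    frontier = []
    for m in models:
        dominated = any(
            o is not m and o["cost"] <= m["cost"] and o["quality"] >= m["quality"]
            for o in models
        )
        if not dominated:
            frontier.append(m["name"])
    return frontier

def pick_for_complexity(models, min_quality):
    """Cheapest frontier model meeting a quality bar derived from complexity."""
    names = pareto_frontier(models)
    eligible = [m for m in models if m["name"] in names and m["quality"] >= min_quality]
    return min(eligible, key=lambda m: m["cost"])["name"]

candidates = [  # hypothetical cost/quality figures
    {"name": "haiku",  "cost": 1.0,  "quality": 0.70},
    {"name": "sonnet", "cost": 3.0,  "quality": 0.85},
    {"name": "opus",   "cost": 15.0, "quality": 0.95},
    {"name": "legacy", "cost": 5.0,  "quality": 0.80},  # dominated by sonnet
]
```

A low quality bar (simple prompt) resolves to the cheapest frontier model; a high bar (complex prompt) forces the top-tier one — the "cheap for simple, capable for complex" behavior described above.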
### 4. Circuit Breaker Awareness

Models with open circuit breakers (recently failing) are excluded from auto selection. This prevents routing to unhealthy providers.
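The exclusion works like a standard circuit breaker: after enough consecutive failures the breaker "opens" and the model is filtered out of the candidate set until a cooldown elapses. A minimal sketch — the threshold, cooldown, and class name are assumptions:

```python
import time

class CircuitBreaker:
    """Minimal per-model open/closed breaker (illustrative sketch;
    the real thresholds and cooldown are not documented)."""

    def __init__(self, failure_threshold=5, cooldown_s=30.0):
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = time.monotonic()  # open the breaker

    def record_success(self):
        self.failures = 0
        self.opened_at = None                  # close the breaker

    def is_open(self):
        if self.opened_at is None:
            return False
        if time.monotonic() - self.opened_at >= self.cooldown_s:
            # Cooldown elapsed: close and allow a trial request through
            self.opened_at = None
            self.failures = 0
            return False
        return True
```

Auto selection would then consider only models whose breaker is closed, e.g. `[m for m in models if not breakers[m].is_open()]`.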
## What Gets Selected
| Request Type | Typical Selection |
|---|---|
| Simple factual question | Fast, cheap model (Haiku, GPT-4o-mini, Flash) |
| Code generation with tools | Capable model (Sonnet, GPT-4o) |
| Complex reasoning | Top-tier model (Opus, o1) |
| Research with citations | Perplexity Sonar Pro |
## Conversation Consistency

When using `conversation_id`, auto mode keeps the same model for the entire conversation. This prevents jarring style changes mid-conversation.

The model is locked on the first request in a conversation and reused for subsequent requests with the same `conversation_id`.
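Conceptually, the lock is first-writer-wins keyed by `conversation_id`. A sketch of that behavior (the in-memory dict stands in for whatever shared storage the service actually uses):

```python
# Hypothetical sketch of the first-request model lock keyed by conversation_id.
conversation_models = {}  # in production this would be shared state, not a local dict

def resolve_model(conversation_id, auto_pick):
    """Return the conversation's locked model, locking it on the first request."""
    if conversation_id is None:
        return auto_pick()  # no conversation: select fresh for each request
    if conversation_id not in conversation_models:
        conversation_models[conversation_id] = auto_pick()  # lock on first request
    return conversation_models[conversation_id]
```

Subsequent requests with the same `conversation_id` skip selection entirely and reuse the locked model, even if the auto-selector would now prefer a different one.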
## Tenant Aliases

You can define aliases to control what `"auto"` and other shorthand names resolve to for your tenant:
```bash
curl -X PUT https://api.brainstormrouter.com/v1/aliases \
  -H "Authorization: Bearer br_live_..." \
  -H "Content-Type: application/json" \
  -d '{"fast": "anthropic/claude-haiku-4-5", "smart": "anthropic/claude-sonnet-4-5"}'
```
Then use `"model": "fast"` or `"model": "smart"` in requests.
## Opting Out

To bypass auto mode, specify a full model ID:

```json
{ "model": "anthropic/claude-sonnet-4-5-20250514" }
```

Or send the `X-BR-Skip-Memory: true` header to prevent auto mode from using memory context for model selection.
## Response Headers

Auto mode adds an `X-BR-Route-Reason` header explaining the selection:

```
X-BR-Route-Reason: auto:complexity-low
X-BR-Model: claude-haiku-4-5-20251001
X-BR-Provider: anthropic
```
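Client code can pull these headers off the response for logging or auditing which model actually served each request. A small sketch (the helper name is ours; the header values below are the example ones from above):

```python
def routing_info(headers):
    """Extract BrainstormRouter routing headers from a response header mapping
    (e.g. requests.Response.headers). Returns None for any header not present."""
    return {
        "reason": headers.get("X-BR-Route-Reason"),
        "model": headers.get("X-BR-Model"),
        "provider": headers.get("X-BR-Provider"),
    }

info = routing_info({
    "X-BR-Route-Reason": "auto:complexity-low",
    "X-BR-Model": "claude-haiku-4-5-20251001",
    "X-BR-Provider": "anthropic",
})
```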