Predictive Routing Intelligence — why THIS model for THIS request
2026-03-28
LOCKSTEP TRACEABILITY MATRIX
---
api_endpoints: ["POST /v1/intelligence/advise"]
sdk_methods_updated: ["client.intelligence.advise()", "client.intelligence.advise()"]
mcp_tools_updated: ["br_routing_advise"]
---
What We Built
The Routing Advisor synthesizes ALL 7 intelligence signals — pattern fingerprints, Thompson sampling arms, cost-quality Pareto frontier, circuit breaker state, agent reputation, model rankings, and content classification — into a single recommendation with a human-readable reasoning chain. Agents call POST /v1/intelligence/advise before making a completion request to understand WHY a model is recommended, what evidence supports it, what alternatives exist, and what the system learned from similar past requests.
Every /v1/chat/completions response now includes an X-BR-Routing-Reasoning header with compact JSON summarizing the routing decision context.
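A minimal sketch of reading that header on the client side. The JSON schema of X-BR-Routing-Reasoning is not specified in this log, so the parsed value is kept as `unknown`; the `HeaderReader` interface, the `parseRoutingReasoning` helper, and the sample payload are all hypothetical names used only for illustration.

```typescript
// Anything with a `get(name)` lookup works here, including the `headers`
// object on a standard fetch Response.
interface HeaderReader {
  get(name: string): string | null;
}

// Hypothetical helper: returns the parsed header payload, or null when the
// header is missing or is not valid JSON.
function parseRoutingReasoning(headers: HeaderReader): unknown {
  const raw = headers.get("X-BR-Routing-Reasoning");
  if (raw === null) return null; // header absent
  try {
    return JSON.parse(raw);
  } catch {
    return null; // malformed JSON: treat as missing rather than throwing
  }
}

// Stand-in for real response headers so the sketch runs without a network call.
function fakeHeaders(value: string | null): HeaderReader {
  return {
    get: (name) => (name.toLowerCase() === "x-br-routing-reasoning" ? value : null),
  };
}

// The payload fields below are invented for the demo.
const summary = parseRoutingReasoning(fakeHeaders('{"model":"example/model"}'));
console.log(summary);
```

Returning null for malformed JSON keeps a diagnostic header from breaking the completion path; a stricter client could log the failure instead.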
Why It Matters
No other AI gateway explains its routing decisions. Portkey routes by rules, OpenRouter routes by price, but neither tells you "312 similar requests in the last 7 days; claude-sonnet-4 had a 94% success rate on this pattern; Thompson sampling agrees, with a UCB1 score of 0.87; and it's Pareto-optimal at $0.08/request." This turns BrainstormRouter from a black-box proxy into a transparent intelligence layer that agents can reason about and learn from.
How It Works
The advisor is read-only and deterministic (no LLM calls). It follows a 12-step pipeline:
- Fingerprints the request (tool count, message count, token estimate, task type, system prompt, images)
- Queries pattern store for similar past requests and per-model success rates
- Queries Thompson arms for per-model reward distributions
- Queries cost-quality frontier for Pareto-optimal tradeoffs
- Checks circuit breaker to filter unhealthy models
- Queries agent reputation to adjust service level
- Queries rankings for global model quality context
- Scores candidates with a weighted composite (pattern 40%, Thompson 30%, rankings 20%, frontier 10%)
- Builds reasoning chain in natural language
- Computes confidence via sigmoid from sample counts
- Identifies alternatives with cost/quality/latency tradeoffs
- Generates learning signal based on quality trends
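The circuit-breaker filter and the weighted composite score from the steps above can be sketched as follows. The weights come from the pipeline description; everything else is an assumption for illustration: the `CandidateSignals` shape, the idea that each signal is already normalized to [0, 1], and the model names and values.

```typescript
// Weights as listed in the pipeline: pattern 40%, Thompson 30%, rankings 20%, frontier 10%.
const WEIGHTS = { pattern: 0.4, thompson: 0.3, rankings: 0.2, frontier: 0.1 } as const;

// Hypothetical per-model signal bundle; each score is assumed to lie in [0, 1].
interface CandidateSignals {
  model: string;
  pattern: number;
  thompson: number;
  rankings: number;
  frontier: number;
  circuitOpen: boolean; // true when the circuit breaker marks the model unhealthy
}

// Weighted sum of the four scored signals.
function compositeScore(c: CandidateSignals): number {
  return (
    WEIGHTS.pattern * c.pattern +
    WEIGHTS.thompson * c.thompson +
    WEIGHTS.rankings * c.rankings +
    WEIGHTS.frontier * c.frontier
  );
}

// Drop unhealthy models, score the rest, and sort best-first.
function rankCandidates(candidates: CandidateSignals[]): { model: string; score: number }[] {
  return candidates
    .filter((c) => !c.circuitOpen)
    .map((c) => ({ model: c.model, score: compositeScore(c) }))
    .sort((a, b) => b.score - a.score);
}

// Made-up inputs: model-c has the best raw signals but is filtered out.
const ranked = rankCandidates([
  { model: "model-a", pattern: 0.94, thompson: 0.8, rankings: 0.7, frontier: 1.0, circuitOpen: false },
  { model: "model-b", pattern: 0.6, thompson: 0.9, rankings: 0.9, frontier: 0.5, circuitOpen: false },
  { model: "model-c", pattern: 0.99, thompson: 0.99, rankings: 0.99, frontier: 1.0, circuitOpen: true },
]);
console.log(ranked);
```

Filtering before scoring means an unhealthy model can never win on the strength of its historical signals, which matches the order of the steps in the list.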
Example call from the TypeScript SDK:

```typescript
const client = new BrainstormRouter({ apiKey: "br_live_..." });

// Ask the advisor for a recommendation before sending the completion request.
const advice = await client.intelligence.advise({
  messages: [{ role: "user", content: "Review this architecture..." }],
  tools: [/* tool definitions elided */],
  task_type: "code",
});

// advice.reasoning: ["Request fingerprint: code task with 8 tools...", ...]
// advice.evidence: { pattern_matches: 312, success_rate: 0.94, ... }
// advice.alternatives: [{ model: "openai/gpt-4.1", tradeoff: "25% cheaper, 8% lower quality" }]
```
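The pipeline step that computes confidence via a sigmoid over sample counts could look roughly like this. The log does not give the actual curve parameters, so `midpoint` and `steepness` are hypothetical defaults chosen only to show the shape: few samples give low confidence, and confidence saturates toward 1 as evidence accumulates.

```typescript
// Logistic curve over the number of matching past requests.
// `midpoint` is the sample count that yields 0.5 confidence; `steepness`
// controls how quickly confidence rises around it. Both are assumed values.
function confidenceFromSamples(samples: number, midpoint = 50, steepness = 0.1): number {
  return 1 / (1 + Math.exp(-steepness * (samples - midpoint)));
}

// Cold start, the midpoint, and a well-sampled pattern.
for (const n of [0, 50, 312]) {
  console.log(n, confidenceFromSamples(n).toFixed(3));
}
```

A sigmoid keeps the output bounded in (0, 1) and avoids overstating confidence on cold-start patterns, at the cost of two tuning parameters.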
The Numbers
- 7 intelligence signals synthesized per request
- 12-step reasoning pipeline, all in-memory
- Target latency: <50ms (no external calls)
- 12 test cases covering rich data, cold start, circuit breaking, overrides, confidence calibration, trends, cost ranking, alternatives, empty registry, and auto-classification
Competitive Edge
This is the feature that makes BrainstormRouter fundamentally different from Portkey, OpenRouter, and every other AI gateway. Those services route by static rules or price tables. BrainstormRouter learns from every request, builds per-pattern per-model success profiles, and explains its reasoning in a chain that agents can inspect, challenge, and learn from. It's the difference between a load balancer and an intelligence engine.
Lockstep Checklist
> _You MUST check these boxes [x] and verify the corresponding files are updated BEFORE committing this log._
- [x] API Routes: src/api/routes/intelligence-advise.ts (POST /v1/intelligence/advise)
- [x] TS SDK: packages/sdk-ts/src/resources/intelligence.ts (advise() method + types)
- [x] Python SDK: packages/sdk-py/src/brainstormrouter/resources/intelligence.py (sync + async advise())
- [x] MCP Schemas: src/mcp/handlers/intelligence.ts (br_routing_advise tool)
- [x] Master Record: Updated via ship log entry