Predictive Routing Intelligence — why THIS model for THIS request

2026-03-28

LOCKSTEP TRACEABILITY MATRIX
---
api_endpoints: ["POST /v1/intelligence/advise"]
sdk_methods_updated: ["client.intelligence.advise()", "client.intelligence.advise()"]
mcp_tools_updated: ["br_routing_advise"]
---

What We Built

The Routing Advisor synthesizes ALL 7 intelligence signals — pattern fingerprints, Thompson sampling arms, cost-quality Pareto frontier, circuit breaker state, agent reputation, model rankings, and content classification — into a single recommendation with a human-readable reasoning chain. Agents call POST /v1/intelligence/advise before making a completion request to understand WHY a model is recommended, what evidence supports it, what alternatives exist, and what the system learned from similar past requests.

Every /v1/chat/completions response now includes an X-BR-Routing-Reasoning header with compact JSON summarizing the routing decision context.
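
A client might read that header along these lines. This is a minimal sketch, not the SDK's implementation: the log only says the header carries compact JSON, so the payload fields shown (`model`, `confidence`) are hypothetical, and `HeaderReader` and `parseRoutingReasoning` are names invented here for illustration. The `HeaderReader` interface is shaped so that the `headers` object of a standard fetch `Response` should satisfy it.

```typescript
// Minimal structural type for anything with a header lookup.
interface HeaderReader {
  get(name: string): string | null;
}

// Hypothetical helper: extracts and parses the X-BR-Routing-Reasoning header.
// Returns null when the header is absent or does not contain a JSON object.
function parseRoutingReasoning(headers: HeaderReader): Record<string, unknown> | null {
  const raw = headers.get("X-BR-Routing-Reasoning");
  if (raw === null) return null;
  try {
    const parsed: unknown = JSON.parse(raw);
    // Accept only a JSON object; arrays and primitives are treated as malformed.
    if (typeof parsed === "object" && parsed !== null && !Array.isArray(parsed)) {
      return parsed as Record<string, unknown>;
    }
    return null;
  } catch {
    return null;
  }
}

// Stand-in for response.headers with a made-up payload (field names are assumptions).
const sampleHeaders: HeaderReader = {
  get: (name) =>
    name.toLowerCase() === "x-br-routing-reasoning"
      ? JSON.stringify({ model: "claude-sonnet-4", confidence: 0.87 })
      : null,
};

const routingReasoning = parseRoutingReasoning(sampleHeaders);
console.log(routingReasoning);
```

Returning null for a missing or malformed header lets callers treat the reasoning as optional metadata, so a completion response never fails because of it.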

Why It Matters

No other AI gateway explains its routing decisions. Portkey routes by rules, OpenRouter routes by price, but neither tells you "312 similar requests in the last 7 days, claude-sonnet-4 had 94% success rate on this pattern, Thompson sampling agrees with UCB1 score 0.87, and it's Pareto-optimal at $0.08/request." This turns BrainstormRouter from a black-box proxy into a transparent intelligence layer that agents can reason about and learn from.

How It Works

The advisor is read-only and deterministic (no LLM calls). It follows a 12-step pipeline:

1. Fingerprints the request (tool count, message count, token estimate, task type, system prompt, images)
2. Queries pattern store for similar past requests and per-model success rates
3. Queries Thompson arms for per-model reward distributions
4. Queries cost-quality frontier for Pareto-optimal tradeoffs
5. Checks circuit breaker to filter unhealthy models
6. Queries agent reputation to adjust service level
7. Queries rankings for global model quality context
8. Scores candidates with a weighted composite (pattern 40%, Thompson 30%, rankings 20%, frontier 10%)
9. Builds reasoning chain in natural language
10. Computes confidence via sigmoid from sample counts
11. Identifies alternatives with cost/quality/latency tradeoffs
12. Generates learning signal based on quality trends
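
The scoring and confidence steps above can be sketched as follows. This is an illustration under stated assumptions, not the shipped code. The 40/30/20/10 weights come from the pipeline description. Everything else is assumed: that each signal is normalized to [0, 1], that the Thompson signal uses the arm's mean rather than a random draw (which would keep the advisor deterministic), and the sigmoid's `midpoint` and `steepness` values. All names are hypothetical.

```typescript
// Per-signal scores for one candidate model, each assumed normalized to [0, 1].
interface SignalScores {
  pattern: number;  // success rate on similar past requests
  thompson: number; // mean of the Thompson arm's reward distribution (assumption)
  rankings: number; // global model quality
  frontier: number; // closeness to the cost-quality Pareto frontier (assumed encoding)
}

// Weights from the pipeline description: pattern 40%, Thompson 30%, rankings 20%, frontier 10%.
const WEIGHTS: SignalScores = { pattern: 0.4, thompson: 0.3, rankings: 0.2, frontier: 0.1 };

// Weighted sum of the four signals.
function compositeScore(s: SignalScores): number {
  return (
    WEIGHTS.pattern * s.pattern +
    WEIGHTS.thompson * s.thompson +
    WEIGHTS.rankings * s.rankings +
    WEIGHTS.frontier * s.frontier
  );
}

// Logistic sigmoid over the sample count: few samples give low confidence,
// many samples approach 1. midpoint and steepness are hypothetical values.
function confidenceFromSamples(samples: number, midpoint = 50, steepness = 0.05): number {
  return 1 / (1 + Math.exp(-steepness * (samples - midpoint)));
}

// Ranks candidates by composite score, highest first.
function rankCandidates(
  candidates: Record<string, SignalScores>,
): Array<{ model: string; score: number }> {
  return Object.entries(candidates)
    .map(([model, s]) => ({ model, score: compositeScore(s) }))
    .sort((a, b) => b.score - a.score);
}

// Made-up inputs for two candidate models.
const ranked = rankCandidates({
  "claude-sonnet-4": { pattern: 0.94, thompson: 0.87, rankings: 0.9, frontier: 1.0 },
  "openai/gpt-4.1": { pattern: 0.86, thompson: 0.8, rankings: 0.85, frontier: 0.7 },
});
console.log(ranked[0].model, confidenceFromSamples(312));
```

A fixed weighted sum keeps the recommendation reproducible for identical inputs, and each term maps directly onto one sentence of the reasoning chain.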

const client = new BrainstormRouter({ apiKey: "br_live_..." });

const advice = await client.intelligence.advise({
  messages: [{ role: "user", content: "Review this architecture..." }],
  tools: [...], // tool definitions elided
  task_type: "code",
});
// advice.reasoning: ["Request fingerprint: code task with 8 tools...", ...]
// advice.evidence: { pattern_matches: 312, success_rate: 0.94, ... }
// advice.alternatives: [{ model: "openai/gpt-4.1", tradeoff: "25% cheaper, 8% lower quality" }]
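
A caller could turn those fields into a one-line summary for logs, as sketched below. The `AdviceSketch` type is inferred only from the three commented fields above, so the real response type may contain more fields or differ in detail, and `summarizeAdvice` is a hypothetical helper rather than part of the SDK.

```typescript
// Shape inferred from the commented example fields; an assumption, not the SDK's type.
interface AdviceSketch {
  reasoning: string[];
  evidence: { pattern_matches: number; success_rate: number };
  alternatives: Array<{ model: string; tradeoff: string }>;
}

// Hypothetical helper: condenses the evidence and alternatives into one log line.
function summarizeAdvice(advice: AdviceSketch): string {
  const pct = Math.round(advice.evidence.success_rate * 100);
  const alts = advice.alternatives.map((a) => `${a.model} (${a.tradeoff})`).join("; ");
  const base = `${advice.evidence.pattern_matches} similar requests, ${pct}% success`;
  return alts ? `${base}; alternatives: ${alts}` : base;
}

// Sample values copied from the commented example above.
const sampleAdvice: AdviceSketch = {
  reasoning: ["Request fingerprint: code task with 8 tools..."],
  evidence: { pattern_matches: 312, success_rate: 0.94 },
  alternatives: [{ model: "openai/gpt-4.1", tradeoff: "25% cheaper, 8% lower quality" }],
};

const adviceSummary = summarizeAdvice(sampleAdvice);
console.log(adviceSummary);
```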

The Numbers

7 intelligence signals synthesized per request
12-step reasoning pipeline, all in-memory
Target latency: <50ms (no external calls)
12 test cases covering rich data, cold start, circuit breaking, overrides, confidence calibration, trends, cost ranking, alternatives, empty registry, and auto-classification

Competitive Edge

This is the feature that makes BrainstormRouter fundamentally different from Portkey, OpenRouter, and every other AI gateway. Those services route by static rules or price tables. BrainstormRouter learns from every request, builds per-pattern per-model success profiles, and explains its reasoning in a chain that agents can inspect, challenge, and learn from. It's the difference between a load balancer and an intelligence engine.

Lockstep Checklist

[x] API Routes: src/api/routes/intelligence-advise.ts — POST /v1/intelligence/advise
[x] TS SDK: packages/sdk-ts/src/resources/intelligence.ts — advise() method + types
[x] Python SDK: packages/sdk-py/src/brainstormrouter/resources/intelligence.py — sync + async advise()
[x] MCP Schemas: src/mcp/handlers/intelligence.ts — br_routing_advise tool
[x] Master Record: Updated via ship log entry