Model Recommend: data-driven model selection advisory for agents

2026-03-25

model-bandit, cost-optimizer, performance-tracker, auto-selector

What We Built

POST /v1/recommend — an advisory endpoint that returns ranked model recommendations based on real production data. Given task metadata (type, complexity, tools needed, context size, budget constraints), BrainstormRouter returns the top 5 models with scores, reasons, costs, and capabilities. The agent retains full control — this is advisory, not routing.

Recommendations are powered by 4 intelligence systems: Thompson sampling scores from the model bandit, per-model quality metrics from the performance tracker, cost-quality frontier analysis, and circuit breaker health data. Models with open circuit breakers are automatically excluded.
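As a rough illustration of the bandit input, the sketch below draws Thompson samples from a Beta posterior over per-model success rates and skips any model whose circuit breaker is open. The ModelStats fields, the Beta(1, 1) prior, and the model names are assumptions made for this example; the log does not show the model bandit's actual internals.

```python
import random
from dataclasses import dataclass


@dataclass
class ModelStats:
    """Hypothetical per-model counters; the real tracker's fields are not shown here."""
    name: str
    successes: int
    failures: int
    breaker_open: bool = False


def thompson_scores(stats, rng=None):
    """Draw one Beta-posterior sample per healthy model (Beta(1, 1) prior assumed)."""
    rng = rng or random.Random()
    scores = {}
    for m in stats:
        if m.breaker_open:
            continue  # models with open circuit breakers are excluded
        scores[m.name] = rng.betavariate(m.successes + 1, m.failures + 1)
    return scores


# Illustrative data only; these are not real production numbers.
stats = [
    ModelStats("model-a", successes=460, failures=40),
    ModelStats("model-b", successes=30, failures=20),
    ModelStats("model-c", successes=90, failures=10, breaker_open=True),
]
scores = thompson_scores(stats, rng=random.Random(0))
```

Sampling rather than using the mean success rate keeps some exploration alive: a model with few observations has a wide posterior and will occasionally draw a high score.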

Why It Matters

Agents currently pick models by hardcoding preferences or guessing. BrainstormRouter has real production data on which models work best for which tasks — this endpoint exposes that intelligence as a simple API call. The agent asks "what should I use for code generation with 50K tokens?" and gets a data-driven answer.

How It Works

Request:
{
  "task_type": "code-generation",
  "tools_count": 15,
  "estimated_context_tokens": 50000,
  "capabilities_needed": ["tool-calling", "reasoning"],
  "budget_max_usd": 0.1
}

Response:
{
  "recommendations": [
    {
      "model": "anthropic/claude-sonnet-4-5",
      "score": 0.94,
      "reason": "92% success rate across 500 requests",
      "estimated_cost_per_1k": 3.0,
      "capabilities": ["tools", "streaming", "reasoning", "vision"],
      "avg_latency_ms": 2400
    }
  ],
  "data_points": 1100,
  "recommendation_basis": "production_metrics",
  "tenant_id": "..."
}
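Because the endpoint is advisory, the agent still decides what to do with the ranked list. A minimal sketch of consuming the response above, assuming the agent simply takes the highest-scoring entry (best_model is a hypothetical helper, not part of the SDK):

```python
import json

# The sample response from this log, verbatim.
SAMPLE_RESPONSE = """
{
  "recommendations": [
    {
      "model": "anthropic/claude-sonnet-4-5",
      "score": 0.94,
      "reason": "92% success rate across 500 requests",
      "estimated_cost_per_1k": 3.0,
      "capabilities": ["tools", "streaming", "reasoning", "vision"],
      "avg_latency_ms": 2400
    }
  ],
  "data_points": 1100,
  "recommendation_basis": "production_metrics",
  "tenant_id": "..."
}
"""


def best_model(response):
    """Return the highest-scoring model name, or None if the list is empty."""
    recs = response.get("recommendations", [])
    if not recs:
        return None
    return max(recs, key=lambda r: r["score"])["model"]


response = json.loads(SAMPLE_RESPONSE)
chosen = best_model(response)
basis = response["recommendation_basis"]
```

An agent could also inspect basis and data_points before trusting the ranking, for example by keeping its own default when little production data backs the recommendation.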

Scoring formula: 0.5 * thompson_score + 0.3 * quality_score + 0.2 * cost_efficiency. With prefer_speed: true, latency gets 40% weight. The endpoint falls back to a static capability matrix when fewer than 100 data points are available.
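The weighting can be sketched as below. The default weights and the 100-data-point threshold come from the text above. How the remaining 60% is split when prefer_speed is set is not stated, so this sketch assumes the default blend is scaled down proportionally. The latency_score input and the "static_capability_matrix" label are also assumptions for illustration.

```python
MIN_DATA_POINTS = 100  # below this, the static capability matrix is used


def composite_score(thompson, quality, cost_efficiency,
                    latency_score=0.0, prefer_speed=False):
    """Blend normalized (0..1) component scores into one ranking score."""
    base = 0.5 * thompson + 0.3 * quality + 0.2 * cost_efficiency
    if not prefer_speed:
        return base
    # Assumption: latency takes 0.4 and the default blend shares the other 0.6.
    return 0.6 * base + 0.4 * latency_score


def recommendation_basis(data_points):
    """Pick the data source; the fallback label here is hypothetical."""
    if data_points < MIN_DATA_POINTS:
        return "static_capability_matrix"
    return "production_metrics"


default_score = composite_score(0.9, 0.8, 0.5)
speed_score = composite_score(0.9, 0.8, 0.5, latency_score=1.0, prefer_speed=True)
```

With these inputs the default blend is 0.45 + 0.24 + 0.10 = 0.79, and the speed-weighted variant under the stated assumption is 0.6 * 0.79 + 0.4 * 1.0 = 0.874.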

The Numbers

4 intelligence systems consulted per request
3 scoring dimensions (reward, quality, cost efficiency)
Top 5 recommendations per query
Capability filtering: 10 capability types (tools, reasoning, vision, etc.)
Budget filtering by estimated cost
Circuit breaker integration: unhealthy models auto-excluded
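The filtering steps listed above could be combined roughly as follows. This is a sketch under assumptions: the Candidate type, its field names, and the order of the filters are hypothetical and not taken from recommend.ts.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    """Hypothetical candidate record used only for this example."""
    model: str
    score: float
    estimated_cost_usd: float
    capabilities: set = field(default_factory=set)
    breaker_open: bool = False


def shortlist(candidates, needed, budget_max_usd=None, limit=5):
    """Drop unhealthy, under-capable, or over-budget models, then rank by score."""
    eligible = [
        c for c in candidates
        if not c.breaker_open
        and needed <= c.capabilities  # every needed capability is present
        and (budget_max_usd is None or c.estimated_cost_usd <= budget_max_usd)
    ]
    return sorted(eligible, key=lambda c: c.score, reverse=True)[:limit]


# Illustrative candidates; names, scores, and costs are made up.
candidates = [
    Candidate("model-a", 0.94, 0.08, {"tools", "reasoning", "vision"}),
    Candidate("model-b", 0.97, 0.25, {"tools", "reasoning"}),                    # over budget
    Candidate("model-c", 0.90, 0.02, {"tools"}),                                 # lacks reasoning
    Candidate("model-d", 0.88, 0.05, {"tools", "reasoning"}, breaker_open=True), # unhealthy
    Candidate("model-e", 0.81, 0.03, {"tools", "reasoning"}),
]
picked = [c.model for c in shortlist(candidates, {"tools", "reasoning"}, budget_max_usd=0.1)]
```

Filtering before ranking means a high-scoring model that fails a hard constraint never displaces an eligible one from the top five.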

Competitive Edge

No other AI gateway provides data-driven model recommendations from production telemetry. Portkey has model selection but no advisory endpoint. OpenRouter has rankings but no personalized per-tenant recommendations. BrainstormRouter uses each tenant's actual usage data to recommend the best model for their specific workload.

Lockstep Checklist

[x] API Routes: src/api/routes/recommend.ts — new POST /v1/recommend
[x] TS SDK: packages/sdk-ts/src/resources/recommend.ts — client.recommend.suggest()
[x] Python SDK: packages/sdk-py/src/brainstormrouter/resources/recommend.py — sync + async
[x] MCP Schemas: br_recommend_model tool registered in server.ts + manifest
[x] Master Record: Intelligence systems documented in ship log