Parallel model comparison: fan the same prompt out to 2-6 models, compare quality, cost, and latency
2026-03-24
What We Built
POST /v1/compare — parallel model comparison endpoint. Sends the same prompt to 2-6 models via Promise.allSettled, returns per-model output, cost, latency, and quality signals from ValidityScorer. A comparison summary picks cheapest, fastest, highest quality, and best value (quality/cost ratio).
The endpoint also powers a single CLI command for real-time model evaluation with actual completions, not synthetic benchmarks: brainstorm probe "task" --compare --models A,B,C,D
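The fan-out and summary described above could look roughly like the sketch below. It is an illustration under stated assumptions, not the actual route code: the ModelResult and Summary shapes, the CallModel callback, and the function names compareModels and summarize are all hypothetical, and the quality field merely stands in for whatever signals ValidityScorer produces. Treating a zero-cost result as infinite value is also an assumption.

```typescript
// Hypothetical shapes; the real endpoint's field names may differ.
interface ModelResult {
  model: string;
  output: string;
  costUsd: number;
  latencyMs: number;
  quality: number; // stand-in for a ValidityScorer signal, assumed higher = better
}

interface ModelFailure {
  model: string;
  error: string;
}

interface Summary {
  cheapest: string;
  fastest: string;
  highestQuality: string;
  bestValue: string; // highest quality/cost ratio
}

// Hypothetical callback that performs one completion for one model.
type CallModel = (model: string, prompt: string) => Promise<ModelResult>;

// Returns the model name that wins under `better`; ties keep the earlier entry.
// Assumes `results` is non-empty (reduce without a seed throws on an empty array).
function pick(
  results: ModelResult[],
  better: (a: ModelResult, b: ModelResult) => boolean,
): string {
  return results.reduce((best, r) => (better(r, best) ? r : best)).model;
}

function summarize(results: ModelResult[]): Summary {
  // Assumption: a free completion counts as infinitely good value.
  const value = (r: ModelResult): number =>
    r.costUsd > 0 ? r.quality / r.costUsd : Infinity;
  return {
    cheapest: pick(results, (a, b) => a.costUsd < b.costUsd),
    fastest: pick(results, (a, b) => a.latencyMs < b.latencyMs),
    highestQuality: pick(results, (a, b) => a.quality > b.quality),
    bestValue: pick(results, (a, b) => value(a) > value(b)),
  };
}

async function compareModels(models: string[], prompt: string, call: CallModel) {
  if (models.length < 2 || models.length > 6) {
    throw new Error("compare expects 2-6 models");
  }
  // allSettled waits for every call and never rejects, so one failing
  // model cannot abort the whole comparison.
  const settled = await Promise.allSettled(models.map((m) => call(m, prompt)));

  const results: ModelResult[] = [];
  const failures: ModelFailure[] = [];
  settled.forEach((s, i) => {
    if (s.status === "fulfilled") {
      results.push(s.value);
    } else {
      failures.push({ model: models[i] ?? "unknown", error: String(s.reason) });
    }
  });

  // Only summarize when at least one model succeeded.
  const summary = results.length > 0 ? summarize(results) : null;
  return { results, failures, summary };
}
```

Because Promise.allSettled preserves input order, each settled entry can be matched back to its model by index, which is how the sketch labels failures.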
Why It Matters
Model selection shouldn't be based on benchmarks alone. This endpoint lets users (and the Brainstorm CLI) run a real task against multiple models and see which one actually performs best for _their specific use case_. Per-model timeouts via Promise.race mean one slow model doesn't block the comparison. Circuit breakers and cost tracking apply automatically through routeCompletion.
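The per-model timeout mentioned above can be sketched with Promise.race as follows. This is a minimal illustration, not the endpoint's implementation: withTimeout and its parameters are hypothetical names, and how the real route integrates with routeCompletion is not shown.

```typescript
// Hypothetical helper: rejects if `work` has not settled within `timeoutMs`.
function withTimeout<T>(model: string, work: Promise<T>, timeoutMs: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;

  // A promise that never resolves and rejects once the deadline passes.
  const deadline = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${model} timed out after ${timeoutMs} ms`)),
      timeoutMs,
    );
  });

  // race settles with whichever promise settles first; the timer is cleared
  // afterwards so a finished call does not leave a pending timeout behind.
  return Promise.race([work, deadline]).finally(() => clearTimeout(timer));
}
```

Wrapping each model call in a helper like this before handing the batch to Promise.allSettled turns a slow model into an ordinary rejected entry while the others return normally. Note that Promise.race only stops waiting; it does not cancel the underlying request, which would need something like an AbortController.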
Lockstep Checklist
- [x] API Routes: src/api/routes/compare.ts (POST /v1/compare)
- [x] TS SDK: packages/sdk-ts/src/resources/compare.ts (compare.run())
- [x] Python SDK: packages/sdk-py/src/brainstormrouter/resources/compare.py (compare.run())
- [x] MCP Schemas: br_compare_models tool added to manifest and server, dep wired