Multi-Model Consensus: agreement-scored answer selection across 2-5 models
2026-03-26
LOCKSTEP TRACEABILITY MATRIX
---
api_endpoints: ["POST /v1/consensus"]
sdk_methods_updated: ["client.consensus.evaluate()"]
mcp_tools_updated: ["br_consensus"]
---
What We Built
A multi-model consensus engine that evaluates agreement across 2-5 pre-collected model responses and selects the best answer using configurable strategies. Agents collect responses from different models (via compare, parallel calls, or any other method), then submit them to POST /v1/consensus, which scores their agreement and selects one.
Three selection strategies leverage BrainstormRouter's intelligence systems: majority picks the response most others agree with (Jaccard similarity on word sets), quality selects the highest quality_score from the validity scorer, and weighted combines Thompson sampling weights with quality scores and agreement bonuses.
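As a rough illustration of how the three strategies differ, here is a minimal sketch. It is not BrainstormRouter's implementation. The function names, the `routingWeights` map (standing in for Thompson sampling weights), the `agreementBonus` value, and the weighted formula itself are assumptions made for the example. Only the strategy names and the `model`, `content`, and `quality_score` fields come from this log.

```typescript
// Illustrative shape; mirrors the response fields used in this log.
interface Candidate {
  model: string;
  content: string;
  quality_score: number;
}

type Strategy = "majority" | "quality" | "weighted";

// similarity[i][j] is a precomputed pairwise similarity in [0, 1].
// Returns, for each candidate, how many *other* candidates agree with it.
function agreementCounts(similarity: number[][], minAgreement: number): number[] {
  return similarity.map((row, i) =>
    row.filter((s, j) => j !== i && s >= minAgreement).length
  );
}

// Index of the highest score; on a tie the earliest candidate wins.
function argmax(scores: number[]): number {
  let best = 0;
  for (let i = 1; i < scores.length; i++) {
    if (scores[i] > scores[best]) best = i;
  }
  return best;
}

function selectCandidate(
  candidates: Candidate[],
  similarity: number[][],
  strategy: Strategy,
  minAgreement: number,
  routingWeights: Record<string, number> = {}, // hypothetical per-model weights
  agreementBonus = 0.1 // hypothetical bonus per agreeing peer
): Candidate {
  const counts = agreementCounts(similarity, minAgreement);
  let scores: number[];
  if (strategy === "majority") {
    scores = counts;
  } else if (strategy === "quality") {
    scores = candidates.map((c) => c.quality_score);
  } else {
    // Assumed combination: weight * quality + bonus * agreeing peers.
    scores = candidates.map(
      (c, i) => (routingWeights[c.model] ?? 1) * c.quality_score + agreementBonus * counts[i]
    );
  }
  return candidates[argmax(scores)];
}
```

In this sketch, a model with a higher routing weight can win under weighted even when another response has a slightly higher quality_score, which is the behavioral difference from the quality strategy.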
Why It Matters
High-stakes AI decisions (medical summaries, legal analysis, financial recommendations) benefit from cross-model verification. Rather than trusting a single model, agents can now ask multiple models and let BrainstormRouter determine which answer has the strongest consensus or highest quality. This turns BrainstormRouter from a router into a judge.
How It Works
```typescript
const result = await client.consensus.evaluate({
  messages: [{ role: "user", content: "What is the capital of France?" }],
  responses: [
    {
      model: "anthropic/claude-sonnet-4",
      content: "Paris is the capital of France.",
      quality_score: 0.95,
    },
    { model: "openai/gpt-4.1", content: "The capital of France is Paris.", quality_score: 0.9 },
    {
      model: "google/gemini-2.5-pro",
      content: "France's capital city is Paris.",
      quality_score: 0.88,
    },
  ],
  consensus_strategy: "weighted",
  min_agreement: 0.7,
});
// result.selected_model, result.agreement_score, result.selection_reason
```
Agreement detection uses Jaccard similarity on word sets (no embedding dependency). The min_agreement threshold controls when two responses are considered to "agree."
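For readers unfamiliar with the measure, Jaccard similarity on word sets is the size of the intersection of two word sets divided by the size of their union. The sketch below shows one way to compute it. The helper names and the tokenization rule (lowercase, split on non-alphanumeric characters) are assumptions for illustration, and the service's actual tokenizer may differ.

```typescript
// Hypothetical tokenizer: lowercase, split on any run of non-alphanumeric characters.
function wordSet(text: string): Set<string> {
  const words = text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((w) => w.length > 0);
  return new Set(words);
}

// Jaccard similarity: |A ∩ B| / |A ∪ B|, a value in [0, 1].
function jaccard(a: string, b: string): number {
  const setA = wordSet(a);
  const setB = wordSet(b);
  // Assumption: two empty texts are treated as identical.
  if (setA.size === 0 && setB.size === 0) return 1;
  let intersection = 0;
  setA.forEach((w) => {
    if (setB.has(w)) intersection++;
  });
  const union = setA.size + setB.size - intersection;
  return intersection / union;
}

// Two responses "agree" when their similarity reaches the threshold.
function agrees(a: string, b: string, minAgreement: number): boolean {
  return jaccard(a, b) >= minAgreement;
}
```

Under this sketch's tokenization, "Paris is the capital of France." and "The capital of France is Paris." share all six words and score 1, while "France's capital city is Paris." scores 0.5 against either of them (4 shared words out of 8 distinct), which falls below a min_agreement of 0.7. Word-set overlap is sensitive to phrasing, so the threshold is worth tuning per use case.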
The Numbers
- 3 strategies: majority, quality, weighted
- 2-5 model responses per evaluation
- Zero external dependencies (no embeddings needed)
- Sub-millisecond evaluation time (pure CPU, no I/O)
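The limits listed above can also be checked on the client before a request is sent. The following is a minimal sketch of such a check. The function name, the input type, and the assumption that min_agreement lies in [0, 1] (which follows from Jaccard similarity being bounded by 0 and 1) are illustrative and not part of the SDK.

```typescript
// Strategy names as listed in this log.
const STRATEGIES: string[] = ["majority", "quality", "weighted"];

// Hypothetical input shape for a pre-flight check.
interface ConsensusInput {
  responses: { model: string; content: string }[];
  consensus_strategy: string;
  min_agreement: number;
}

// Returns a list of human-readable problems; an empty list means the input looks valid.
function validateConsensusInput(input: ConsensusInput): string[] {
  const errors: string[] = [];
  const n = input.responses.length;
  if (n < 2 || n > 5) {
    errors.push(`expected 2-5 responses, got ${n}`);
  }
  if (STRATEGIES.indexOf(input.consensus_strategy) === -1) {
    errors.push(`unknown strategy: ${input.consensus_strategy}`);
  }
  // Assumed range; written so that NaN is also rejected.
  if (!(input.min_agreement >= 0 && input.min_agreement <= 1)) {
    errors.push("min_agreement must be between 0 and 1");
  }
  return errors;
}
```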
Competitive Edge
No other AI gateway offers built-in consensus evaluation. Portkey and OpenRouter route to one model at a time. BrainstormRouter lets agents verify answers across models and pick the most reliable response using real production intelligence data (Thompson weights, quality scores) rather than naive majority voting.
Lockstep Checklist
> _You MUST check these boxes [x] and verify the corresponding files are updated BEFORE committing this log._
- [x] API Routes: src/api/routes/consensus.ts created, mounted in register-routes.ts.
- [x] TS SDK: packages/sdk-ts/src/resources/consensus.ts created, registered in routing domain.
- [x] Python SDK: packages/sdk-py/src/brainstormrouter/resources/consensus.py created, registered.
- [x] MCP Schemas: br_consensus tool added to manifest and routing handler.
- [x] Master Record: Reflected via ship log.