Multi-Model Consensus: agreement-scored answer selection across 2-5 models

2026-03-26

consensus-engine, api-routes, mcp-tools, sdk-ts, sdk-py

LOCKSTEP TRACEABILITY MATRIX
---
api_endpoints: ["POST /v1/consensus"]
sdk_methods_updated: ["client.consensus.evaluate()"]
mcp_tools_updated: ["br_consensus"]
---

What We Built

A multi-model consensus engine that scores agreement across 2-5 pre-collected model responses and selects the best answer using a configurable strategy. Agents collect responses from different models (via compare, parallel calls, or any other method), then submit them to POST /v1/consensus for evaluation.

Three selection strategies draw on BrainstormRouter's intelligence systems:

  • majority: picks the response that the most other responses agree with, where agreement is measured by Jaccard similarity on word sets.
  • quality: selects the response with the highest quality_score from the validity scorer.
  • weighted: combines Thompson sampling weights with quality scores and agreement bonuses.
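As a rough illustration of the two simpler strategies, the sketch below picks a winner by majority and by quality. It is not the engine's actual code: the names CandidateResponse, AgreesFn, selectByMajority, and selectByQuality are hypothetical, the agreement check is injected as a function, and ties go to the earliest response by assumption. The weighted strategy is left out because its combination formula is not specified here.

```typescript
// Hypothetical shapes for illustration; only model, content, and
// quality_score come from the request example in this log.
interface CandidateResponse {
  model: string;
  content: string;
  quality_score: number;
}

// Returns true when two response texts are considered to agree.
type AgreesFn = (a: string, b: string) => boolean;

// "majority": pick the response that the most other responses agree with.
// Assumption: ties are broken in favor of the earliest response.
function selectByMajority(
  responses: CandidateResponse[],
  agrees: AgreesFn,
): CandidateResponse {
  if (responses.length === 0) throw new Error("at least one response is required");
  let best = responses[0];
  let bestVotes = -1;
  for (const candidate of responses) {
    const votes = responses.filter(
      (other) => other !== candidate && agrees(candidate.content, other.content),
    ).length;
    if (votes > bestVotes) {
      best = candidate;
      bestVotes = votes;
    }
  }
  return best;
}

// "quality": pick the response with the highest quality_score.
// Assumption: ties are broken in favor of the earliest response.
function selectByQuality(responses: CandidateResponse[]): CandidateResponse {
  if (responses.length === 0) throw new Error("at least one response is required");
  return responses.reduce((best, r) => (r.quality_score > best.quality_score ? r : best));
}
```

Injecting the agreement check keeps the selection logic independent of how similarity is measured, so the same functions work with any similarity metric and threshold.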

Why It Matters

High-stakes AI decisions (medical summaries, legal analysis, financial recommendations) benefit from cross-model verification. Rather than trusting a single model, agents can now ask multiple models and let BrainstormRouter determine which answer has the strongest consensus or highest quality. This turns BrainstormRouter from a router into a judge.

How It Works

const result = await client.consensus.evaluate({
  messages: [{ role: "user", content: "What is the capital of France?" }],
  responses: [
    {
      model: "anthropic/claude-sonnet-4",
      content: "Paris is the capital of France.",
      quality_score: 0.95,
    },
    { model: "openai/gpt-4.1", content: "The capital of France is Paris.", quality_score: 0.9 },
    {
      model: "google/gemini-2.5-pro",
      content: "France's capital city is Paris.",
      quality_score: 0.88,
    },
  ],
  consensus_strategy: "weighted",
  min_agreement: 0.7,
});
// result.selected_model, result.agreement_score, result.selection_reason

Agreement detection uses Jaccard similarity on word sets (no embedding dependency). The min_agreement threshold controls when two responses are considered to "agree."
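To make the agreement check concrete, here is a minimal sketch of Jaccard similarity on word sets, applied to the three responses from the example above. The tokenization (lowercase, runs of letters and digits), the handling of empty input, and the use of >= at the threshold are assumptions for illustration; the helper names wordSet, jaccard, and agrees are hypothetical.

```typescript
// Assumed tokenization: lowercase, keep runs of ASCII letters/digits,
// drop punctuation. The engine's real tokenizer may differ.
function wordSet(text: string): Set<string> {
  return new Set<string>(text.toLowerCase().match(/[a-z0-9]+/g) ?? []);
}

// Jaccard similarity: |A ∩ B| / |A ∪ B|, a value between 0 and 1.
function jaccard(a: string, b: string): number {
  const setA = wordSet(a);
  const setB = wordSet(b);
  // Assumed convention: two empty texts do not count as agreeing.
  if (setA.size === 0 && setB.size === 0) return 0;
  let shared = 0;
  setA.forEach((word) => {
    if (setB.has(word)) shared++;
  });
  return shared / (setA.size + setB.size - shared);
}

// Assumption: responses agree when similarity is at or above the threshold.
function agrees(a: string, b: string, minAgreement: number): boolean {
  return jaccard(a, b) >= minAgreement;
}

const claude = "Paris is the capital of France.";
const gpt = "The capital of France is Paris.";
const gemini = "France's capital city is Paris.";

console.log(jaccard(claude, gpt)); // 1 (same six words, different order)
console.log(jaccard(claude, gemini)); // 0.5 (4 shared words out of 8 distinct)
console.log(agrees(claude, gemini, 0.7)); // false
```

Under these assumptions, two answers that a reader would call equivalent can still fall below a min_agreement of 0.7 when they are phrased differently, which is worth keeping in mind when choosing the threshold for short responses.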

The Numbers

  • 3 strategies: majority, quality, weighted
  • 2-5 model responses per evaluation
  • Zero external dependencies (no embeddings needed)
  • Sub-millisecond evaluation time (pure CPU, no I/O)

Competitive Edge

No other AI gateway offers built-in consensus evaluation. Portkey and OpenRouter route to one model at a time. BrainstormRouter lets agents verify answers across models and pick the most reliable response using real production intelligence data (Thompson weights, quality scores) rather than naive majority voting.

Lockstep Checklist

> _You MUST check these boxes [x] and verify the corresponding files are updated BEFORE committing this log._

  • [x] API Routes: src/api/routes/consensus.ts created, mounted in register-routes.ts.
  • [x] TS SDK: packages/sdk-ts/src/resources/consensus.ts created, registered in routing domain.
  • [x] Python SDK: packages/sdk-py/src/brainstormrouter/resources/consensus.py created, registered.
  • [x] MCP Schemas: br_consensus tool added to manifest and routing handler.
  • [x] Master Record: Reflected via ship log.