Brain Intelligence Plugin — Live model intelligence injected into every API completion
2026-03-16
What We Built
Wired the plugin hook system into the HTTP completions flow. Previously, BrainstormRouter's 20 typed plugin hooks only fired inside the internal Pi agent runner — external API callers at POST /v1/chat/completions and POST /v1/messages got none of that extensibility. Now three hooks fire on every API completion: before_model_resolve, before_prompt_build, and llm_output.
Built the first plugin that uses this infrastructure: Brain Intelligence. It injects live model performance rankings (7-day Thompson sampling data), budget forecasts, and cost-quality Pareto frontier data into every API completion's system prompt. It also provides a model advisor that recommends the highest-reward model, and a learning extractor that monitors for empty completions.
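The note doesn't say how the cost-quality Pareto frontier is computed. One standard approach is a sort-and-sweep: sort models by cost, then keep each model whose quality beats every cheaper model. The sketch below assumes each model has exactly one cost figure and one quality score (for example a 7-day mean reward). `ModelStats`, its fields, and `paretoFrontier` are hypothetical names, not BrainstormRouter's code.

```typescript
// Hypothetical shape: one cost figure and one quality score per model.
interface ModelStats {
  model: string;
  costPer1kTokens: number; // lower is better
  quality: number; // higher is better, e.g. a 7-day mean reward
}

// Keep every model that no other model beats on both axes (exact duplicates
// are kept once). Sorting by cost, with ties broken by higher quality first,
// means a model is kept exactly when its quality is strictly higher than
// that of every model sorted before it.
function paretoFrontier(stats: ModelStats[]): ModelStats[] {
  const sorted = [...stats].sort(
    (a, b) => a.costPer1kTokens - b.costPer1kTokens || b.quality - a.quality,
  );
  const frontier: ModelStats[] = [];
  let bestQuality = -Infinity;
  for (const s of sorted) {
    if (s.quality > bestQuality) {
      frontier.push(s);
      bestQuality = s.quality;
    }
  }
  return frontier;
}

// Example: "c" is dropped because "b" is both cheaper and higher quality.
const frontierModels = paretoFrontier([
  { model: "a", costPer1kTokens: 0.1, quality: 0.6 },
  { model: "b", costPer1kTokens: 0.5, quality: 0.8 },
  { model: "c", costPer1kTokens: 0.6, quality: 0.7 },
  { model: "d", costPer1kTokens: 2.0, quality: 0.9 },
]).map((s) => s.model); // ["a", "b", "d"]
```

The sweep is O(n log n) because of the sort, which is negligible for the handful of models a gateway typically ranks.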
Added an MCP-to-Messages bridge at POST /v1/bridge/messages that accepts Anthropic Messages API format, auto-discovers MCP tools for the tenant, injects them, and runs a tool-execution loop until the LLM produces a final response.
Why It Matters
API callers now get intelligence-augmented completions by default. Every request sees which models perform best, what the budget looks like, and where the cost-quality sweet spots are, all injected automatically by the plugin system. This is what separates BrainstormRouter from pass-through proxies: because the injected rankings come from live performance data, the guidance improves as more requests flow through the router.
The MCP bridge eliminates the need for callers to build their own tool-execution loops. Send one request, get a fully-resolved response with all MCP tools auto-executed — including governance, RBAC, and audit on every tool call.
How It Works
Hook wiring (Phase 0): src/api/hooks/completions-hooks.ts adapts agent-shaped hook events ({ prompt, messages }) for the API path. A synthetic PluginHookAgentContext with messageProvider: "api" lets plugins distinguish API vs internal agent invocations. Precedence: request-interceptor memory injection runs first (Hono middleware), plugin hooks run second (additive).
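The adaptation and precedence rules can be sketched as follows. This is a minimal illustration assuming a `before_prompt_build`-style hook that may return extra system-prompt text; `ChatMessage`, `PromptBuildResult`, `systemContext`, `toAgentEvent`, and `applyPromptBuildHooks` are hypothetical names, and only `PluginHookAgentContext` and `messageProvider: "api"` come from the note.

```typescript
// Hypothetical types and names; not the actual contents of
// src/api/hooks/completions-hooks.ts.
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

interface PluginHookAgentContext {
  messageProvider: "api" | "agent"; // "api" marks HTTP completions
  tenantId: string;
}

// Agent-shaped event that existing hooks already understand.
interface PromptBuildEvent {
  prompt: string;
  messages: ChatMessage[];
}

// Hypothetical result shape: extra system-prompt text contributed by a plugin.
type PromptBuildResult = { systemContext?: string } | void;
type PromptBuildHook = (event: PromptBuildEvent, ctx: PluginHookAgentContext) => PromptBuildResult;

// Adapt an OpenAI-style message list: the "prompt" is the latest user message.
function toAgentEvent(messages: ChatMessage[]): PromptBuildEvent {
  let prompt = "";
  for (const m of messages) if (m.role === "user") prompt = m.content;
  return { prompt, messages };
}

// Runs after middleware such as memory injection has already edited `messages`.
// Plugin output is additive: existing messages are kept unchanged and the new
// context is added as one extra system message at the front.
function applyPromptBuildHooks(
  messages: ChatMessage[],
  hooks: PromptBuildHook[],
  tenantId: string,
): ChatMessage[] {
  const ctx: PluginHookAgentContext = { messageProvider: "api", tenantId };
  const event = toAgentEvent(messages);
  const additions: string[] = [];
  for (const hook of hooks) {
    const result = hook(event, ctx);
    if (result && result.systemContext) additions.push(result.systemContext);
  }
  if (additions.length === 0) return messages;
  return [{ role: "system", content: additions.join("\n\n") }, ...messages];
}
```

Passing a synthetic context rather than a separate API-only hook signature lets the same plugin code serve both paths; a plugin that should only act on API traffic can check `ctx.messageProvider`.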
Intelligence accessor: src/router/intelligence-global.ts exposes a singleton query interface over the performance tracker, budget forecaster, cost frontier, and endpoint registry. Initialized during gateway startup alongside the model router.
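A module-level singleton with an explicit init step is one common way to build such an accessor. The sketch below assumes that shape; `IntelligenceAccessor`, `ModelRanking`, `BudgetForecast`, `initIntelligence`, and `getIntelligence` are hypothetical, since the note doesn't show the real tracker, forecaster, frontier, or registry APIs.

```typescript
// Hypothetical shapes; the real performance tracker, budget forecaster,
// cost frontier, and endpoint registry interfaces are not shown in the note.
interface ModelRanking { model: string; meanReward: number; samples: number }
interface BudgetForecast { spentUsd: number; projectedMonthUsd: number }

interface IntelligenceAccessor {
  topModels(limit: number): ModelRanking[];
  budgetForecast(tenantId: string): BudgetForecast | undefined;
}

// Module-level singleton (these functions would be exported in a real module).
let instance: IntelligenceAccessor | null = null;

// Called once during gateway startup, alongside model-router initialization.
function initIntelligence(accessor: IntelligenceAccessor): void {
  if (instance) throw new Error("intelligence accessor already initialized");
  instance = accessor;
}

// Returns null before startup so plugins can skip injection instead of crashing.
function getIntelligence(): IntelligenceAccessor | null {
  return instance;
}
```

Returning `null` rather than throwing keeps plugins safe during early startup or in tests where the router was never initialized.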
Plugin: extensions/brain-intelligence/index.ts registers three hooks:
- before_prompt_build (priority 50): Injects model rankings, budget, and Pareto frontier data
- before_model_resolve (priority 25): Suggests the highest-reward model when data is sufficient
- llm_output (priority 10): Monitors for empty completions that still consumed tokens
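The three handlers can be sketched as small pure functions plus a priority-ordered registry. Everything here is an assumption for illustration: the handler names, event and return shapes, `MIN_SAMPLES` (the note only says "when data is sufficient"), and the rule that higher priority runs first. The advisor picks the highest mean reward deterministically; it does not take a Thompson sample.

```typescript
// Hypothetical sketch of the three Brain Intelligence handlers. Names, event
// shapes, return values, and MIN_SAMPLES are assumptions, not the plugin's code.
interface Ranking { model: string; meanReward: number; samples: number }

// Assumed cutoff for "data is sufficient"; the note gives no number.
const MIN_SAMPLES = 20;

// before_prompt_build: render the top rankings plus a budget line as one block.
function buildIntelligenceBlock(rankings: Ranking[], budgetLine: string): string {
  const lines = rankings
    .slice()
    .sort((a, b) => b.meanReward - a.meanReward)
    .slice(0, 5)
    .map((r, i) => `${i + 1}. ${r.model} (reward ${r.meanReward.toFixed(2)}, n=${r.samples})`);
  return ["[Model intelligence, last 7 days]", ...lines, budgetLine].join("\n");
}

// before_model_resolve: highest mean reward among models with enough samples,
// or undefined (no suggestion) when no model qualifies.
function adviseModel(rankings: Ranking[]): string | undefined {
  let best: Ranking | undefined;
  for (const r of rankings) {
    if (r.samples < MIN_SAMPLES) continue;
    if (!best || r.meanReward > best.meanReward) best = r;
  }
  return best ? best.model : undefined;
}

// llm_output: "empty" here means no visible text despite consuming output tokens.
function isEmptyCompletion(text: string, outputTokens: number): boolean {
  return text.trim().length === 0 && outputTokens > 0;
}

// Minimal stand-in for the hook registry; assumes higher priority runs first.
type Handler = (event: any) => unknown;
const registry: { name: string; priority: number; handler: Handler }[] = [];

function registerHook(name: string, priority: number, handler: Handler): void {
  registry.push({ name, priority, handler });
}

function hooksFor(name: string): Handler[] {
  return registry
    .filter((h) => h.name === name)
    .sort((a, b) => b.priority - a.priority)
    .map((h) => h.handler);
}

registerHook("before_prompt_build", 50, (e: { rankings: Ranking[]; budget: string }) => ({
  systemContext: buildIntelligenceBlock(e.rankings, e.budget),
}));
registerHook("before_model_resolve", 25, (e: { rankings: Ranking[] }) => ({
  suggestedModel: adviseModel(e.rankings),
}));
registerHook("llm_output", 10, (e: { text: string; outputTokens: number }) => {
  if (isEmptyCompletion(e.text, e.outputTokens)) console.warn("empty completion consumed tokens");
});
```

Capping the block at five models is one way to keep the injected context within a few hundred tokens per request.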
MCP Bridge: src/api/routes/messages-bridge.ts is a proper Hono route (not a plugin HTTP route), so it inherits the full middleware chain. It delegates LLM calls directly to routeCompletion() (same process, no HTTP round-trip). MCP tools are namespaced as serverId__toolName to prevent cross-server collisions. Max 10 tool rounds.
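The bridge's tool loop can be sketched roughly as below. This is a simplified sketch, not the route's code: `complete` stands in for routeCompletion(), `callTool` for the MCP client, and `runBridge`, `parseNamespaced`, and the message shapes are hypothetical (the real Anthropic Messages format uses `tool_use` / `tool_result` content blocks). Throwing once the cap is hit is also an assumption.

```typescript
// Hypothetical sketch of the bridge's tool loop. Message shapes are simplified,
// and `complete` / `callTool` stand in for routeCompletion() and the MCP client.
interface ToolCall { id: string; name: string; input: unknown }
interface LlmTurn { text: string; toolCalls: ToolCall[] }
interface Message { role: "user" | "assistant" | "tool"; content: string; toolCallId?: string }

const MAX_TOOL_ROUNDS = 10;
const SEP = "__";

// Exposed tool name: serverId__toolName, so two servers can both offer "search".
function namespaced(serverId: string, toolName: string): string {
  return `${serverId}${SEP}${toolName}`;
}

// Split at the first "__"; assumes server IDs never contain "__".
function parseNamespaced(name: string): { serverId: string; toolName: string } {
  const i = name.indexOf(SEP);
  if (i <= 0) throw new Error(`not a namespaced tool: ${name}`);
  return { serverId: name.slice(0, i), toolName: name.slice(i + SEP.length) };
}

async function runBridge(
  messages: Message[],
  complete: (history: Message[]) => Promise<LlmTurn>,
  callTool: (serverId: string, toolName: string, input: unknown) => Promise<string>,
): Promise<string> {
  const history = [...messages];
  for (let round = 0; ; round++) {
    const turn = await complete(history);
    if (turn.toolCalls.length === 0) return turn.text; // final answer ends the loop
    // Assumed behavior once the cap is reached; the real bridge may differ.
    if (round === MAX_TOOL_ROUNDS) throw new Error(`exceeded ${MAX_TOOL_ROUNDS} tool rounds`);
    history.push({ role: "assistant", content: turn.text });
    for (const call of turn.toolCalls) {
      const { serverId, toolName } = parseNamespaced(call.name);
      // Governance, RBAC, and audit checks would wrap this call in the real route.
      const output = await callTool(serverId, toolName, call.input);
      history.push({ role: "tool", content: output, toolCallId: call.id });
    }
  }
}
```

With this loop shape, up to 10 rounds of tool execution can occur, followed by one more completion that must return a final answer.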
The Numbers
- 3 hook points wired into completions flow (was 0)
- 32 new tests across 4 test files, all passing
- 7 new files, 3 modified files
- Plugin injects ~200-400 tokens of intelligence context per request
- Bridge supports up to 10 tool execution rounds per request
Competitive Edge
No other AI gateway injects live performance intelligence into completions. Portkey and OpenRouter route traffic but don't tell the LLM which models are performing best. The MCP bridge with auto-execution is unique to BrainstormRouter — other gateways require callers to implement their own tool loops.
Lockstep Checklist
- [x] API Routes: src/api/routes/messages-bridge.ts added, mounted in server.ts.
- [x] TS SDK: added bridge() method in packages/sdk-ts/src/resources/messages.ts.
- [x] Python SDK: added bridge() (sync + async) in packages/sdk-py/src/brainstormrouter/resources/messages.py.
- [ ] MCP Schemas: No MCP tool schema changes needed (bridge uses existing MCP infrastructure).
- [ ] Master Record: TODO, update docs/architecture/master-capability-record.md.