Cost Prediction — Estimate Request Cost Before Execution
2026-03-26
What We Built
POST /v1/estimate accepts a standard OpenAI-compatible chat completion payload and returns the predicted cost per model before execution. No provider API calls are made; the estimate is computed from the model registry's pricing data and the same token heuristics the Guardian middleware uses.
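Because no provider is called, a single estimate reduces to arithmetic over token counts and registry prices. The following is a minimal sketch of that arithmetic, not the shipped implementation. It assumes a rough 4-characters-per-token heuristic, per-million-token pricing, and utilization measured as input plus output tokens over the context window. The actual Guardian heuristics and registry schema are not shown in this entry, so `ModelPricing`, `approx_tokens`, and `estimate_cost` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPricing:
    # Hypothetical registry entry; field names are assumptions.
    name: str
    input_usd_per_mtok: float   # USD per 1M input tokens
    output_usd_per_mtok: float  # USD per 1M output tokens
    context_window: int         # maximum tokens the model accepts


def approx_tokens(text: str) -> int:
    # Assumed heuristic: about 4 characters per token, rounded up.
    return (len(text) + 3) // 4


def estimate_cost(messages, max_output_tokens, pricing):
    # Only plain-string message content is counted in this sketch;
    # multi-part content and tool definitions are ignored.
    input_tokens = sum(
        approx_tokens(m["content"])
        for m in messages
        if isinstance(m.get("content"), str)
    )
    input_cost = input_tokens / 1_000_000 * pricing.input_usd_per_mtok
    output_cost = max_output_tokens / 1_000_000 * pricing.output_usd_per_mtok
    return {
        "model": pricing.name,
        "input_tokens": input_tokens,
        "output_tokens": max_output_tokens,
        "cost_usd": input_cost + output_cost,
        "context_window": pricing.context_window,
        # Assumption: utilization counts input and output tokens together.
        "utilization_pct": 100.0 * (input_tokens + max_output_tokens)
        / pricing.context_window,
    }
```

The function is pure: the same payload and the same registry entry always yield the same estimate, which is what makes a pre-execution check cheap to run.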
If a model is specified, the response contains the estimate for that model plus the three cheapest alternatives. If model is omitted or set to "auto", the response contains the five cheapest models. Each estimate includes input and output token counts, cost in USD, context window, utilization percentage, and a pricing breakdown.
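The selection rule can be sketched as a small function over a list of per-model estimates. This is an illustration under assumptions: `select_estimates` and the `model` / `cost_usd` keys are hypothetical, and how the real endpoint handles an unknown model name is not stated here (this sketch simply returns the alternatives).

```python
from typing import Optional


def select_estimates(estimates, model: Optional[str] = None):
    # Sort once by predicted cost, cheapest first.
    by_cost = sorted(estimates, key=lambda e: e["cost_usd"])

    # No model, or "auto": return the five cheapest models.
    if model is None or model == "auto":
        return by_cost[:5]

    # Explicit model: its own estimate first, then the three
    # cheapest models other than the requested one.
    requested = [e for e in by_cost if e["model"] == model]
    alternatives = [e for e in by_cost if e["model"] != model][:3]
    return requested + alternatives
```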
Why It Matters
Before sending a 50K context request with 15 tools to an expensive model, operators and agents can now ask "what will this cost?" and compare alternatives. This enables budget-aware model selection at the application layer.
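As one illustration of budget-aware selection on the caller's side, the sketch below keeps the preferred model when its estimate fits the budget and otherwise falls back to the cheapest model that does. The function name `choose_model` and the `model` / `cost_usd` keys are assumptions for illustration, not the documented response schema.

```python
def choose_model(estimates, preferred, budget_usd):
    # Index estimates by model name for a direct lookup.
    by_model = {e["model"]: e for e in estimates}

    # Keep the preferred model when it fits the budget.
    pref = by_model.get(preferred)
    if pref is not None and pref["cost_usd"] <= budget_usd:
        return preferred

    # Otherwise fall back to the cheapest model within budget.
    affordable = [e for e in estimates if e["cost_usd"] <= budget_usd]
    if not affordable:
        return None  # Nothing fits; the caller decides what to do.
    return min(affordable, key=lambda e: e["cost_usd"])["model"]
```

Returning `None` rather than raising leaves the policy decision (trim context, drop tools, or abort) to the application.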
Lockstep Checklist
- [x] API Routes: src/api/routes/estimate.ts (new POST endpoint)
- [x] TS SDK: packages/sdk-ts/src/resources/estimate.ts (client.estimate.cost())
- [x] Python SDK: packages/sdk-py/.../estimate.py (sync + async)
- [x] MCP Schemas: br_estimate_cost tool in manifest, server, and agents.json
- [ ] Master Record: Update when record exists