P1 #246 — forced tool_choice excludes reasoning models; reasoning-token cost safety net
2026-05-09
LOCKSTEP TRACEABILITY MATRIX --- api_endpoints: ["none — routing behavior change, no new routes"] sdk_methods_updated: ["none"] mcp_tools_updated: ["none"] ---
What We Built
Two fixes for a Computer Protocol regression where forced tool-call requests (tool_choice="required" or a specific function) were routed to reasoning models that don't reliably produce tool_calls.
Fix 1 — reasoning model exclusion: Added forcedToolChoice flag to AutoSelectRequest. When forcedToolChoice=true AND the request has tools, ModelAutoSelector.select() excludes all endpoints with the "reasoning" capability from the candidate pool before scoring. The flag is set in model-router-select.ts when params.raw.tool_choice is anything other than undefined, null, "none", or "auto".
Fix 2 — reasoning-token cost safety net: computeCost() in model-executor.ts now falls back to billing usage.reasoning at the output token rate when usage.output is absent. This covers the edge case where a reasoning model burns its token budget on internal reasoning before producing visible output (finish_reason=length, completion_tokens reported as 0 or absent).
Why It Matters
Reasoning models (o1, o3-mini, Claude thinking-mode) don't reliably produce tool_calls when tool_choice is forced — they may return empty content with finish_reason=length after exhausting the completion token budget on internal reasoning. Before this PR, the auto-selector had no mechanism to exclude them for forced-tool requests. The cost side-effect: when a reasoning model returns empty content, completion_tokens can be 0 even though significant reasoning tokens were consumed, resulting in cost_usd=0 for non-zero actual usage.
How It Works
tool_choice="required" → auto-selector picks best model by quality/cost
→ selects o3-mini (high quality score, supports tools capability in catalog)
→ o3-mini exhausts reasoning budget, returns tool_calls=[], finish_reason=length
→ cost_usd=0
After:
tool_choice="required" → isForcedToolChoice(raw.tool_choice) = true
→ autoSelector.select({ ..., forcedToolChoice: true, requiredCapabilities: {"tools"} })
→ reasoning endpoints excluded from candidate pool before scoring
→ selects next-best non-reasoning tool-capable model (e.g., gpt-4o, claude-haiku)
→ model returns populated tool_calls
→ if reasoning tokens somehow reach computeCost with output=0, billed at output rate
Lockstep Checklist
- [x] TypeScript SDK: no change (routing behavior, no API surface change)
- [x] Python SDK: no change
- [x] MCP tools: no change
- [x] OpenAPI: no change
- [x] Ship log: this entry
Closes Release Gate
- Item 2 (forced tool-call requests route to a tool-capable model): closed. Non-reasoning tool-capable model is now selected for forced tool_choice requests.