P1 #246 — forced tool_choice excludes reasoning models; reasoning-token cost safety net

2026-05-09

routerauto-selector

LOCKSTEP TRACEABILITY MATRIX --- api_endpoints: ["none — routing behavior change, no new routes"] sdk_methods_updated: ["none"] mcp_tools_updated: ["none"] ---

What We Built

Two fixes for a Computer Protocol regression where forced tool-call requests (tool_choice="required" or a specific function) were routed to reasoning models that don't reliably produce tool_calls.

Fix 1 — reasoning model exclusion: Added forcedToolChoice flag to AutoSelectRequest. When forcedToolChoice=true AND the request has tools, ModelAutoSelector.select() excludes all endpoints with the "reasoning" capability from the candidate pool before scoring. The flag is set in model-router-select.ts when params.raw.tool_choice is anything other than undefined, null, "none", or "auto".

Fix 2 — reasoning-token cost safety net: computeCost() in model-executor.ts now falls back to billing usage.reasoning at the output token rate when usage.output is absent. This covers the edge case where a reasoning model burns its token budget on internal reasoning before producing visible output (finish_reason=length, completion_tokens reported as 0 or absent).

Why It Matters

Reasoning models (o1, o3-mini, Claude thinking-mode) don't reliably produce tool_calls when tool_choice is forced — they may return empty content with finish_reason=length after exhausting the completion token budget on internal reasoning. Before this PR, the auto-selector had no mechanism to exclude them for forced-tool requests. The cost side-effect: when a reasoning model returns empty content, completion_tokens can be 0 even though significant reasoning tokens were consumed, resulting in cost_usd=0 for non-zero actual usage.

How It Works

tool_choice="required" → auto-selector picks best model by quality/cost
  → selects o3-mini (high quality score, supports tools capability in catalog)
  → o3-mini exhausts reasoning budget, returns tool_calls=[], finish_reason=length
  → cost_usd=0

After:
  tool_choice="required" → isForcedToolChoice(raw.tool_choice) = true
  → autoSelector.select({ ..., forcedToolChoice: true, requiredCapabilities: {"tools"} })
  → reasoning endpoints excluded from candidate pool before scoring
  → selects next-best non-reasoning tool-capable model (e.g., gpt-4o, claude-haiku)
  → model returns populated tool_calls
  → if reasoning tokens somehow reach computeCost with output=0, billed at output rate

Lockstep Checklist

[x] TypeScript SDK: no change (routing behavior, no API surface change)
[x] Python SDK: no change
[x] MCP tools: no change
[x] OpenAPI: no change
[x] Ship log: this entry

Closes Release Gate

Item 2 (forced tool-call requests route to a tool-capable model): closed. Non-reasoning tool-capable model is now selected for forced tool_choice requests.