DeepSeek V4 catalog: add v4-pro/v4-flash, fix V3-era pricing drift, schedule legacy alias deprecation

2026-05-09

router, intelligence, catalog

What We Built

DeepSeek released V4 (preview) on April 24, 2026. V4 ships in two variants: v4-pro (1.6T params, 1M context) and v4-flash (284B params, 1M context). The pre-existing deepseek-chat and deepseek-reasoner IDs are now ALIASES for deepseek-v4-flash (non-thinking and thinking modes), and DeepSeek deprecates the alias names at 2026-07-24T15:59:00Z — 76 days from this ship.

This PR closes three problems at once:

  1. New routing targets: deepseek-v4-pro and deepseek-v4-flash are now in PROVIDER_PRICING so model: "auto" can pick them.
  2. Pricing-drift correction: BR was billing deepseek-chat at V3 rates ($0.27 input / $1.1 output) while DeepSeek already charged the V4-flash actual ($0.14 / $0.28). Every deepseek call's cost_usd and savings ledger entry was therefore overstated: the billed input rate was about 1.9x the actual rate and the billed output rate about 3.9x. Pricing on deepseek-chat and deepseek-reasoner is now V4-flash actuals.
  3. Scheduled-deprecation alerts: A new scheduled-deprecations.ts module seeds provider-announced sunsets into the existing DeprecationDetector at boot. The detector was previously REACTIVE (probe-miss-driven); it now also surfaces PROACTIVE alerts from announced dates. /v1/intelligence/deprecations and the Deprecation response headers pick these up automatically, so no new endpoint is needed.

Why It Matters

Without action, on July 24 every BR call routed to deepseek-chat or deepseek-reasoner would have started erroring (no grace period). BR had zero V4 entries, so there was no fallback target. This was a 76-day fuse on ~30% of DeepSeek traffic.

The pricing-drift fix is also material: customers reading cost_usd headers or hitting /v1/intelligence/savings were getting V3 estimates while their DeepSeek bill (when they BYOK) showed V4 actuals. The numbers in BR's savings ledger now match what the upstream provider actually bills.

Pattern lift-out: scheduled-deprecations.ts is general — any provider that announces a sunset (OpenAI deprecating gpt-3.5-turbo, Anthropic deprecating claude-2.x) can be added as a new entry. The detector machinery, response headers, and SDK methods all just work.

How It Works

PROVIDER_PRICING (src/router/provider-catalog-pricing.ts) gains two entries (deepseek-v4-pro, deepseek-v4-flash) and updates two existing entries (deepseek-chat, deepseek-reasoner) to V4-flash actuals.

deepseek: [
  { id: "deepseek-v4-pro",   cost: { input: 0.435, output: 0.87,  cacheRead: 0.003625, cacheWrite: 0 } },
  { id: "deepseek-v4-flash", cost: { input: 0.14,  output: 0.28,  cacheRead: 0.0028,   cacheWrite: 0 } },
  { id: "deepseek-chat",     cost: { input: 0.14,  output: 0.28,  cacheRead: 0.0028,   cacheWrite: 0 } },
  { id: "deepseek-reasoner", cost: { input: 0.14,  output: 0.28,  cacheRead: 0.0028,   cacheWrite: 0 } },
],
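To make the size of the drift concrete, here is a minimal sketch of how a per-call cost could be derived from per-1M-token rates. The TokenRates type and the estimateCostUsd helper are hypothetical names used for illustration, not BR's actual billing code, and cache pricing is ignored for simplicity.

```typescript
// Per-1M-token rates in USD. Only input/output are modeled here; the real
// catalog entries also carry cacheRead/cacheWrite.
interface TokenRates {
  input: number;
  output: number;
}

// Hypothetical helper (not from the PR): estimated cost of one call in USD.
function estimateCostUsd(
  rates: TokenRates,
  inputTokens: number,
  outputTokens: number,
): number {
  const TOKENS_PER_PRICING_UNIT = 1_000_000;
  return (
    (inputTokens * rates.input + outputTokens * rates.output) /
    TOKENS_PER_PRICING_UNIT
  );
}

// Rates quoted in this write-up.
const v3Rates: TokenRates = { input: 0.27, output: 1.1 };
const v4FlashRates: TokenRates = { input: 0.14, output: 0.28 };

// Illustrative call: 10,000 input tokens and 2,000 output tokens.
const before = estimateCostUsd(v3Rates, 10_000, 2_000);
const after = estimateCostUsd(v4FlashRates, 10_000, 2_000);

console.log(`V3-era estimate: $${before.toFixed(6)}`);
console.log(`V4-flash actual: $${after.toFixed(6)}`);
console.log(`overstatement:   ${(before / after).toFixed(2)}x`);
```

The overstatement factor depends on the input/output token mix of each call, because the output rate drifted further than the input rate.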

A new SCHEDULED_DEPRECATIONS array in src/router/scheduled-deprecations.ts declares which models have provider-announced sunsets:

{
  provider: "deepseek",
  modelId: "deepseek-chat",
  sunsetDate: "2026-07-24T15:59:00Z",
  recommendedSuccessor: "deepseek/deepseek-v4-flash",
  reason: "DeepSeek V4 release: the deepseek-chat alias points to V4-flash …",
},

toAlerts(now) converts these to DeprecationAlert objects with alertLevel set by proximity to sunset (≤90 days = critical, ≤180 = warning, otherwise watch). DeprecationDetector.seedScheduledAlerts(scheduled) — the only new method on the detector — pre-populates the alert map at boot, called from model-router-init.ts:753 right before detector.start().
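The conversion and seeding steps described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the module's actual source: AlertSketch, levelFor, toAlertsSketch, and DetectorSketch are hypothetical names, the real DeprecationAlert type may carry more fields, and the handling of sunsets already in the past is a guess.

```typescript
type AlertLevel = "critical" | "warning" | "watch";

// Shape taken from the SCHEDULED_DEPRECATIONS entry shown above.
interface ScheduledDeprecation {
  provider: string;
  modelId: string;
  sunsetDate: string; // ISO 8601 timestamp
  recommendedSuccessor: string;
  reason: string;
}

// Assumed alert shape for this sketch only.
interface AlertSketch extends ScheduledDeprecation {
  alertLevel: AlertLevel;
  daysUntilSunset: number;
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Thresholds from the text: <=90 days critical, <=180 warning, else watch.
// Assumption: a sunset already in the past (negative days) stays critical.
function levelFor(daysUntilSunset: number): AlertLevel {
  if (daysUntilSunset <= 90) return "critical";
  if (daysUntilSunset <= 180) return "warning";
  return "watch";
}

function toAlertsSketch(entries: ScheduledDeprecation[], now: Date): AlertSketch[] {
  return entries.map((entry) => {
    const days = Math.floor(
      (Date.parse(entry.sunsetDate) - now.getTime()) / MS_PER_DAY,
    );
    return { ...entry, alertLevel: levelFor(days), daysUntilSunset: days };
  });
}

// Minimal stand-in for the detector's alert map, keyed by "provider/modelId".
class DetectorSketch {
  private readonly alerts = new Map<string, AlertSketch>();

  // Pre-populates the map; a later entry for the same key overwrites.
  seedScheduledAlerts(scheduled: AlertSketch[]): void {
    for (const alert of scheduled) {
      this.alerts.set(`${alert.provider}/${alert.modelId}`, alert);
    }
  }

  list(): AlertSketch[] {
    return Array.from(this.alerts.values());
  }
}

const scheduledEntries: ScheduledDeprecation[] = [
  {
    provider: "deepseek",
    modelId: "deepseek-chat",
    sunsetDate: "2026-07-24T15:59:00Z",
    recommendedSuccessor: "deepseek/deepseek-v4-flash",
    reason: "DeepSeek V4 release (illustrative text)",
  },
];

const shipDate = new Date("2026-05-09T00:00:00Z");
const detectorSketch = new DetectorSketch();
detectorSketch.seedScheduledAlerts(toAlertsSketch(scheduledEntries, shipDate));
console.log(detectorSketch.list());
```

Passing now in explicitly keeps the level calculation deterministic in tests. With the ship date above, the deepseek-chat entry is 76 days from sunset and therefore lands in the critical bucket.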

Because the detector already feeds /v1/intelligence/deprecations and the Deprecation response headers (see deprecation-headers.test.ts), the new alerts surface immediately:

  • TS SDK: await client.intelligence.deprecations() → returns 2 entries on first call after deploy
  • Python SDK: client.intelligence.deprecations() → same
  • Response headers on completions calls routed to deepseek-chat/-reasoner: Deprecation: deepseek/deepseek-chat sunset 2026-07-24T15:59:00Z, migrate to deepseek/deepseek-v4-flash
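A client that wants to react to the header programmatically could parse it along these lines. This is a sketch only: the format is inferred from the single example value above and may not cover every value BR emits, and parseDeprecationHeader and ParsedDeprecation are hypothetical names, not SDK methods.

```typescript
interface ParsedDeprecation {
  model: string; // e.g. "deepseek/deepseek-chat"
  sunset: Date;
  successor: string; // e.g. "deepseek/deepseek-v4-flash"
}

// Assumed format, inferred from the one example in this write-up:
//   "<provider/model> sunset <ISO timestamp>, migrate to <provider/model>"
const DEPRECATION_PATTERN = /^(\S+) sunset (\S+), migrate to (\S+)$/;

// Returns null when the value does not match the assumed format.
function parseDeprecationHeader(value: string): ParsedDeprecation | null {
  const match = DEPRECATION_PATTERN.exec(value.trim());
  if (match === null) return null;

  const [, model = "", sunsetRaw = "", successor = ""] = match;
  const sunset = new Date(sunsetRaw);
  if (Number.isNaN(sunset.getTime())) return null;

  return { model, sunset, successor };
}

const exampleHeaderValue =
  "deepseek/deepseek-chat sunset 2026-07-24T15:59:00Z, migrate to deepseek/deepseek-v4-flash";
const parsedHeader = parseDeprecationHeader(exampleHeaderValue);
console.log(parsedHeader);
```

Returning null instead of throwing lets callers treat an unrecognized header value as "no actionable signal" rather than as an error.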

The Numbers

Metric | Before | After
deepseek-chat input price (per 1M tok) | $0.27 (V3) | $0.14 (V4-flash actual)
deepseek-chat output price (per 1M tok) | $1.1 (V3) | $0.28 (V4-flash actual)
Models in catalog under deepseek provider | 2 | 4
/v1/intelligence/deprecations length | 0 (no signals yet) | 2 (deepseek-chat, -reasoner)
Days until DeepSeek alias sunset (from ship) | 76 | n/a

Lockstep Notes

API contract change: /v1/models returns 2 new entries, /v1/intelligence/deprecations returns 2 alerts at boot. No new endpoints, no shape changes — both SDKs already expose intelligence.deprecations() and will surface the new alerts on next call without any SDK update. site/public/routes.json is unchanged.

deepseek-ingestor.ts is unaffected — it still emits alive: true patches based on what DeepSeek's /v1/models returns. Pricing remains catalog-driven. The ingestor's existing test (deepseek-ingestor.test.ts) continues to pass because the ingestor doesn't read pricing.

What This Doesn't Fix

  • Cost-quality frontier weighting: deprecated models are NOT yet downweighted by the frontier. Existing pricing changes do create a soft effect (V4-flash actuals make deepseek-chat look as cheap as deepseek-v4-flash, so the frontier won't prefer one over the other purely on cost). Hard weighting is a follow-up.
  • Auto-migration on sunset: autoMigrateAt is left unset on the alerts. When the user wants automatic redirection, set this field and add the redirect logic in routing (separate PR).
  • DeepSeek API key sanity: this PR doesn't probe DeepSeek's actual /v1/models endpoint to confirm v4-pro and v4-flash are visible there yet (they're in preview). The deepseek-ingestor will mark them alive: true once they appear in the API response.

Verification

  • npx vitest run src/router/scheduled-deprecations.test.ts src/router/intelligence/deprecation-detector.test.ts: 19/19 passed.
  • pnpm tsgo: 0 errors.
  • oxfmt --check + oxlint --type-aware on edited files: clean.
  • Live API verification deferred until merge + ECS deploy.