DeepSeek V4 catalog: add v4-pro/v4-flash, fix V3-era pricing drift, schedule legacy alias deprecation
2026-05-09
What We Built
DeepSeek released V4 (preview) on April 24, 2026. V4 ships in two variants: v4-pro (1.6T params, 1M context) and v4-flash (284B params, 1M context). The pre-existing deepseek-chat and deepseek-reasoner IDs are now ALIASES for deepseek-v4-flash (non-thinking and thinking modes, respectively), and DeepSeek retires the alias names at 2026-07-24T15:59:00Z, 76 days from this ship.
This PR closes three problems at once:
- New routing targets: deepseek-v4-pro and deepseek-v4-flash are now in PROVIDER_PRICING so model: "auto" can pick them.
- Pricing-drift correction: BR was billing deepseek-chat at V3 rates ($0.27 input / $1.1 output) while DeepSeek already charged the V4-flash actual ($0.14 / $0.28). Every deepseek call's cost_usd and savings ledger entry was off as a result: for deepseek-chat, the actual V4-flash price is about 48% lower on input and about 75% lower on output than the rate BR applied. Pricing on deepseek-chat and deepseek-reasoner is now V4-flash actuals.
- Scheduled-deprecation alerts: A new scheduled-deprecations.ts module seeds provider-announced sunsets into the existing DeprecationDetector at boot. The detector was previously REACTIVE (probe-miss-driven); it now also surfaces PROACTIVE alerts from announced dates. /v1/intelligence/deprecations and the Deprecation response headers automatically pick these up, so no new endpoint is needed.
Why It Matters
Without action, on July 24 every BR call routed to deepseek-chat or deepseek-reasoner would have started erroring (no grace period). BR had zero V4 entries, so there was no fallback target. This was a 76-day fuse on ~30% of DeepSeek traffic.
The pricing-drift fix is also material: customers reading cost_usd headers or hitting /v1/intelligence/savings were getting V3 estimates while their DeepSeek bill (when they BYOK) showed V4 actuals. The numbers in BR's savings ledger now match what the upstream provider actually bills.
Pattern lift-out: scheduled-deprecations.ts is general — any provider that announces a sunset (OpenAI deprecating gpt-3.5-turbo, Anthropic deprecating claude-2.x) can be added as a new entry. The detector machinery, response headers, and SDK methods all just work.
How It Works
PROVIDER_PRICING (src/router/provider-catalog-pricing.ts) gains two entries (deepseek-v4-pro, deepseek-v4-flash) and updates two existing entries (deepseek-chat, deepseek-reasoner) to V4-flash actuals.
deepseek: [
{ id: "deepseek-v4-pro", cost: { input: 0.435, output: 0.87, cacheRead: 0.003625, cacheWrite: 0 } },
{ id: "deepseek-v4-flash", cost: { input: 0.14, output: 0.28, cacheRead: 0.0028, cacheWrite: 0 } },
{ id: "deepseek-chat", cost: { input: 0.14, output: 0.28, cacheRead: 0.0028, cacheWrite: 0 } },
{ id: "deepseek-reasoner", cost: { input: 0.14, output: 0.28, cacheRead: 0.0028, cacheWrite: 0 } },
],
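To make the effect of these rates concrete, here is a minimal sketch of how a per-call cost could be derived from a catalog entry. It assumes the rates are USD per 1M tokens (as in the table under The Numbers) and that cache-read tokens are counted separately from input tokens. `ModelCost`, `Usage`, and `estimateCostUsd` are hypothetical names for illustration, not BR's actual billing code.

```typescript
// Hypothetical shapes; the cost field names mirror the catalog entries above.
interface ModelCost {
  input: number;
  output: number;
  cacheRead: number;
  cacheWrite: number;
}

interface Usage {
  inputTokens: number;
  outputTokens: number;
  cacheReadTokens?: number; // assumption: counted separately from inputTokens
}

// Assumption: catalog rates are expressed in USD per 1M tokens.
const TOKENS_PER_RATE_UNIT = 1_000_000;

function estimateCostUsd(cost: ModelCost, usage: Usage): number {
  const cacheReadTokens = usage.cacheReadTokens ?? 0;
  const weighted =
    usage.inputTokens * cost.input +
    usage.outputTokens * cost.output +
    cacheReadTokens * cost.cacheRead;
  return weighted / TOKENS_PER_RATE_UNIT;
}

// Rates copied from the catalog entries above.
const v4Flash: ModelCost = { input: 0.14, output: 0.28, cacheRead: 0.0028, cacheWrite: 0 };
const v4Pro: ModelCost = { input: 0.435, output: 0.87, cacheRead: 0.003625, cacheWrite: 0 };

// Example: a call with 1M input tokens and 1M output tokens.
const flashCost = estimateCostUsd(v4Flash, { inputTokens: 1_000_000, outputTokens: 1_000_000 });
const proCost = estimateCostUsd(v4Pro, { inputTokens: 1_000_000, outputTokens: 1_000_000 });
console.log(flashCost.toFixed(3), proCost.toFixed(3));
```

Because deepseek-chat and deepseek-reasoner now carry the same rates as deepseek-v4-flash, the same usage yields the same estimate for all three IDs.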
A new SCHEDULED_DEPRECATIONS array in src/router/scheduled-deprecations.ts declares which models have provider-announced sunsets:
{
provider: "deepseek",
modelId: "deepseek-chat",
sunsetDate: "2026-07-24T15:59:00Z",
recommendedSuccessor: "deepseek/deepseek-v4-flash",
reason: "DeepSeek V4 release: the deepseek-chat alias points to V4-flash …",
},
toAlerts(now) converts these to DeprecationAlert objects with alertLevel set by proximity to sunset (≤90 days = critical, ≤180 = warning, otherwise watch). DeprecationDetector.seedScheduledAlerts(scheduled) — the only new method on the detector — pre-populates the alert map at boot, called from model-router-init.ts:753 right before detector.start().
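The proximity rule can be sketched as follows. This is an illustration under stated assumptions, not the contents of scheduled-deprecations.ts: the type names, the `daysUntilSunset` field, and the choice to round down to whole days are all hypothetical. Only the entry fields and the 90/180-day thresholds come from the description above.

```typescript
type AlertLevel = "critical" | "warning" | "watch";

// Field names follow the SCHEDULED_DEPRECATIONS entry shown earlier.
interface ScheduledDeprecation {
  provider: string;
  modelId: string;
  sunsetDate: string; // ISO 8601 timestamp in UTC
  recommendedSuccessor: string;
  reason: string;
}

// Hypothetical alert shape for this sketch only.
interface DeprecationAlertSketch extends ScheduledDeprecation {
  daysUntilSunset: number;
  alertLevel: AlertLevel;
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Thresholds as described: <=90 days critical, <=180 warning, otherwise watch.
function levelFor(daysUntilSunset: number): AlertLevel {
  if (daysUntilSunset <= 90) return "critical";
  if (daysUntilSunset <= 180) return "warning";
  return "watch";
}

function toAlertsSketch(entries: ScheduledDeprecation[], now: Date): DeprecationAlertSketch[] {
  return entries.map((entry) => {
    const msLeft = new Date(entry.sunsetDate).getTime() - now.getTime();
    // Assumption: count whole days, rounding down.
    const daysUntilSunset = Math.floor(msLeft / MS_PER_DAY);
    return { ...entry, daysUntilSunset, alertLevel: levelFor(daysUntilSunset) };
  });
}

const alerts = toAlertsSketch(
  [
    {
      provider: "deepseek",
      modelId: "deepseek-chat",
      sunsetDate: "2026-07-24T15:59:00Z",
      recommendedSuccessor: "deepseek/deepseek-v4-flash",
      reason: "alias sunset (illustrative text)",
    },
  ],
  new Date("2026-05-09T00:00:00Z"),
);
console.log(alerts[0].daysUntilSunset, alerts[0].alertLevel);
```

With the ship date as `now`, the deepseek-chat entry is 76 days from sunset, which falls inside the 90-day window and is therefore seeded as critical.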
Because the detector already feeds /v1/intelligence/deprecations and the Deprecation response headers (see deprecation-headers.test.ts), the new alerts surface immediately:
- TS SDK: await client.intelligence.deprecations() → returns 2 entries on first call after deploy
- Python SDK: client.intelligence.deprecations() → same
- Response headers on completions calls routed to deepseek-chat/-reasoner:
  Deprecation: deepseek/deepseek-chat sunset 2026-07-24T15:59:00Z, migrate to deepseek/deepseek-v4-flash
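For callers that want to act on the header, here is a small sketch of producing and parsing a value in the shape shown above. The helper names and the regular expression are assumptions made for illustration; the real header is emitted by the existing detector plumbing, and its exact grammar should be checked against deprecation-headers.test.ts.

```typescript
interface SunsetInfo {
  provider: string;
  modelId: string;
  sunsetDate: string; // ISO 8601 timestamp in UTC
  recommendedSuccessor: string; // "provider/model" form
}

// Hypothetical formatter that reproduces the header value shown above.
function formatDeprecationHeader(info: SunsetInfo): string {
  return `${info.provider}/${info.modelId} sunset ${info.sunsetDate}, migrate to ${info.recommendedSuccessor}`;
}

interface ParsedDeprecation {
  model: string;
  sunsetDate: string;
  successor: string;
}

// Hypothetical client-side parser; returns null when the value has another shape.
function parseDeprecationHeader(value: string): ParsedDeprecation | null {
  const match = /^(\S+) sunset (\S+), migrate to (\S+)$/.exec(value.trim());
  if (!match) return null;
  return {
    model: match[1] ?? "",
    sunsetDate: match[2] ?? "",
    successor: match[3] ?? "",
  };
}

const headerValue = formatDeprecationHeader({
  provider: "deepseek",
  modelId: "deepseek-chat",
  sunsetDate: "2026-07-24T15:59:00Z",
  recommendedSuccessor: "deepseek/deepseek-v4-flash",
});
const parsed = parseDeprecationHeader(headerValue);
console.log(headerValue, parsed);
```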
The Numbers
| Metric | Before | After |
|---|---|---|
| deepseek-chat input price (per 1M tok) | $0.27 (V3) | $0.14 (V4-flash actual) |
| deepseek-chat output price (per 1M tok) | $1.1 (V3) | $0.28 (V4-flash actual) |
| Models in catalog under deepseek provider | 2 | 4 |
| /v1/intelligence/deprecations length | 0 (no signals yet) | 2 (deepseek-chat, -reasoner) |
| Days until DeepSeek alias sunset (from ship) | 76 | n/a |
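The size of the pricing drift follows directly from the first two rows of the table. The snippet below is a worked calculation only; `percentLower` is a hypothetical helper, and the figures cover deepseek-chat because those are the only before/after rates listed here.

```typescript
// Relative reduction from the old rate to the new rate, in percent.
function percentLower(oldRate: number, newRate: number): number {
  return ((oldRate - newRate) / oldRate) * 100;
}

// deepseek-chat rates from the table: V3 (before) vs V4-flash actual (after).
const inputDrift = percentLower(0.27, 0.14);
const outputDrift = percentLower(1.1, 0.28);

console.log(`input: ${inputDrift.toFixed(1)}% lower`);
console.log(`output: ${outputDrift.toFixed(1)}% lower`);
```

Put the other way around, the old input rate was roughly 1.9 times the actual rate and the old output rate roughly 3.9 times.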
Lockstep Notes
API contract change: /v1/models returns 2 new entries, /v1/intelligence/deprecations returns 2 alerts at boot. No new endpoints, no shape changes — both SDKs already expose intelligence.deprecations() and will surface the new alerts on next call without any SDK update. site/public/routes.json is unchanged.
deepseek-ingestor.ts is unaffected — it still emits alive: true patches based on what DeepSeek's /v1/models returns. Pricing remains catalog-driven. The ingestor's existing test (deepseek-ingestor.test.ts) continues to pass because the ingestor doesn't read pricing.
What This Doesn't Fix
- Cost-quality frontier weighting: deprecated models are NOT yet downweighted by the frontier. Existing pricing changes do create a soft effect (V4-flash actuals make deepseek-chat look as cheap as deepseek-v4-flash, so the frontier won't prefer one over the other purely on cost). Hard weighting is a follow-up.
- Auto-migration on sunset: autoMigrateAt is left unset on the alerts. When the user wants automatic redirection, set this field and add the redirect logic in routing (separate PR).
- DeepSeek API key sanity: this PR doesn't probe DeepSeek's actual /v1/models endpoint to confirm v4-pro and v4-flash are visible there yet (they're in preview). The deepseek-ingestor will mark them alive: true once they appear in the API response.
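For the auto-migration follow-up, one possible shape of the redirect is sketched below. Nothing here exists in this PR: `resolveTargetSketch` and `AlertWithMigration` are hypothetical, and the only names taken from the text above are autoMigrateAt and recommendedSuccessor. The sketch assumes requested models are keyed as provider/modelId, matching the successor format.

```typescript
// Hypothetical alert subset; autoMigrateAt is left unset by this PR.
interface AlertWithMigration {
  provider: string;
  modelId: string;
  recommendedSuccessor: string; // "provider/model" form
  autoMigrateAt?: string; // ISO 8601 timestamp; undefined means never redirect
}

// Returns the model to route to: the successor once autoMigrateAt has passed,
// otherwise the model that was requested.
function resolveTargetSketch(requested: string, alerts: AlertWithMigration[], now: Date): string {
  const alert = alerts.find((a) => `${a.provider}/${a.modelId}` === requested);
  if (!alert || !alert.autoMigrateAt) return requested;
  const migrateAtMs = new Date(alert.autoMigrateAt).getTime();
  return now.getTime() >= migrateAtMs ? alert.recommendedSuccessor : requested;
}

const exampleAlerts: AlertWithMigration[] = [
  {
    provider: "deepseek",
    modelId: "deepseek-chat",
    recommendedSuccessor: "deepseek/deepseek-v4-flash",
    autoMigrateAt: "2026-07-17T00:00:00Z", // illustrative date, not a decision
  },
  {
    provider: "deepseek",
    modelId: "deepseek-reasoner",
    recommendedSuccessor: "deepseek/deepseek-v4-flash",
    // autoMigrateAt unset: current behavior, no redirect
  },
];

const beforeCutover = resolveTargetSketch("deepseek/deepseek-chat", exampleAlerts, new Date("2026-07-01T00:00:00Z"));
const afterCutover = resolveTargetSketch("deepseek/deepseek-chat", exampleAlerts, new Date("2026-07-20T00:00:00Z"));
const unsetField = resolveTargetSketch("deepseek/deepseek-reasoner", exampleAlerts, new Date("2026-07-20T00:00:00Z"));
console.log(beforeCutover, afterCutover, unsetField);
```

Keeping the field optional preserves today's behavior by default: an alert without autoMigrateAt never changes the routing target.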
Verification
- npx vitest run src/router/scheduled-deprecations.test.ts src/router/intelligence/deprecation-detector.test.ts: 19/19 passed.
- pnpm tsgo: 0 errors.
- oxfmt --check + oxlint --type-aware on edited files: clean.
- Live API verification deferred until merge + ECS deploy.