2026-05-08-doc-state-drift-gate
2026-05-08 — Doc state-drift gate
Summary
The R19 and R20 stochastic assessments both flagged "doc/state drift", with 4 of 10 agents independently citing the same three string mismatches. The risk has now been called out twice without being fixed. This PR fixes the three drift sites and adds a CI gate so that this class of drift fails the build instead of recurring.
What was actually drifting
| Surface | Said | Reality |
|---|---|---|
| `README.md` line 6 | "31 models from 8 providers" | Catalog has 32 entries / 8 providers |
| `README.md` line 14 | "— 31 curated models" | Same (32) |
| `README.md` line 70 | "— 31 curated models" | Same (32) |
| `README.md` line 104 | "List available models (31 curated)" | `GET /v1/models` returns the runtime-reachable subset (depends on provider keys) |
| `src/api/server.ts` line 82 | "Intelligent routing across 247 model endpoints" | 27 currently reachable, 32 in catalog |
| `docs/reference/ops-runbooks.md` line 14 | "Upstash Free" | Production runs Upstash paid tier (per memory 2026-05-05) |
Production /v1/models returns 27 because the runtime filters out any provider with no API key — GROQ_API_KEY is unset, so Groq's 5 catalog entries are silently dropped. That distinction (catalog vs runtime-reachable) was not previously documented.
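The catalog versus runtime-reachable distinction can be illustrated with a minimal sketch. The catalog entry shape, the `reachableModels` helper, the `acme` provider, and the `<PROVIDER>_API_KEY` naming scheme are all assumptions made for illustration (only `GROQ_API_KEY` appears in this log); the real runtime filter may differ.

```javascript
// Hypothetical catalog shape: one entry per model, tagged with its provider.
// Assumption: each provider's key lives in an env var named <PROVIDER>_API_KEY.
function reachableModels(catalog, env) {
  return catalog.filter((entry) => {
    const keyName = `${entry.provider.toUpperCase()}_API_KEY`;
    const key = env[keyName];
    // A provider with no key (unset or empty) contributes no reachable models.
    return typeof key === "string" && key.length > 0;
  });
}

// Illustrative data only; not the real catalog.
const catalog = [
  { id: "model-a", provider: "groq" },
  { id: "model-b", provider: "groq" },
  { id: "model-c", provider: "acme" },
];

// GROQ_API_KEY is unset here, so both groq entries drop out silently.
const reachable = reachableModels(catalog, { ACME_API_KEY: "example-key" });
console.log(reachable.map((m) => m.id)); // [ 'model-c' ]
```

Under these assumptions, the catalog length is the number a README should quote, while the filtered length is what `GET /v1/models` would report in a given environment.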
Changes
- `README.md`: model count corrected to 32 catalog entries / 8 providers, with an explicit note that `GET /v1/models` returns the runtime-reachable subset depending on configured provider keys.
- `src/api/server.ts`: embedded `/llms.txt` no longer hardcodes a stale "247 model endpoints" claim; refers callers to `GET /v1/models` for the live count.
- `docs/reference/ops-runbooks.md`: Redis tier label corrected to "Upstash (paid tier, ...)".
- `scripts/check-doc-state-drift.mjs`: new gate script that:
  - Counts catalog entries + provider sections in `src/router/provider-catalog-pricing.ts` (single source of truth).
  - Asserts every "N models from M providers" and "— N curated models" phrase in README matches.
  - Forbids hardcoded "N model endpoints" claims in `src/api/server.ts` (the embedded `/llms.txt`).
  - Forbids "Upstash Free" in `docs/reference/ops-runbooks.md`.
- `.github/workflows/ci.yml`: wires the gate as a `check` job step that runs after format/types/lint; fails CI on any drift.
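The gate's assertions listed above can be sketched as a pure function over file contents. This is not the real `scripts/check-doc-state-drift.mjs`: the `findDrift` name, its input object, the regexes, and the failure messages are assumptions, and the catalog counts are passed in rather than parsed from the catalog file.

```javascript
// Hypothetical sketch of the gate's checks; the real script may differ.
function findDrift({ readme, serverSource, runbook, modelCount, providerCount }) {
  const failures = [];

  // Every "N models from M providers" phrase must match the catalog counts.
  for (const m of readme.matchAll(/(\d+) models from (\d+) providers/g)) {
    if (Number(m[1]) !== modelCount || Number(m[2]) !== providerCount) {
      failures.push(`README.md: "${m[0]}" does not match the catalog`);
    }
  }

  // Every "<em dash> N curated models" phrase must match the model count.
  // \u2014 is the em dash that appears literally in the README phrases.
  for (const m of readme.matchAll(/\u2014 (\d+) curated models/g)) {
    if (Number(m[1]) !== modelCount) {
      failures.push(`README.md: "${m[0]}" does not match the catalog`);
    }
  }

  // Any hardcoded "N model endpoints" claim in the server source is forbidden.
  if (/\d+ model endpoints/.test(serverSource)) {
    failures.push('src/api/server.ts: hardcoded "model endpoints" count');
  }

  // The stale Redis tier label must not reappear.
  if (runbook.includes("Upstash Free")) {
    failures.push('docs/reference/ops-runbooks.md: mentions "Upstash Free"');
  }

  return failures;
}

// Illustrative inputs: one stale README phrase and one stale server string.
const failures = findDrift({
  readme: "31 models from 8 providers\n\u2014 32 curated models",
  serverSource: "Intelligent routing across 247 model endpoints",
  runbook: "Redis: Upstash (paid tier)",
  modelCount: 32,
  providerCount: 8,
});
console.log(failures.length); // 2
```

Keeping the checks as a function of strings, with file reading done separately, makes the regression cases easy to exercise without touching the working tree.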
Verification
$ node scripts/check-doc-state-drift.mjs
Doc state-drift gate OK: catalog has 32 models / 8 providers; README + ops-runbooks + /llms.txt agree.
$ pnpm check
oxlint: Found 0 warnings and 0 errors.
oxfmt: All matched files use the correct format.
tsgo: 0 errors
$ # Regression test: introduce a stale claim, confirm gate fails:
$ sed -i '' 's/Upstash (paid tier/Upstash Free (paid tier/' docs/reference/ops-runbooks.md
$ node scripts/check-doc-state-drift.mjs
Doc state-drift gate FAILED:
docs/reference/ops-runbooks.md: mentions "Upstash Free" — production runs the paid tier ...
$ exit code: 1
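The two outcomes shown in the session above (an OK line, or a FAILED header followed by one line per violation and a non-zero exit) could be produced by a reporting step along these lines. The `report` helper and its return shape are hypothetical; only the message text is taken from the output above.

```javascript
// Hypothetical reporting step: map a list of failure strings to
// output lines plus the exit code CI would observe.
function report(failures, modelCount, providerCount) {
  if (failures.length === 0) {
    return {
      exitCode: 0,
      lines: [
        `Doc state-drift gate OK: catalog has ${modelCount} models / ${providerCount} providers; README + ops-runbooks + /llms.txt agree.`,
      ],
    };
  }
  // One header line, then each violation on its own line.
  return { exitCode: 1, lines: ["Doc state-drift gate FAILED:", ...failures] };
}

const result = report(
  ['docs/reference/ops-runbooks.md: mentions "Upstash Free"'],
  32,
  8
);
for (const line of result.lines) console.error(line);
// A real script would then set the process exit status, for example:
// process.exitCode = result.exitCode;
```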
Why this stops repeating
The previous response to the drift was: "flag in risk register → forget → next round flags it again." With the CI gate in place, any future PR that introduces a stale model-count claim or a mis-tiered Redis label fails before merge. The class is now closed at the build-system level rather than at the human-attention level.
Lockstep checklist
- [x] Source: README + ops-runbooks + server.ts updated to reality
- [x] Test/gate: `scripts/check-doc-state-drift.mjs` (new) + verified to fail on regressions
- [x] CI wiring: `.github/workflows/ci.yml` runs the gate in the `check` job
- [x] Ship log: this file
- [x] No SDK/MCP impact: pure docs/runtime-string fix; no API surface change