2026-05-08-doc-state-drift-gate

2026-05-08 — Doc state-drift gate

Summary

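R19 and R20 stochastic assessments both flagged "doc/state drift", with 4 of 10 agents independently citing the same three drift sites: the README model counts, the endpoint count in the embedded /llms.txt (src/api/server.ts), and the Redis tier label in the ops runbook. The risk has now been called out twice without being fixed. This PR fixes all three sites AND adds a CI gate, so reintroducing any of these mismatches fails the build instead of waiting for the next assessment to notice.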

What was actually drifting

| Surface | Said | Reality |
|---|---|---|
| README.md line 6 | "31 models from 8 providers" | Catalog has 32 entries / 8 providers |
| README.md line 14 | "— 31 curated models" | Same (32) |
| README.md line 70 | "— 31 curated models" | Same |
| README.md line 104 | "List available models (31 curated)" | GET /v1/models returns the runtime-reachable subset (depends on provider keys) |
| src/api/server.ts line 82 | "Intelligent routing across 247 model endpoints" | 27 currently reachable, 32 in catalog |
| docs/reference/ops-runbooks.md line 14 | "Upstash Free" | Production runs the Upstash paid tier (per memory, 2026-05-05) |

In production, GET /v1/models returns 27 because the runtime filters out every provider that has no API key configured: GROQ_API_KEY is unset, so Groq's 5 catalog entries are silently dropped (32 - 5 = 27). This distinction between the catalog and the runtime-reachable subset was not previously documented.
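To make the catalog vs. runtime-reachable distinction concrete, here is a minimal sketch of the filtering described above. It is not the router's actual code: the entry shape `{ id, provider }`, the provider-to-env-var map, the `acme` provider, and the `reachableModels` helper are all hypothetical.

```javascript
// Sketch: catalog vs. runtime-reachable model lists.
// ASSUMPTIONS (hypothetical, not the router's real API): catalog entries
// look like { id, provider }, and each provider's key lives in one env var.

/** Return the catalog entries whose provider has a non-empty API key. */
function reachableModels(catalog, env, keyEnvByProvider) {
  return catalog.filter((entry) => {
    const keyName = keyEnvByProvider[entry.provider];
    // No mapping, or an unset/empty key: the entry is dropped silently,
    // which is why /v1/models can be shorter than the catalog.
    return Boolean(keyName && env[keyName]);
  });
}

// Hypothetical miniature catalog: 3 entries across 2 providers.
const catalog = [
  { id: "model-a", provider: "acme" },
  { id: "model-b", provider: "acme" },
  { id: "model-c", provider: "groq" },
];
const keyEnvByProvider = { acme: "ACME_API_KEY", groq: "GROQ_API_KEY" };

// Only the hypothetical Acme key is set, mirroring production's unset GROQ_API_KEY.
const env = { ACME_API_KEY: "test-key" };

const reachable = reachableModels(catalog, env, keyEnvByProvider);
console.log(`catalog: ${catalog.length}, reachable: ${reachable.length}`);
// prints "catalog: 3, reachable: 2"
```

The catalog count is a property of the source file, while the reachable count is a property of the deployment's environment, so only the former is safe to state in static docs.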

Changes

  • README.md: model count corrected to 32 catalog entries / 8 providers, with an explicit note that GET /v1/models returns only the runtime-reachable subset, which depends on the configured provider keys.
  • src/api/server.ts: the embedded /llms.txt no longer hardcodes the stale "247 model endpoints" claim; it refers callers to GET /v1/models for the live count.
  • docs/reference/ops-runbooks.md: Redis tier label corrected to "Upstash (paid tier, ...)".
  • scripts/check-doc-state-drift.mjs: new gate script that:
      1. Counts catalog entries and provider sections in src/router/provider-catalog-pricing.ts (the single source of truth).
      2. Asserts that every "<N> models from <M> providers" and "— <N> curated models" phrase in README.md matches those counts.
      3. Forbids hardcoded "<N> model endpoints" claims in src/api/server.ts (the embedded /llms.txt).
      4. Forbids "Upstash Free" in docs/reference/ops-runbooks.md.
  • .github/workflows/ci.yml: adds the gate as a step in the check job, after format/types/lint; any detected drift fails CI.
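Checks 2 to 4 above reduce to string matching over file contents. The sketch below is a hedged illustration, not the contents of scripts/check-doc-state-drift.mjs: it assumes the catalog counts from check 1 are already computed and passed in, and the function names `checkReadme`, `checkServer`, and `checkRunbook` are hypothetical.

```javascript
// Sketch of the gate's string checks (checks 2 to 4) as pure functions.
// ASSUMPTIONS: function names are hypothetical; the real script reads the
// files from disk and derives `counts` from the catalog source (check 1).

/** Check 2: README model-count phrases must match the catalog counts. */
function checkReadme(readme, counts) {
  const failures = [];
  for (const [phrase, n, p] of readme.matchAll(/(\d+) models from (\d+) providers/g)) {
    if (Number(n) !== counts.models || Number(p) !== counts.providers) {
      failures.push(
        `README.md: "${phrase}" but catalog has ${counts.models} models / ${counts.providers} providers`,
      );
    }
  }
  // \u2014 is the em dash character that starts the README's "curated models" phrase.
  for (const [phrase, n] of readme.matchAll(/\u2014 (\d+) curated models/g)) {
    if (Number(n) !== counts.models) {
      failures.push(`README.md: "${phrase}" but catalog has ${counts.models} models`);
    }
  }
  return failures;
}

/** Check 3: the embedded /llms.txt must not hardcode "<N> model endpoints". */
function checkServer(serverSource) {
  return /\d+ model endpoints/.test(serverSource)
    ? ['src/api/server.ts: hardcoded "<N> model endpoints" claim; point callers at GET /v1/models']
    : [];
}

/** Check 4: the ops runbook must not label production Redis "Upstash Free". */
function checkRunbook(runbook) {
  return runbook.includes("Upstash Free")
    ? ['docs/reference/ops-runbooks.md: mentions "Upstash Free"; production runs the paid tier']
    : [];
}

// Usage on inline strings that reproduce the pre-fix README and server.ts claims.
const counts = { models: 32, providers: 8 };
const failures = [
  ...checkReadme("Route across 31 models from 8 providers.", counts),
  ...checkServer("Intelligent routing across 247 model endpoints"),
  ...checkRunbook("Redis: Upstash (paid tier)"),
];
console.log(failures.join("\n"));
```

Keeping the checks as pure functions over strings makes each one testable without touching the filesystem; only a thin outer layer needs to read the real files.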

Verification

$ node scripts/check-doc-state-drift.mjs
Doc state-drift gate OK: catalog has 32 models / 8 providers; README + ops-runbooks + /llms.txt agree.

$ pnpm check
oxlint: Found 0 warnings and 0 errors.
oxfmt: All matched files use the correct format.
tsgo: 0 errors

$ # Regression test: introduce a stale claim, confirm gate fails:
$ sed -i '' 's/Upstash (paid tier/Upstash Free (paid tier/' docs/reference/ops-runbooks.md
$ node scripts/check-doc-state-drift.mjs
Doc state-drift gate FAILED:
  docs/reference/ops-runbooks.md: mentions "Upstash Free" — production runs the paid tier ...
(exit code: 1)
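The regression run above depends on the gate exiting nonzero. A minimal sketch of that reporting step, assuming a hypothetical `report` helper that receives the failure strings collected by the checks; the OK and FAILED text mirrors the output shown above, but the real script's structure may differ.

```javascript
// Sketch: turn collected failures into the gate's output text and exit code.
// ASSUMPTION: `report` is a hypothetical helper, not the real script's API.

/** Build the OK/FAILED message and the exit code CI will see. */
function report(failures, counts) {
  if (failures.length === 0) {
    return {
      exitCode: 0,
      text:
        `Doc state-drift gate OK: catalog has ${counts.models} models / ` +
        `${counts.providers} providers; README + ops-runbooks + /llms.txt agree.`,
    };
  }
  return {
    exitCode: 1,
    // One indented line per failure, under a single FAILED header.
    text: ["Doc state-drift gate FAILED:", ...failures.map((f) => `  ${f}`)].join("\n"),
  };
}

const counts = { models: 32, providers: 8 };
const failed = report(['docs/reference/ops-runbooks.md: mentions "Upstash Free"'], counts);
console.log(failed.text);
console.log(`exit code: ${failed.exitCode}`);

// In a real gate, the last step would be:
//   process.exitCode = report(allFailures, counts).exitCode;
// A nonzero exit code is what makes the CI step fail.
```

Setting `process.exitCode` instead of calling `process.exit()` lets Node finish flushing stdout before the process ends, so the failure list is not truncated in CI logs.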

Why this stops repeating

The previous response to this drift was "flag in risk register → forget → next round flags it again." With the CI gate in place, any future PR that reintroduces one of the checked patterns (a README model count that disagrees with the catalog, a hardcoded endpoint count in the embedded /llms.txt, or an "Upstash Free" label in the ops runbook) fails before merge. For these patterns, enforcement now lives in the build system instead of depending on human attention.

Lockstep checklist

  • [x] Source: README + ops-runbooks + server.ts updated to match reality
  • [x] Test/gate: scripts/check-doc-state-drift.mjs (new), verified to fail on regressions
  • [x] CI wiring: .github/workflows/ci.yml runs the gate in the check job
  • [x] Ship log: this file
  • [x] No SDK/MCP impact: pure docs/runtime-string fix; no API surface change