Intelligence endpoints expose top-level .summary envelope (R25 probe-null fix)
2026-05-14
What We Built
GET /v1/intelligence/savings and GET /v1/intelligence/benchmark now return a top-level summary object that captures the headline metrics (totalSavingsUsd, trackedRequests, pathCount, source, verdict, p-value, win-rates, etc.) without forcing a caller to walk nested structures.
The existing field layout is preserved — all totalSavingsUsd, bySource, byPath, period, strategies_compared, thompson_advantage keys remain at the same paths. The summary field is purely additive.
Why It Matters
The R25 stochastic assessment recorded 8/10 agents flagging "null to jq '.summary' probe" against both endpoints. The endpoints had rich data; they just didn't expose a stable headline-metrics envelope under a predictable key. Agents that probed the endpoints with a default .summary selector got null back and counted it as a missing capability.
Adding .summary (a) makes the API more agent-friendly, since a single key returns the headline KPIs, (b) gives dashboards a stable shape to render without parsing edge cases (e.g. verdict: "inconclusive" when the sample size is too small for statistical significance), and (c) closes a recurring assessment finding without changing the underlying data.
How It Works
/v1/intelligence/savings summary:
{
"summary": {
"totalSavingsUsd": 36.77,
"trackedRequests": 5277,
"avgSavingsPerRequest": 0.00697,
"topSource": { "id": "cost_optimizer", "savingsUsd": 36.12, "requests": 5226 },
"sources": [
/* sorted desc by savings */
],
"pathCount": 53,
"periodStartIso": "2026-04-14T16:13:12.665Z",
"periodEndIso": "2026-05-14T16:13:12.665Z",
"source": "postgres"
},
"totalSavingsUsd": 36.77
/* ...existing fields preserved... */
}
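As a caller-side illustration of the additive layout, the sketch below reads the headline metrics from .summary when it is present and falls back to the pre-existing top-level totalSavingsUsd otherwise. It treats the response as a plain record, which matches how the SDK types are described in this entry. The names JsonRecord, SavingsHeadline, and readSavingsHeadline are hypothetical and are not part of the SDK; only the field names come from the payload above.

```typescript
// Hypothetical caller-side helper; none of these names exist in the SDK.
type JsonRecord = Record<string, unknown>;

interface SavingsHeadline {
  totalSavingsUsd: number;
  trackedRequests: number | null;
  avgSavingsPerRequest: number | null;
  fromSummary: boolean; // true when the values came from the .summary envelope
}

function isRecord(value: unknown): value is JsonRecord {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

function asNumber(value: unknown): number | null {
  return typeof value === "number" && Number.isFinite(value) ? value : null;
}

function readSavingsHeadline(body: JsonRecord): SavingsHeadline | null {
  // Preferred path: the top-level summary envelope.
  const summary = body["summary"];
  if (isRecord(summary)) {
    const total = asNumber(summary["totalSavingsUsd"]);
    if (total !== null) {
      return {
        totalSavingsUsd: total,
        trackedRequests: asNumber(summary["trackedRequests"]),
        avgSavingsPerRequest: asNumber(summary["avgSavingsPerRequest"]),
        fromSummary: true,
      };
    }
  }
  // Fallback: the existing top-level field, which remains at the same path.
  const legacyTotal = asNumber(body["totalSavingsUsd"]);
  if (legacyTotal === null) {
    return null;
  }
  return {
    totalSavingsUsd: legacyTotal,
    trackedRequests: null,
    avgSavingsPerRequest: null,
    fromSummary: false,
  };
}

// Usage with a trimmed copy of the example payload.
const sampleSavings: JsonRecord = {
  summary: { totalSavingsUsd: 36.77, trackedRequests: 5277, avgSavingsPerRequest: 0.00697 },
  totalSavingsUsd: 36.77,
};
console.log(readSavingsHeadline(sampleSavings));
```

The fallback branch is only there to show that older payload shapes keep working; a client that only ever talks to the updated endpoints could read .summary directly.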
/v1/intelligence/benchmark summary:
{
"summary": {
"period": "rolling",
"totalComparisons": 351,
"confident": true,
"pValue": 8.24e-41,
"effectSize": 1.08,
"method": "welch_t_test",
"meanRewardBandit": 0.991,
"meanRewardPrice": 0.9,
"winRateVsPrice": 0.772,
"rewardImprovementVsPricePct": 10.2,
"verdict": "thompson_beats_static_price"
}
/* ...existing fields preserved... */
}
The verdict field encodes the outcome of the statistical test as a single value (thompson_beats_static_price | static_price_beats_thompson | inconclusive) so callers don't need to interpret p-values directly.
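One way a caller might consume verdict is to narrow it to the three documented values and branch exhaustively, so an unexpected value is rejected rather than silently misread. This is a minimal sketch: VERDICTS, Verdict, parseVerdict, and describeVerdict are hypothetical names, and the human-readable messages are illustrative wording rather than anything the API returns.

```typescript
// The three verdict values documented for summary.verdict.
const VERDICTS = [
  "thompson_beats_static_price",
  "static_price_beats_thompson",
  "inconclusive",
] as const;

type Verdict = (typeof VERDICTS)[number];

// Returns the verdict if it is one of the documented values, otherwise null.
function parseVerdict(value: unknown): Verdict | null {
  if (typeof value !== "string") {
    return null;
  }
  const known: readonly string[] = VERDICTS;
  return known.indexOf(value) !== -1 ? (value as Verdict) : null;
}

// Illustrative messages only; the wording is an assumption, not API output.
function describeVerdict(verdict: Verdict): string {
  switch (verdict) {
    case "thompson_beats_static_price":
      return "thompson strategy scored higher than static price";
    case "static_price_beats_thompson":
      return "static price scored higher than thompson strategy";
    case "inconclusive":
      return "not enough evidence to pick a winner";
    default: {
      // Compile-time exhaustiveness check: fails to type-check if a case is missing.
      const unreachable: never = verdict;
      return unreachable;
    }
  }
}

// Usage with the verdict from the example payload.
const parsed = parseVerdict("thompson_beats_static_price");
console.log(parsed === null ? "unknown verdict" : describeVerdict(parsed));
```

Keeping the value list in one const array means the runtime check and the Verdict type cannot drift apart if a new verdict is added later.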
The Numbers
- Endpoints updated: 2
- E2E tests added: SAV-004, BEN-004 (in fortress-intelligence-apis.e2e.test.ts)
- New script: scripts/load-test-light.js, a read-only k6 load test against /health and /v1/models; p(95) < 165 ms at 50 VUs sustained / 129 req/s
Lockstep
- API routes: src/api/capabilities/intelligence/intelligence.ts (intelligenceSavingsCapability and intelligenceBenchmarkCapability handlers updated)
- SDK (TS): unchanged; IntelligenceSavingsResponse and IntelligenceBenchmarkResponse are both Record, so .summary is readable by existing clients without regeneration
- SDK (Python): same; _get() returns dict / Mapping[str, Any]
- MCP: not agent-facing tools
- Docs: this entry
- Tests: 2 e2e cases asserting .summary shape