Intelligence endpoints expose top-level .summary envelope (R25 probe-null fix)

2026-05-14

api, router, intelligence

What We Built

GET /v1/intelligence/savings and GET /v1/intelligence/benchmark now return a top-level summary object that carries the headline metrics (for savings: totalSavingsUsd, trackedRequests, pathCount, source; for benchmark: verdict, pValue, winRateVsPrice, and others), so a caller no longer has to walk nested structures.

The existing field layout is preserved: the totalSavingsUsd, bySource, byPath, period, strategies_compared, and thompson_advantage keys all remain at the same paths. The summary field is purely additive.
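Because the change is additive, a client can prefer the new envelope and still fall back to the old top-level key. The sketch below illustrates that; readTotalSavingsUsd, JsonObject, and isObject are hypothetical names for illustration, not part of the SDK.

```typescript
// Hypothetical helper (not part of the SDK): read the headline savings figure
// from a parsed /v1/intelligence/savings response body.
type JsonObject = Record<string, unknown>;

function isObject(value: unknown): value is JsonObject {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

function readTotalSavingsUsd(body: JsonObject): number | undefined {
  // Prefer the new envelope when it is present.
  const summary = body["summary"];
  const fromSummary = isObject(summary) ? summary["totalSavingsUsd"] : undefined;
  if (typeof fromSummary === "number") {
    return fromSummary;
  }
  // Fall back to the pre-existing top-level key, whose path is unchanged.
  const fromTopLevel = body["totalSavingsUsd"];
  return typeof fromTopLevel === "number" ? fromTopLevel : undefined;
}
```

Both branches read the same figure, so existing callers that use the top-level key keep working.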

Why It Matters

The R25 stochastic assessment recorded 8 of 10 agents flagging "null to jq '.summary' probe" against both endpoints. The endpoints had rich data; they just didn't expose a stable headline-metrics envelope under a predictable key. Agents that probed the endpoints with a default .summary selector saw null and counted it as a missing capability.

Adding .summary (a) makes the API more agent-friendly, since a single key returns the headline KPIs; (b) gives dashboards a stable shape to render without parsing edge cases (e.g. verdict: "inconclusive" when the sample is too small for statistical significance); and (c) closes a recurring assessment finding without changing the underlying data.

How It Works

/v1/intelligence/savings summary:

{
  "summary": {
    "totalSavingsUsd": 36.77,
    "trackedRequests": 5277,
    "avgSavingsPerRequest": 0.00697,
    "topSource": { "id": "cost_optimizer", "savingsUsd": 36.12, "requests": 5226 },
    "sources": [
      /* sorted desc by savings */
    ],
    "pathCount": 53,
    "periodStartIso": "2026-04-14T16:13:12.665Z",
    "periodEndIso": "2026-05-14T16:13:12.665Z",
    "source": "postgres"
  },
  "totalSavingsUsd": 36.77
  /* ...existing fields preserved... */
}
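The headline figures in the sample are consistent with one another: 36.77 / 5277 ≈ 0.00697, which matches avgSavingsPerRequest. The sketch below shows one way such a summary could be derived from per-source rows. It assumes the sources partition the tracked requests and that each row has the shape shown in topSource; summarizeSavings and the type names are hypothetical, and the handler in intelligence.ts may compute these values differently.

```typescript
// Row shape taken from the "topSource" sample above.
interface SourceSavings {
  id: string;
  savingsUsd: number;
  requests: number;
}

// Subset of the summary fields that can be derived from the rows alone.
interface SavingsHeadline {
  totalSavingsUsd: number;
  trackedRequests: number;
  avgSavingsPerRequest: number;
  topSource: SourceSavings | null;
  sources: SourceSavings[];
}

// Hypothetical derivation: sort descending by savings, then aggregate.
function summarizeSavings(rows: SourceSavings[]): SavingsHeadline {
  const sources = [...rows].sort((a, b) => b.savingsUsd - a.savingsUsd);
  const rawTotal = sources.reduce((sum, s) => sum + s.savingsUsd, 0);
  const trackedRequests = sources.reduce((sum, s) => sum + s.requests, 0);
  return {
    // Round to cents to hide floating-point noise from the summation.
    totalSavingsUsd: Math.round(rawTotal * 100) / 100,
    trackedRequests,
    // Guard against division by zero when nothing has been tracked yet.
    avgSavingsPerRequest: trackedRequests > 0 ? rawTotal / trackedRequests : 0,
    topSource: sources.length > 0 ? sources[0] : null,
    sources,
  };
}

// The "other" row is a made-up remainder (36.77 - 36.12 USD, 5277 - 5226
// requests) so that the totals line up with the sample; it is not real data.
const exampleSummary = summarizeSavings([
  { id: "other", savingsUsd: 0.65, requests: 51 },
  { id: "cost_optimizer", savingsUsd: 36.12, requests: 5226 },
]);
```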

/v1/intelligence/benchmark summary:

{
  "summary": {
    "period": "rolling",
    "totalComparisons": 351,
    "confident": true,
    "pValue": 8.24e-41,
    "effectSize": 1.08,
    "method": "welch_t_test",
    "meanRewardBandit": 0.991,
    "meanRewardPrice": 0.9,
    "winRateVsPrice": 0.772,
    "rewardImprovementVsPricePct": 10.2,
    "verdict": "thompson_beats_static_price"
  }
  /* ...existing fields preserved... */
}

The verdict field encodes the outcome of the statistical test as a single value (thompson_beats_static_price | static_price_beats_thompson | inconclusive), so callers don't need to interpret p-values directly.
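As an illustration of how a three-valued verdict can be derived from the test statistics, here is a minimal sketch. The significance threshold (alpha = 0.05) and the decision rule are assumptions; this entry does not state the rule the handler actually uses. deriveVerdict and BenchmarkStats are hypothetical names.

```typescript
type Verdict =
  | "thompson_beats_static_price"
  | "static_price_beats_thompson"
  | "inconclusive";

// Subset of the benchmark summary fields used by this sketch.
interface BenchmarkStats {
  pValue: number;
  meanRewardBandit: number;
  meanRewardPrice: number;
}

// Assumed rule: significant at alpha, then pick the arm with the higher mean.
function deriveVerdict(stats: BenchmarkStats, alpha: number = 0.05): Verdict {
  // The negated comparison also treats a NaN p-value as inconclusive.
  if (!(stats.pValue < alpha)) {
    return "inconclusive";
  }
  if (stats.meanRewardBandit > stats.meanRewardPrice) {
    return "thompson_beats_static_price";
  }
  if (stats.meanRewardBandit < stats.meanRewardPrice) {
    return "static_price_beats_thompson";
  }
  return "inconclusive";
}

// Values from the sample response above.
const sampleVerdict = deriveVerdict({
  pValue: 8.24e-41,
  meanRewardBandit: 0.991,
  meanRewardPrice: 0.9,
});
```

A caller that only needs a label can switch on verdict and ignore pValue and effectSize entirely.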

The Numbers

Endpoints updated: 2
E2E tests added: SAV-004, BEN-004 (in fortress-intelligence-apis.e2e.test.ts)
New script: scripts/load-test-light.js, a read-only k6 load test against /health and /v1/models; p(95) < 165 ms at 50 sustained VUs (129 req/s)

Lockstep

API routes: src/api/capabilities/intelligence/intelligence.ts — intelligenceSavingsCapability and intelligenceBenchmarkCapability handlers updated
SDK (TS): unchanged — IntelligenceSavingsResponse and IntelligenceBenchmarkResponse are both Record, so .summary is readable by existing clients without regeneration
SDK (Python): same — _get() returns dict / Mapping[str, Any]
MCP: unchanged; these endpoints are not agent-facing tools
Docs: this entry
Tests: 2 e2e cases asserting .summary shape
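
A shape assertion of the kind the e2e cases perform could look like the sketch below. It is not the code of SAV-004 or BEN-004; the list of required keys is an assumption drawn from the benchmark sample above, and missingSummaryKeys is a hypothetical helper.

```typescript
// Expected primitive type per key, taken from the benchmark sample response.
const REQUIRED_BENCHMARK_KEYS: Record<string, "string" | "number" | "boolean"> = {
  period: "string",
  totalComparisons: "number",
  confident: "boolean",
  pValue: "number",
  effectSize: "number",
  method: "string",
  verdict: "string",
};

// Returns the paths that are missing or have the wrong type; empty means OK.
function missingSummaryKeys(body: unknown): string[] {
  if (typeof body !== "object" || body === null) {
    return ["summary"];
  }
  const summary = (body as Record<string, unknown>)["summary"];
  if (typeof summary !== "object" || summary === null) {
    return ["summary"];
  }
  const fields = summary as Record<string, unknown>;
  return Object.entries(REQUIRED_BENCHMARK_KEYS)
    .filter(([key, expected]) => typeof fields[key] !== expected)
    .map(([key]) => `summary.${key}`);
}
```

In a test, asserting that the returned array is empty gives a failure message that names the offending keys.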