Cold-start timing wrapper dropped short-circuit Responses (5/14 5xx)
2026-05-06
LOCKSTEP TRACEABILITY MATRIX --- api_endpoints: ["none — internal middleware"] sdk_methods_updated: ["none"] mcp_tools_updated: ["none"] ---
What We Built
Fixed a latent bug in timedMiddleware (the cold-start timing wrapper around every /v1/* middleware) that was dropping short-circuit Response objects during the instrumented window. The wrapper awaited the wrapped middleware's promise but didn't return it, so when a wrapped middleware (e.g. apiKeyAuth returning a 401) short-circuited with a Response, that response was silently discarded. Hono then continued the chain against an unauthenticated context, eventually throwing RangeError when a downstream handler tried to construct a Response with an invalid status code.
Why It Matters
Five of fourteen 5xx errors in a one-hour production sample on 2026-05-06 traced to this single line. Because the bug only fired during the cold-start instrumentation window (first 3 requests after task boot, then dormant), and ECS keeps ≥2 tasks behind the ALB with frequent deploys, the window kept reopening — surfacing as a steady trickle of unexplained 5xx rather than an obvious spike.
How It Works
The dormant branch was already correct (return mw(c, next) returns the promise that resolves to the Response). Only the instrumented branch dropped the result:
// before
const result = mw(c, next);
await result;
timings[name] = performance.now() - start;
// after
try {
return await mw(c, next);
} finally {
timings[name] = performance.now() - start;
c.set("_stageTimings", timings);
}
finally keeps timing capture intact when the wrapped middleware throws.
Added a test-only __resetForTesting() export because requestCount is module-level state shared across vitest worker test files, which would otherwise push the counter past the threshold and force the dormant branch in any test that exercises the instrumented path.
The Numbers
- Failure mode: 5/14 5xx in 1h prod sample (~36%)
- Trigger: only first 3 requests after each ECS task boot
- Tests added: 3 (short-circuit propagation, pass-through, finally-block timing capture on throw)
- Existing prod evidence:
RangeError: invalid status codetraces in CW Logs immediately after task boot
Lockstep Checklist
- [x] API Routes: No API surface change — internal middleware bug fix.
- [x] TS SDK: No SDK surface change.
- [x] Python SDK: No SDK surface change.
- [x] MCP Schemas: No MCP surface change.
- [x] Master Record: No capability change.