Circuit Breakers
Dual-trigger state machine with sliding window error rate and exponential backoff recovery.
Why circuit breakers matter for AI gateways
When a provider endpoint fails (rate limit, timeout, 500), a naive retry loop makes things worse — you're adding load to an already-failing service. Circuit breakers detect sustained failure and temporarily remove the endpoint from rotation, giving it time to recover.
BrainstormRouter's circuit breaker implementation is in src/router/circuit-breaker.ts.
State machine
%%{init: {'theme': 'dark', 'themeVariables': {'primaryColor': '#d97706', 'lineColor': '#9494a8', 'primaryTextColor': '#e8e8ee'}}}%%
stateDiagram-v2
[*] --> Closed
Closed --> Open: 5 consecutive failures\nOR 50% error rate (60s window)
Open --> HalfOpen: Reset timeout elapsed\n(base: 30s)
HalfOpen --> Closed: Probe succeeds\n(reset timeout to base)
HalfOpen --> Open: Probe fails\n(double timeout, cap 16×)
state Closed {
[*] --> Monitoring
Monitoring: All requests allowed
Monitoring: Recording success/failure
}
state Open {
[*] --> Blocking
Blocking: All requests rejected
Blocking: Waiting for timeout
}
state HalfOpen {
[*] --> Probing
Probing: Single request allowed
Probing: Testing recovery
}
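Read as code, the diagram is a small transition table. The sketch below is illustrative only and is not taken from circuit-breaker.ts. The `CircuitState` and `BreakerEvent` types and the `transition` function are hypothetical names, and the events leave out the counters and timers that decide when each one fires.

```typescript
// Hypothetical types; circuit-breaker.ts may model states and events differently.
type CircuitState = "closed" | "open" | "half-open";

type BreakerEvent =
  | "trip" // either trigger fired while closed
  | "timeoutElapsed" // the current reset timeout passed while open
  | "probeSucceeded" // the single half-open probe succeeded
  | "probeFailed"; // the single half-open probe failed

// Pure transition function mirroring the state diagram.
function transition(state: CircuitState, event: BreakerEvent): CircuitState {
  if (state === "closed" && event === "trip") return "open";
  if (state === "open" && event === "timeoutElapsed") return "half-open";
  if (state === "half-open" && event === "probeSucceeded") return "closed";
  if (state === "half-open" && event === "probeFailed") return "open";
  return state; // any other (state, event) pair leaves the state unchanged
}
```

Keeping the transitions pure lets the counters, the window, and the backoff timer live elsewhere and be tested separately.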
Configuration
From circuit-breaker.ts:17-30:
export type CircuitBreakerConfig = {
failureThreshold: number; // Default: 5
resetTimeoutMs: number; // Default: 30000 (30s)
enableWindow?: boolean; // Default: true
windowMs?: number; // Default: 60000 (60s)
windowMinRequests?: number; // Default: 10
windowErrorRateThreshold?: number; // Default: 0.5 (50%)
};
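Because the window fields are optional, code that reads them needs the documented defaults filled in somewhere. One way to do that is sketched below, assuming the defaults listed in the comments above. `DEFAULT_CONFIG` and `resolveConfig` are hypothetical names, and accepting a `Partial` (so that even the two required fields can fall back to a default) is a convenience of this sketch, not something the type above allows.

```typescript
// Mirrors the CircuitBreakerConfig type shown above.
type CircuitBreakerConfig = {
  failureThreshold: number;
  resetTimeoutMs: number;
  enableWindow?: boolean;
  windowMs?: number;
  windowMinRequests?: number;
  windowErrorRateThreshold?: number;
};

// Hypothetical constant holding the documented defaults, every field required.
const DEFAULT_CONFIG: Required<CircuitBreakerConfig> = {
  failureThreshold: 5,
  resetTimeoutMs: 30_000,
  enableWindow: true,
  windowMs: 60_000,
  windowMinRequests: 10,
  windowErrorRateThreshold: 0.5,
};

// Hypothetical helper: overlay caller overrides on the defaults. Using `??`
// means an explicit `undefined` falls back to the default instead of
// overwriting it, so downstream code never sees a missing field.
function resolveConfig(
  overrides: Partial<CircuitBreakerConfig> = {},
): Required<CircuitBreakerConfig> {
  return {
    failureThreshold: overrides.failureThreshold ?? DEFAULT_CONFIG.failureThreshold,
    resetTimeoutMs: overrides.resetTimeoutMs ?? DEFAULT_CONFIG.resetTimeoutMs,
    enableWindow: overrides.enableWindow ?? DEFAULT_CONFIG.enableWindow,
    windowMs: overrides.windowMs ?? DEFAULT_CONFIG.windowMs,
    windowMinRequests: overrides.windowMinRequests ?? DEFAULT_CONFIG.windowMinRequests,
    windowErrorRateThreshold:
      overrides.windowErrorRateThreshold ?? DEFAULT_CONFIG.windowErrorRateThreshold,
  };
}
```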
Dual triggers
The circuit opens as soon as either trigger fires; the two do not need to fire together:
Trigger 1: Consecutive failures (circuit-breaker.ts:128-134)
if (circuit.failures >= this.config.failureThreshold) {
circuit.state = "open";
circuit.lastStateChange = Date.now();
log.warn(`Circuit opened for ${serviceId} (${circuit.failures} consecutive failures)`);
return;
}
Five consecutive failures (the default threshold) open the circuit immediately, and a single success resets the counter. This catches hard failures such as a down endpoint, revoked auth, or a DNS failure.
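On its own, Trigger 1 reduces to a streak counter. A minimal sketch, assuming the counter is reset on every success as described above; the `ConsecutiveFailureTrigger` class is a hypothetical name, and the snippet above keeps the same count on `circuit.failures` instead.

```typescript
// Hypothetical streak counter for Trigger 1 (consecutive failures only).
class ConsecutiveFailureTrigger {
  private failures = 0;
  private readonly failureThreshold: number;

  constructor(failureThreshold = 5) {
    this.failureThreshold = failureThreshold;
  }

  // Any success breaks the streak.
  recordSuccess(): void {
    this.failures = 0;
  }

  // Returns true when this failure brings the streak to the threshold,
  // i.e. when the circuit should open.
  recordFailure(): boolean {
    this.failures += 1;
    return this.failures >= this.failureThreshold;
  }
}
```

Because any success zeroes the count, an endpoint that fails four times out of five never trips this trigger; that pattern is what the sliding-window trigger is for.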
Trigger 2: Sliding window error rate (circuit-breaker.ts:136-149)
if (this.config.enableWindow) {
const { total, errors } = this.windowStats(circuit);
if (total >= this.config.windowMinRequests) {
const errorRate = errors / total;
if (errorRate >= this.config.windowErrorRateThreshold) {
circuit.state = "open";
// ...
}
}
}
If at least 50% of the requests in the last 60 seconds failed, and the window holds at least 10 requests, the circuit opens. This catches soft failures such as intermittent 429s, partial timeouts, and degraded responses.
The sliding window uses a ring buffer of (timestamp, isError) tuples, pruned on every append to stay within the window duration.
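A ring buffer keeps appends and pruning cheap: new entries go in at the tail and expired ones come off the head, with no shifting. The sketch below is one way to build it, not the code in circuit-breaker.ts. `SlidingWindow`, `append`, and `stats` are hypothetical names. It assumes timestamps are passed in (rather than read from `Date.now()`) so it can be tested, keeps a running error count so `stats` does not rescan the buffer, doubles its capacity when full, and also prunes on reads, which the description above does not mention.

```typescript
// Hypothetical sliding window over (timestamp, isError) entries, stored in a
// growable ring buffer and pruned on every append (and, here, on every read).
class SlidingWindow {
  private times: number[];
  private errors: boolean[];
  private head = 0; // index of the oldest live entry
  private size = 0; // number of live entries
  private errorCount = 0; // live entries with isError === true
  private readonly windowMs: number;

  constructor(windowMs = 60_000, initialCapacity = 64) {
    this.windowMs = windowMs;
    const capacity = Math.max(1, initialCapacity);
    this.times = new Array<number>(capacity).fill(0);
    this.errors = new Array<boolean>(capacity).fill(false);
  }

  // Drop expired entries, then write the new one at the tail.
  append(now: number, isError: boolean): void {
    this.prune(now);
    if (this.size === this.times.length) this.grow();
    const tail = (this.head + this.size) % this.times.length;
    this.times[tail] = now;
    this.errors[tail] = isError;
    this.size += 1;
    if (isError) this.errorCount += 1;
  }

  // Totals for entries strictly newer than now - windowMs.
  stats(now: number): { total: number; errors: number } {
    this.prune(now);
    return { total: this.size, errors: this.errorCount };
  }

  // Advance the head past entries that have fallen out of the window.
  private prune(now: number): void {
    const cutoff = now - this.windowMs;
    while (this.size > 0 && this.times[this.head] <= cutoff) {
      if (this.errors[this.head]) this.errorCount -= 1;
      this.head = (this.head + 1) % this.times.length;
      this.size -= 1;
    }
  }

  // Double the capacity, copying live entries so the oldest lands at index 0.
  private grow(): void {
    const oldCapacity = this.times.length;
    const times = new Array<number>(oldCapacity * 2).fill(0);
    const errors = new Array<boolean>(oldCapacity * 2).fill(false);
    for (let i = 0; i < this.size; i++) {
      const from = (this.head + i) % oldCapacity;
      times[i] = this.times[from];
      errors[i] = this.errors[from];
    }
    this.times = times;
    this.errors = errors;
    this.head = 0;
  }
}
```

With the defaults, Trigger 2 fires when `stats(now).total >= 10` and `errors / total >= 0.5`, which is the check in the snippet above.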
Recovery: exponential backoff
When a circuit is open, it waits for the reset timeout (default 30s) before transitioning to half-open. In the half-open state, a single request is allowed through as a probe.
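One common way to enforce the single-probe rule is to move from open to half-open lazily, inside the request check, and to hold a flag while the probe is outstanding. The sketch below assumes that approach. The fields `state`, `lastStateChange`, and `currentResetTimeoutMs` appear in the snippets on this page; the `"half-open"` state name, the `probeInFlight` flag, and passing `now` in explicitly are hypothetical, and the real `allowRequest()` may track the probe differently.

```typescript
type State = "closed" | "open" | "half-open";

// Hypothetical per-service record; probeInFlight is illustrative.
interface Circuit {
  state: State;
  lastStateChange: number; // ms timestamp of the last transition
  currentResetTimeoutMs: number; // base timeout, or a backed-off multiple of it
  probeInFlight: boolean;
}

// Decide whether a request may proceed at time `now`.
function allowRequest(circuit: Circuit, now: number): boolean {
  if (circuit.state === "closed") return true;

  if (circuit.state === "open") {
    // Still cooling down: reject.
    if (now - circuit.lastStateChange < circuit.currentResetTimeoutMs) return false;
    // Timeout elapsed: move to half-open on this request and clear any stale probe flag.
    circuit.state = "half-open";
    circuit.lastStateChange = now;
    circuit.probeInFlight = false;
  }

  // Half-open: admit exactly one probe; reject everything else until its outcome is recorded.
  if (circuit.probeInFlight) return false;
  circuit.probeInFlight = true;
  return true;
}
```

Because the flag is only cleared on the open to half-open transition, a probe that never completes would keep the endpoint blocked; a production version would also need a timeout on the probe itself.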
Probe succeeds (circuit-breaker.ts:86-94): Circuit closes. Reset timeout returns to base value (30s). Normal operation resumes.
Probe fails (circuit-breaker.ts:110-122): Circuit re-opens. Reset timeout doubles (capped at 16× base, which is 480s or 8 minutes at the default 30s base).
// Failed probe — exponential backoff
circuit.currentResetTimeoutMs = Math.min(
circuit.currentResetTimeoutMs * 2,
this.config.resetTimeoutMs * 16, // Cap at 16× base
);
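This keeps an endpoint that is still failing from being probed every 30 seconds indefinitely: after four consecutive failed probes the timeout reaches its cap, and from then on probes are spaced 8 minutes apart. The backoff sequence: 30s → 60s → 120s → 240s → 480s (cap).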
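The schedule follows directly from the `Math.min` rule above. The helper below recomputes it; `nextResetTimeout` and `backoffSchedule` are hypothetical names, not exports of circuit-breaker.ts.

```typescript
// Hypothetical helper reproducing the documented rule:
// double on every failed probe, never exceeding 16x the base timeout.
function nextResetTimeout(currentMs: number, baseMs: number): number {
  return Math.min(currentMs * 2, baseMs * 16);
}

// Wait before each probe, starting from a freshly opened circuit and
// assuming every probe fails.
function backoffSchedule(baseMs: number, failedProbes: number): number[] {
  const waits: number[] = [baseMs];
  let current = baseMs;
  for (let i = 0; i < failedProbes; i++) {
    current = nextResetTimeout(current, baseMs);
    waits.push(current);
  }
  return waits;
}
```

For example, `backoffSchedule(30_000, 5)` returns `[30000, 60000, 120000, 240000, 480000, 480000]`: the initial wait plus one entry per failed probe, with the last two both at the cap.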
Integration with routing
When the circuit breaker reports an endpoint as open, the router removes it from candidate selection. The fallback chain takes over:
- Thompson Sampling selects from eligible (non-open) endpoints
- If the selected endpoint fails, the fallback resolver tries the next endpoint
- If all endpoints are in circuit-open state, the request fails with routing metadata
The circuit breaker check (allowRequest()) is a single Map lookup plus a timestamp comparison, adding less than 0.1ms of overhead.
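The fallback behavior can be sketched as a loop over candidates that consults the breaker before each attempt. This is a minimal sketch, not BrainstormRouter's router: the `Breaker` interface, `routeWithBreaker`, and `AllCircuitsOpenError` are hypothetical, the `recordSuccess`/`recordFailure` methods and the serviceId argument to `allowRequest` are assumptions, and a caller-supplied `order` function stands in for Thompson Sampling. It checks `allowRequest()` just before each attempt rather than filtering up front, so a half-open endpoint's single probe slot is claimed only if that endpoint is actually tried.

```typescript
// Hypothetical breaker surface keyed by serviceId.
interface Breaker {
  allowRequest(serviceId: string): boolean;
  recordSuccess(serviceId: string): void;
  recordFailure(serviceId: string): void;
}

// Hypothetical error carrying routing metadata for the all-open case.
class AllCircuitsOpenError extends Error {
  readonly skipped: string[];
  constructor(skipped: string[]) {
    super(`All candidate endpoints are circuit-open: ${skipped.join(", ")}`);
    this.name = "AllCircuitsOpenError";
    this.skipped = skipped;
  }
}

// Try endpoints in the order chosen by `order` (a stand-in for Thompson
// Sampling), skipping circuit-open ones and falling back on failure.
async function routeWithBreaker<T>(
  candidates: string[],
  breaker: Breaker,
  order: (ids: string[]) => string[],
  call: (serviceId: string) => Promise<T>,
): Promise<T> {
  const skipped: string[] = [];
  let attempted = false;
  let lastError: unknown;

  for (const serviceId of order(candidates)) {
    // Checked right before the attempt so a half-open endpoint's single
    // probe slot is claimed only when that endpoint is actually called.
    if (!breaker.allowRequest(serviceId)) {
      skipped.push(serviceId);
      continue;
    }
    attempted = true;
    try {
      const result = await call(serviceId);
      breaker.recordSuccess(serviceId);
      return result;
    } catch (err) {
      breaker.recordFailure(serviceId);
      lastError = err; // fall through to the next endpoint
    }
  }

  if (!attempted) throw new AllCircuitsOpenError(skipped);
  throw lastError; // every attempted endpoint failed
}
```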
Per-service granularity
Circuits are keyed by serviceId, which maps to a specific provider+model endpoint. Opening the circuit for anthropic:claude-sonnet-4:us-east-1 does NOT affect anthropic:claude-sonnet-4:eu-west-1 or openai:gpt-4o. Each endpoint is isolated.
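Isolation falls out of storing one record per key. The sketch below is hypothetical: `CircuitRegistry` and `CircuitRecord` are illustrative names, and only the consecutive-failure trigger is implemented, for brevity. It shows that failures recorded under one serviceId never touch another key's record.

```typescript
type CircuitState = "closed" | "open" | "half-open";

// Hypothetical per-service record (consecutive-failure trigger only).
interface CircuitRecord {
  state: CircuitState;
  failures: number;
}

class CircuitRegistry {
  private readonly circuits = new Map<string, CircuitRecord>();

  // Lazily create a closed circuit the first time a serviceId is seen.
  get(serviceId: string): CircuitRecord {
    let circuit = this.circuits.get(serviceId);
    if (!circuit) {
      circuit = { state: "closed", failures: 0 };
      this.circuits.set(serviceId, circuit);
    }
    return circuit;
  }

  // Count a failure against one serviceId only.
  recordFailure(serviceId: string, failureThreshold = 5): void {
    const circuit = this.get(serviceId);
    circuit.failures += 1;
    if (circuit.failures >= failureThreshold) circuit.state = "open";
  }
}
```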
Manual recovery
For operational emergencies, circuits can be manually reset:
circuitBreaker.reset("anthropic:claude-sonnet-4:us-east-1");
This immediately closes the circuit and clears all failure history.