Chat Completions
OpenAI-compatible chat completions with optional agentic mode.
POST /v1/chat/completions
Sends a chat completion request. The request and response formats are fully compatible with the OpenAI API spec, with an additional mode parameter for agentic processing.
Request
```bash
curl https://api.brainstormrouter.com/v1/chat/completions \
  -H "Authorization: Bearer br_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4-5",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is BrainstormRouter?"}
    ],
    "stream": true,
    "mode": "agentic"
  }'
```
Body parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model ID in provider/model format |
| messages | array | Yes | OpenAI-format message array |
| stream | boolean | No | Enable SSE streaming (default: false) |
| mode | string | No | Set to "agentic" to route through The Soul |
| temperature | number | No | Sampling temperature (0-2) |
| max_tokens | number | No | Maximum tokens to generate |
| user | string | No | End-user identifier for tracking |
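The body above can be assembled programmatically. The sketch below is a hypothetical helper (not part of any official SDK) that builds the request dict, including optional parameters only when they are set:

```python
def build_chat_request(model, messages, *, stream=False, mode=None,
                       temperature=None, max_tokens=None, user=None):
    """Assemble a /v1/chat/completions request body.

    Hypothetical helper: optional parameters from the table above are
    included only when a value is provided.
    """
    body = {"model": model, "messages": messages, "stream": stream}
    optional = {"mode": mode, "temperature": temperature,
                "max_tokens": max_tokens, "user": user}
    for key, value in optional.items():
        if value is not None:
            body[key] = value
    return body

body = build_chat_request(
    "anthropic/claude-sonnet-4-5",
    [{"role": "user", "content": "What is BrainstormRouter?"}],
    stream=True,
    mode="agentic",
)
```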
Proxy mode (default)
When mode is omitted, the request routes through the ModelRouter:
- Provider selected based on model ID, cost, latency, and health
- Circuit breaker checked — unhealthy providers are skipped
- Fallback chain activated if the primary provider fails
- Guardian estimates cost and enforces budget limits
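The selection-and-fallback steps above can be sketched as a simple priority scan. The provider entries, health flags, and error handling here are illustrative assumptions, not the actual ModelRouter internals:

```python
# Illustrative sketch of proxy-mode routing: walk the fallback chain in
# priority order and skip providers whose circuit breaker is open.
providers = [
    {"name": "primary", "healthy": False},   # circuit breaker open: skipped
    {"name": "fallback", "healthy": True},
]

def select_provider(chain):
    """Return the first healthy provider in the fallback chain."""
    for provider in chain:
        if provider["healthy"]:
            return provider
    raise RuntimeError("all providers unhealthy")

chosen = select_provider(providers)
```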
Agentic mode
When mode is "agentic", the request routes through The Soul:
- Tenant's core memory loaded into the system prompt
- Agent execution loop processes the message with tool access
- Memory tools (core + archival) available for read/write
- Response returned in standard OpenAI completion format
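The first step above, loading core memory into the system prompt, might look like the following sketch; the memory text and prompt layout are assumptions, not The Soul's actual format:

```python
# Sketch: prepend tenant core memory as a system message before the
# agent execution loop processes the conversation.
core_memory = "Tenant prefers concise answers."  # hypothetical sample
user_messages = [
    {"role": "user", "content": "What is BrainstormRouter?"},
]

augmented = [
    {"role": "system", "content": f"Core memory:\n{core_memory}"},
] + user_messages
```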
Response
Standard OpenAI chat completion response:
```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1709000000,
  "model": "anthropic/claude-sonnet-4-5",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "BrainstormRouter is an autonomous AI API engine..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 42,
    "completion_tokens": 128,
    "total_tokens": 170
  }
}
```
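Because the response follows the standard OpenAI shape, the assistant message and token usage can be pulled out directly. This sketch works over the example payload above as a plain dict:

```python
# Extract the assistant content and token usage from the completion
# response; field paths follow the standard OpenAI response shape.
response = {
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant",
                        "content": "BrainstormRouter is an autonomous AI API engine..."},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 42, "completion_tokens": 128, "total_tokens": 170},
}

content = response["choices"][0]["message"]["content"]
total_tokens = response["usage"]["total_tokens"]
```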
Response headers
Every response includes Guardian intelligence headers:
| Header | Description |
|---|---|
| X-BR-Guardian-Status | approved, warned, or blocked |
| X-BR-Estimated-Cost | Pre-execution cost estimate (USD) |
| X-BR-Actual-Cost | Post-execution actual cost (USD) |
| X-BR-Efficiency | Cost efficiency percentage |
| X-BR-Guardian-Overhead-Ms | Guardian processing time |
| X-Model-Router-Endpoint | Provider endpoint used |
| X-Model-Router-Latency-Ms | Provider response latency |
| X-RateLimit-Remaining | Remaining requests in the current window |
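A client can inspect these headers to react to Guardian decisions and track spend. The header names below come from the table above; the sample values are made up for illustration:

```python
# Read Guardian headers from a response; any HTTP client exposing a
# dict-like headers object works the same way.
headers = {
    "X-BR-Guardian-Status": "approved",   # sample values, not real data
    "X-BR-Estimated-Cost": "0.0042",
    "X-BR-Actual-Cost": "0.0038",
}

if headers["X-BR-Guardian-Status"] == "blocked":
    raise RuntimeError("request blocked by Guardian")

estimated_cost = float(headers["X-BR-Estimated-Cost"])
actual_cost = float(headers["X-BR-Actual-Cost"])
```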
Streaming
When stream: true, the response is delivered as Server-Sent Events:
```
data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"delta":{"content":"Hello"},"index":0}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"delta":{"content":" world"},"index":0}]}

data: [DONE]
```
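The stream above can be consumed by accumulating each chunk's delta content until the [DONE] sentinel. A minimal sketch, assuming each event arrives as a single "data: ..." line as in the example:

```python
import json

def collect_content(sse_lines):
    """Join delta content from chat.completion.chunk events until [DONE]."""
    parts = []
    for line in sse_lines:
        if not line.startswith("data: "):
            continue  # skip blank lines and comments
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"]
        parts.append(delta.get("content", ""))
    return "".join(parts)

stream = [
    'data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"delta":{"content":"Hello"},"index":0}]}',
    'data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"delta":{"content":" world"},"index":0}]}',
    'data: [DONE]',
]
text = collect_content(stream)
```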