Chat Completions

OpenAI-compatible chat completions with optional agentic mode.

POST /v1/chat/completions

Send a chat completion request. Fully compatible with the OpenAI API spec, with an additional mode parameter for agentic processing.

Request

curl https://api.brainstormrouter.com/v1/chat/completions \
  -H "Authorization: Bearer br_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4-5",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is BrainstormRouter?"}
    ],
    "stream": true,
    "mode": "agentic"
  }'

Body parameters

Parameter   | Type    | Required | Description
------------|---------|----------|------------------------------------------
model       | string  | Yes      | Model ID in provider/model format
messages    | array   | Yes      | OpenAI-format message array
stream      | boolean | No       | Enable SSE streaming (default: false)
mode        | string  | No       | Set to "agentic" to route through The Soul
temperature | number  | No       | Sampling temperature (0-2)
max_tokens  | number  | No       | Maximum tokens to generate
user        | string  | No       | End-user identifier for tracking
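The table above maps directly onto a JSON body. A minimal sketch in Python that builds and serializes it (the field values are the ones from the example request, not defaults):

```python
import json

# Illustrative sketch: build the request body from the parameters above.
# Only model and messages are required; the rest are optional.
payload = {
    "model": "anthropic/claude-sonnet-4-5",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is BrainstormRouter?"},
    ],
    "stream": True,     # default is false
    "mode": "agentic",  # omit for plain proxy mode
    "temperature": 0.7, # optional, 0-2
}

body = json.dumps(payload)
```

Send `body` as the POST payload with a `Content-Type: application/json` header, exactly as the curl example does.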

Proxy mode (default)

When mode is omitted, the request routes through the ModelRouter:

  1. Provider selected based on model ID, cost, latency, and health
  2. Circuit breaker checked — unhealthy providers are skipped
  3. Fallback chain activated if the primary provider fails
  4. Guardian estimates cost and enforces budget limits

Agentic mode

When mode is "agentic", the request routes through The Soul:

  1. Tenant's core memory loaded into the system prompt
  2. Agent execution loop processes the message with tool access
  3. Memory tools (core + archival) available for read/write
  4. Response returned in standard OpenAI completion format
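From the client's perspective, agentic mode is a single extra field; memory loading and the tool loop happen server-side. A minimal sketch of the two request shapes (message content is illustrative):

```python
# The same request body, toggled into agentic mode.
# Core-memory loading and tool execution are handled by The Soul server-side;
# nothing else changes in the request shape.
base = {
    "model": "anthropic/claude-sonnet-4-5",
    "messages": [{"role": "user", "content": "What is BrainstormRouter?"}],
}

agentic = {**base, "mode": "agentic"}  # routes through The Soul
proxy = dict(base)                     # mode omitted: ModelRouter proxy
```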

Response

Standard OpenAI chat completion response:

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1709000000,
  "model": "anthropic/claude-sonnet-4-5",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "BrainstormRouter is an autonomous AI API engine..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 42,
    "completion_tokens": 128,
    "total_tokens": 170
  }
}
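Because the response follows the OpenAI schema, any OpenAI-compatible client can consume it. A minimal by-hand parse of the response above:

```python
import json

# Minimal parse of the completion response shown above (fields abbreviated).
raw = """{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "choices": [{"index": 0,
               "message": {"role": "assistant", "content": "BrainstormRouter is an autonomous AI API engine..."},
               "finish_reason": "stop"}],
  "usage": {"prompt_tokens": 42, "completion_tokens": 128, "total_tokens": 170}
}"""

resp = json.loads(raw)
content = resp["choices"][0]["message"]["content"]
total = resp["usage"]["total_tokens"]  # prompt_tokens + completion_tokens
```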

Response headers

Every response includes Guardian intelligence headers:

Header                    | Description
--------------------------|------------------------------------------
X-BR-Guardian-Status      | approved, warned, or blocked
X-BR-Estimated-Cost       | Pre-execution cost estimate (USD)
X-BR-Actual-Cost          | Post-execution actual cost (USD)
X-BR-Efficiency           | Cost efficiency percentage
X-BR-Guardian-Overhead-Ms | Guardian processing time (ms)
X-Model-Router-Endpoint   | Provider endpoint used
X-Model-Router-Latency-Ms | Provider response latency (ms)
X-RateLimit-Remaining     | Remaining requests in the current window
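Clients can read these headers to track per-request spend. A sketch, where the header dict stands in for an HTTP client's `response.headers` and the sample values are illustrative:

```python
# Illustrative sketch: inspect Guardian headers on a response.
headers = {
    "X-BR-Guardian-Status": "approved",
    "X-BR-Estimated-Cost": "0.0021",
    "X-BR-Actual-Cost": "0.0018",
}

status = headers["X-BR-Guardian-Status"]           # approved / warned / blocked
estimated = float(headers["X-BR-Estimated-Cost"])  # USD
actual = float(headers["X-BR-Actual-Cost"])        # USD
overspend = actual > estimated  # flag requests that exceeded the estimate
```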

Streaming

When stream: true, the response is delivered as Server-Sent Events:

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"delta":{"content":"Hello"},"index":0}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"delta":{"content":" world"},"index":0}]}

data: [DONE]
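A minimal consumer reads each data: line, accumulates the delta content, and stops at the [DONE] sentinel. A sketch using the chunks above as input:

```python
import json

# Assemble the streamed message from SSE data lines like the ones above.
lines = [
    'data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"delta":{"content":"Hello"},"index":0}]}',
    'data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"delta":{"content":" world"},"index":0}]}',
    "data: [DONE]",
]

parts = []
for line in lines:
    data = line[len("data: "):]
    if data == "[DONE]":  # end-of-stream sentinel
        break
    chunk = json.loads(data)
    delta = chunk["choices"][0]["delta"]
    parts.append(delta.get("content", ""))  # delta may omit content (e.g. role-only chunks)

message = "".join(parts)  # "Hello world"
```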