Chat Completions

OpenAI-compatible chat completions with optional agentic mode.

POST /v1/chat/completions

Send a chat completion request. Fully compatible with the OpenAI API spec, with an additional mode parameter for agentic processing.

Request

curl https://api.brainstormrouter.com/v1/chat/completions \
  -H "Authorization: Bearer br_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4-5",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is BrainstormRouter?"}
    ],
    "stream": true,
    "mode": "agentic"
  }'
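
The curl call above can also be assembled from Python. The following is a minimal sketch using only the standard library; the helper name build_chat_request is hypothetical, the endpoint and body fields are taken from this page, and the API key is the same placeholder used in the curl example. It builds the request without sending it.

```python
import json
import urllib.request

# Endpoint taken from the curl example; the key passed in below is a placeholder.
API_URL = "https://api.brainstormrouter.com/v1/chat/completions"


def build_chat_request(api_key, model, messages, **options):
    """Build (but do not send) a POST request for the chat completions endpoint.

    `options` carries optional body parameters such as stream, mode,
    temperature, max_tokens, user, route, or cache.
    """
    body = {"model": model, "messages": messages}
    # Drop options left as None so the server-side defaults apply.
    body.update({k: v for k, v in options.items() if v is not None})
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_chat_request(
    "br_live_...",
    "anthropic/claude-sonnet-4-5",
    [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is BrainstormRouter?"},
    ],
    mode="agentic",
)
# To send it, pass `request` to urllib.request.urlopen() and json.load() the result.
```

Building the request separately from sending it keeps the payload easy to inspect or log before any network call is made.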

Body parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | Yes | Model ID in provider/model format, or a strategy like auto |
| messages | array | Yes | OpenAI-format message array |
| stream | boolean | No | Enable SSE streaming (default: false) |
| mode | string | No | Set to "agentic" to route through The Soul |
| temperature | number | No | Sampling temperature (0-2) |
| max_tokens | number | No | Maximum tokens to generate |
| user | string | No | End-user identifier for tracking |
| route | object | No | Routing controls; see the Routing controls section below |
| cache | boolean | No | Set to false to bypass the prompt cache for this request |
| x_no_cache | boolean | No | Equivalent to cache: false (legacy spelling) |

Routing controls

Use route to influence how BR picks a model when you pass "model": "auto" or a virtual model, and to constrain which providers are eligible.

{
  "model": "auto",
  "messages": [...],
  "route": {
    "strategy": "quality",
    "providerAllow": ["anthropic", "openai"],
    "providerDeny": ["deepseek"],
    "maxCostPerRequest": 0.05,
    "models": ["anthropic/claude-sonnet-4-5", "openai/gpt-4.1"],
    "allowFallback": true
  }
}

| Field | Type | Values or description |
| --- | --- | --- |
| strategy | string | "price" · "latency" · "throughput" · "priority" · "quality" · "cascade" |
| providerAllow | string[] | Restrict selection to these provider IDs |
| providerDeny | string[] | Exclude these provider IDs |
| maxCostPerRequest | number | Reject a pick if its estimated cost exceeds this (USD) |
| models | string[] | Restrict selection to this explicit model list |
| allowFallback | boolean | If true, cascade to next provider on failure |

Strategy semantics (defined in src/router/model-router-types.ts):

  • price — cheapest model that matches the request
  • latency — lowest p50 latency model
  • throughput — highest observed tokens-per-second
  • priority — tenant-configured provider preference order
  • quality — highest learned quality posterior (Thompson sampling)
  • cascade — try providers in order until one succeeds

The default when strategy is omitted is quality. If strategy is set to a value not in the enum above, the request is rejected with a validation error — there is no silent fallback.
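
Because an unknown strategy is rejected rather than silently replaced, it can be useful to catch typos on the client before sending. The sketch below is one hedged way to do that; build_route and ROUTE_STRATEGIES are hypothetical client-side names, and only the field names and the strategy values come from this page.

```python
# Strategy values copied from the routing controls table on this page.
ROUTE_STRATEGIES = {"price", "latency", "throughput", "priority", "quality", "cascade"}


def build_route(strategy=None, provider_allow=None, provider_deny=None,
                max_cost_per_request=None, models=None, allow_fallback=None):
    """Return a `route` object, raising ValueError for an unknown strategy."""
    if strategy is not None and strategy not in ROUTE_STRATEGIES:
        raise ValueError(f"unknown route strategy: {strategy!r}")
    route = {
        "strategy": strategy,
        "providerAllow": provider_allow,
        "providerDeny": provider_deny,
        "maxCostPerRequest": max_cost_per_request,
        "models": models,
        "allowFallback": allow_fallback,
    }
    # Omit unset fields; an omitted strategy defaults to "quality" server-side.
    return {key: value for key, value in route.items() if value is not None}


route = build_route(strategy="price", provider_deny=["deepseek"],
                    max_cost_per_request=0.05)
```

The resulting dictionary can be passed as the route field of the request body alongside "model": "auto".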

Cache controls

BR applies a semantic prompt cache to non-streaming requests by default. When an identical prompt repeats within a tenant, BR returns the cached response instead of calling the upstream provider.

Bypass per request:

{ "model": "...", "messages": [...], "cache": false }

or the header alternative:

Cache key. The key is derived from model, messages, tools, response_format, temperature, max_tokens, and route/provider.strategy. Two requests with identical prompts but different routing hints (e.g., one with strategy: "price", another with strategy: "quality") are cached separately.

Streaming requests are never cached.
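
To make the cache-key rule concrete, here is an illustrative sketch, not BR's implementation. The cache is described as semantic, so the real key derivation is likely more involved; this exact-match hash only shows why two requests that differ in routing strategy land in separate entries. The function names are hypothetical, the strategy is assumed to live at route.strategy, and x_no_cache: true is assumed to be the legacy way to request a bypass.

```python
import hashlib
import json

# Request fields this page lists as cache-key inputs.
KEY_FIELDS = ("model", "messages", "tools", "response_format",
              "temperature", "max_tokens")


def sketch_cache_key(body):
    """Illustrative only: hash the key fields plus the routing strategy."""
    material = {field: body.get(field) for field in KEY_FIELDS}
    # Assumption: the strategy is read from route.strategy in the body.
    material["strategy"] = (body.get("route") or {}).get("strategy")
    # sort_keys makes the serialization independent of dict ordering.
    canonical = json.dumps(material, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def is_cacheable(body):
    """Streaming requests and explicit bypasses skip the cache."""
    if body.get("stream"):
        return False
    # Assumption: x_no_cache set to true is the legacy bypass spelling.
    return body.get("cache") is not False and not body.get("x_no_cache")
```

Fields outside the listed set, such as user, do not affect the key in this sketch, so two users in the same tenant sending the same prompt would share an entry.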

Proxy mode (default)

When mode is omitted, the request routes through the ModelRouter:

  1. Provider selected based on model ID, cost, latency, and health
  2. Circuit breaker checked — unhealthy providers are skipped
  3. Fallback chain activated if the primary provider fails
  4. Guardian estimates cost and enforces budget limits
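
The four steps above can be pictured with a toy selection loop. This is a simplified sketch under stated assumptions, not the ModelRouter's code: the Provider type and all of its fields are hypothetical, the candidate list is assumed to be pre-sorted by the router's preference, and the budget check is assumed to skip a provider rather than block the whole request.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Provider:
    # All fields are hypothetical stand-ins for router state.
    name: str
    healthy: bool                  # True when the circuit breaker is closed
    estimated_cost: float          # USD estimate for this request
    call: Callable[[dict], dict]   # performs the upstream request


def route_request(body, providers, budget_usd):
    """Try providers in order, skipping unhealthy or over-budget ones."""
    for provider in providers:
        if not provider.healthy:
            continue  # step 2: circuit breaker is open, skip
        if provider.estimated_cost > budget_usd:
            continue  # step 4: estimate exceeds the budget (assumed behavior)
        try:
            return provider.name, provider.call(body)
        except Exception:
            continue  # step 3: fall back to the next provider in the chain
    raise RuntimeError("no eligible provider succeeded")
```

In this sketch a failed upstream call is treated the same as an unhealthy provider: the loop simply moves to the next candidate.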

Agentic mode

When mode is "agentic", the request routes through The Soul:

  1. Tenant's core memory loaded into the system prompt
  2. Agent execution loop processes the message with tool access
  3. Memory tools (core + archival) available for read/write
  4. Response returned in standard OpenAI completion format

Response

Standard OpenAI chat completion response with BR additions inside usage:

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1709000000,
  "model": "anthropic/claude-sonnet-4-5",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "BrainstormRouter is an autonomous AI API engine..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 42,
    "completion_tokens": 128,
    "total_tokens": 170,
    "cost_usd": 0.00042,
    "routing_strategy": "quality"
  }
}

| Field (inside usage) | Description |
| --- | --- |
| prompt_tokens | OpenAI standard |
| completion_tokens | OpenAI standard |
| total_tokens | OpenAI standard |
| cost_usd | BR extension: actual USD charged for this request |
| routing_strategy | BR extension: strategy that selected the model (if set) |

The cost_usd and routing_strategy fields are currently returned only for non-streaming requests. For streaming requests, read X-BR-Actual-Cost from the response headers.
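
A client that handles both streaming and non-streaming calls may want a single place to look up the charged cost. The helper below is a hedged sketch: the name actual_cost_usd is hypothetical, and it assumes the X-BR-Actual-Cost header value is a plain decimal number, which this page does not specify.

```python
def actual_cost_usd(headers, body=None):
    """Return the charged cost in USD, or None if it is not reported.

    Prefers usage.cost_usd from a non-streaming body and falls back to the
    X-BR-Actual-Cost header, which is the documented source for streams.
    """
    usage = (body or {}).get("usage") or {}
    if "cost_usd" in usage:
        return float(usage["cost_usd"])
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    raw = normalized.get("x-br-actual-cost")
    # Assumption: the header carries a plain decimal string such as "0.0013".
    return float(raw) if raw is not None else None
```

For a streaming call, pass only the response headers; for a non-streaming call, pass the parsed JSON body as the second argument.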

Response headers

Every response includes Guardian intelligence headers:

| Header | Description |
| --- | --- |
| X-BR-Guardian-Status | approved, warned, or blocked |
| X-BR-Estimated-Cost | Pre-execution cost estimate (USD) |
| X-BR-Actual-Cost | Post-execution actual cost (USD) |
| X-BR-Efficiency | Cost efficiency percentage |
| X-BR-Guardian-Overhead-Ms | Guardian processing time (ms) |
| X-Model-Router-Endpoint | Provider endpoint used |
| X-Model-Router-Latency-Ms | Provider response latency (ms) |
| X-RateLimit-Remaining | Remaining requests in the current window |

Streaming

When stream: true, the response is delivered as Server-Sent Events:

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"delta":{"content":"Hello"},"index":0}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"delta":{"content":" world"},"index":0}]}

data: [DONE]
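
A stream in this shape can be consumed by reading it line by line. The following is a minimal parsing sketch that assumes each event is a single data: line, as in the sample above; collect_stream_text is a hypothetical helper name, and the input can be any iterable of already-decoded text lines.

```python
import json


def collect_stream_text(lines):
    """Concatenate delta content from an iterable of SSE text lines."""
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts)
```

Fed the three data: lines shown above, the function returns "Hello world". Whitespace inside the JSON strings is preserved because only the text outside the JSON payload is stripped.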