Chat Completions

OpenAI-compatible chat completions with optional agentic mode.

POST /v1/chat/completions

Send a chat completion request. Fully compatible with the OpenAI API spec, with an additional mode parameter for agentic processing.

Request

curl https://api.brainstormrouter.com/v1/chat/completions \
  -H "Authorization: Bearer br_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4-5",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is BrainstormRouter?"}
    ],
    "stream": true,
    "mode": "agentic"
  }'

Body parameters

Parameter	Type	Required	Description
`model`	string	Yes	Model ID in `provider/model` format, or a strategy like `auto`
`messages`	array	Yes	OpenAI-format message array
`stream`	boolean	No	Enable SSE streaming (default: `false`)
`mode`	string	No	Set to `"agentic"` to route through The Soul
`temperature`	number	No	Sampling temperature (0-2)
`max_tokens`	number	No	Maximum tokens to generate
`user`	string	No	End-user identifier for tracking
`route`	object	No	Routing controls — see Routing controls below
`cache`	boolean	No	Set to `false` to bypass the prompt cache for this request
`x_no_cache`	boolean	No	Equivalent to `cache: false` (legacy spelling)

Routing controls

Use route to influence how BR picks a model when you pass "model": "auto" or a virtual model, and to constrain which providers are eligible.

{
  "model": "auto",
  "messages": [...],
  "route": {
    "strategy": "quality",
    "providerAllow": ["anthropic", "openai"],
    "providerDeny": ["deepseek"],
    "maxCostPerRequest": 0.05,
    "models": ["anthropic/claude-sonnet-4-5", "openai/gpt-4.1"],
    "allowFallback": true
  }
}

Field	Type	Values
`strategy`	string	`"price"` · `"latency"` · `"throughput"` · `"priority"` · `"quality"` · `"cascade"`
`providerAllow`	`string[]`	Restrict selection to these provider IDs
`providerDeny`	`string[]`	Exclude these provider IDs
`maxCostPerRequest`	number	Reject a pick if its estimated cost exceeds this (USD)
`models`	`string[]`	Restrict selection to this explicit model list
`allowFallback`	boolean	If `true`, cascade to next provider on failure

Strategy semantics (defined in src/router/model-router-types.ts):

price — cheapest model that matches the request
latency — lowest p50 latency model
throughput — highest observed tokens-per-second
priority — tenant-configured provider preference order
quality — highest learned quality posterior (Thompson sampling)
cascade — try providers in order until one succeeds

The default when strategy is omitted is quality. If strategy is set to a value not in the enum above, the request is rejected with a validation error — there is no silent fallback.

Cache controls

BR applies a semantic prompt cache to non-streaming requests by default. Identical prompts within a tenant serve a cached response instead of calling the upstream provider.

Bypass per request:

{ "model": "...", "messages": [...], "cache": false }

or the header alternative:

Cache key. The key is derived from model, messages, tools, response_format, temperature, max_tokens, _and_ route/provider.strategy. Two requests with identical prompts but different routing hints (e.g., one with strategy: "price", another with strategy: "quality") are cached separately.

Streaming requests are never cached.

Proxy mode (default)

When mode is omitted, the request routes through the ModelRouter:

Provider selected based on model ID, cost, latency, and health
Circuit breaker checked — unhealthy providers are skipped
Fallback chain activated if the primary provider fails
Guardian estimates cost and enforces budget limits

Agentic mode

When mode is "agentic", the request routes through The Soul:

Tenant's core memory loaded into the system prompt
Agent execution loop processes the message with tool access
Memory tools (core + archival) available for read/write
Response returned in standard OpenAI completion format

Response

Standard OpenAI chat completion response with BR additions inside usage:

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1709000000,
  "model": "anthropic/claude-sonnet-4-5",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "BrainstormRouter is an autonomous AI API engine..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 42,
    "completion_tokens": 128,
    "total_tokens": 170,
    "cost_usd": 0.00042,
    "routing_strategy": "quality"
  }
}

Field (inside `usage`)	Description
`prompt_tokens`	OpenAI standard
`completion_tokens`	OpenAI standard
`total_tokens`	OpenAI standard
`cost_usd`	BR extension — actual USD charged for this request
`routing_strategy`	BR extension — strategy that selected the model (if set)

The cost_usd and routing_strategy fields are non-streaming only today. For streaming requests, read X-BR-Actual-Cost from the response headers.

Response headers

Every response includes Guardian intelligence headers:

Header	Description
`X-BR-Guardian-Status`	`approved`, `warned`, or `blocked`
`X-BR-Estimated-Cost`	Pre-execution cost estimate (USD)
`X-BR-Actual-Cost`	Post-execution actual cost (USD)
`X-BR-Efficiency`	Cost efficiency percentage
`X-BR-Guardian-Overhead-Ms`	Guardian processing time
`X-Model-Router-Endpoint`	Provider endpoint used
`X-Model-Router-Latency-Ms`	Provider response latency
`X-RateLimit-Remaining`	Remaining requests in the current window

Streaming

When stream: true, the response is delivered as Server-Sent Events:

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"delta":{"content":"Hello"},"index":0}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"delta":{"content":" world"},"index":0}]}

data: [DONE]