LlamaIndex

Use BrainstormRouter as the LLM provider for LlamaIndex RAG pipelines.

Setup

pip install llama-index llama-index-llms-openai-like

Configuration

from llama_index.llms.openai_like import OpenAILike

llm = OpenAILike(
    api_base="https://api.brainstormrouter.com/v1",
    api_key="br_live_...",
    model="anthropic/claude-sonnet-4",
    is_chat_model=True,
)
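Before building a pipeline, a quick round trip confirms the configuration. This sketch factors the shared settings into a small helper and only issues a request when an API key is present in the environment; the `BRAINSTORM_API_KEY` variable name and the test prompt are illustrative, not part of the BrainstormRouter API:

```python
import os

API_BASE = "https://api.brainstormrouter.com/v1"

def router_llm_config(model: str) -> dict:
    # Settings shared by every BrainstormRouter-backed OpenAILike client.
    return {
        "api_base": API_BASE,
        "model": model,
        "is_chat_model": True,
    }

if os.environ.get("BRAINSTORM_API_KEY"):  # assumed env var name; skipped offline
    from llama_index.llms.openai_like import OpenAILike

    llm = OpenAILike(
        api_key=os.environ["BRAINSTORM_API_KEY"],
        **router_llm_config("anthropic/claude-sonnet-4"),
    )
    # complete() sends a single-turn request through the router
    print(llm.complete("Reply with the single word: ok"))
```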

RAG with BrainstormRouter

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

# Load documents and build the vector index
# (embeddings still default to OpenAI; set Settings.embed_model to override)
documents = SimpleDirectoryReader("./docs").load_data()
index = VectorStoreIndex.from_documents(documents)

# Query with the BrainstormRouter LLM
query_engine = index.as_query_engine(llm=llm)
response = query_engine.query("What is the main architecture pattern?")
print(response)

Streaming

from llama_index.core import Settings

Settings.llm = llm

query_engine = index.as_query_engine(streaming=True)
response = query_engine.query("Summarize the key findings")

for token in response.response_gen:
    print(token, end="", flush=True)

Memory-augmented RAG

Combine LlamaIndex's document retrieval with BrainstormRouter's persistent memory for a hybrid approach:

  1. LlamaIndex handles document chunking and vector retrieval
  2. BrainstormRouter memory stores cross-session user preferences and context
  3. Agentic mode lets the model access both sources

import requests

# Store RAG-discovered insights in persistent memory
requests.post(
    "https://api.brainstormrouter.com/v1/memory/entries",
    headers={"Authorization": "Bearer br_live_..."},
    json={
        "fact": "Architecture uses event-driven microservices with Kafka",
        "block": "project",
    },
)
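Stored facts can later be pulled back and prepended to a RAG query so the query engine sees both sources. The GET form of the endpoint and its response shape are assumptions sketched from the POST call above, not documented API; only the prompt-assembly helper is concrete:

```python
import json
import os
import urllib.request

API_BASE = "https://api.brainstormrouter.com/v1"

def fetch_memory_block(block: str, api_key: str) -> list:
    # Assumption: GET on the same /memory/entries path, filtered by block,
    # returns a JSON list of {"fact": ...} entries.
    req = urllib.request.Request(
        f"{API_BASE}/memory/entries?block={block}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def augment_query(query: str, facts: list) -> str:
    # Prepend remembered facts so document retrieval and persistent
    # memory both reach the model.
    context = "\n".join(f"- {f['fact']}" for f in facts)
    return f"Known context:\n{context}\n\nQuestion: {query}"

if os.environ.get("BRAINSTORM_API_KEY"):  # assumed env var name; skipped offline
    facts = fetch_memory_block("project", os.environ["BRAINSTORM_API_KEY"])
    print(augment_query("What changed in the architecture?", facts))
```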

Cost optimization

Route different pipeline stages through different models:

# Expensive: synthesis and reasoning
synthesis_llm = OpenAILike(
    api_base="https://api.brainstormrouter.com/v1",
    api_key="br_live_...",
    model="anthropic/claude-sonnet-4:best",
    is_chat_model=True,
)

# Cheap: summarization and extraction
extraction_llm = OpenAILike(
    api_base="https://api.brainstormrouter.com/v1",
    api_key="br_live_...",
    model="openai/gpt-4o-mini:floor",
    is_chat_model=True,
)
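One way to wire the two clients into a pipeline is a stage-to-model map, so the expensive model is built only for the answer-synthesis step. The stage names and the `BRAINSTORM_API_KEY` variable are illustrative; the `:best` and `:floor` suffixes are the routing hints shown above:

```python
import os

# Map pipeline stages to BrainstormRouter model strings (stage names
# are illustrative).
STAGE_MODELS = {
    "synthesis": "anthropic/claude-sonnet-4:best",
    "extraction": "openai/gpt-4o-mini:floor",
}

def model_for_stage(stage: str) -> str:
    # Fall back to the cheap model for any unlisted stage.
    return STAGE_MODELS.get(stage, STAGE_MODELS["extraction"])

if os.environ.get("BRAINSTORM_API_KEY"):  # assumed env var name; skipped offline
    from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
    from llama_index.llms.openai_like import OpenAILike

    def make_llm(stage: str) -> OpenAILike:
        return OpenAILike(
            api_base="https://api.brainstormrouter.com/v1",
            api_key=os.environ["BRAINSTORM_API_KEY"],
            model=model_for_stage(stage),
            is_chat_model=True,
        )

    index = VectorStoreIndex.from_documents(
        SimpleDirectoryReader("./docs").load_data()
    )
    # Route only the answer-synthesis step through the expensive model.
    engine = index.as_query_engine(llm=make_llm("synthesis"))
    print(engine.query("What is the main architecture pattern?"))
```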