# LlamaIndex

Use BrainstormRouter as the LLM provider for LlamaIndex RAG pipelines.
## Setup

```bash
pip install llama-index-llms-openai-like
```
## Configuration

```python
from llama_index.llms.openai_like import OpenAILike

llm = OpenAILike(
    api_base="https://api.brainstormrouter.com/v1",
    api_key="br_live_...",
    model="anthropic/claude-sonnet-4",
    is_chat_model=True,
)
```
## RAG with BrainstormRouter

```python
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

# Load documents and build the vector index
documents = SimpleDirectoryReader("./docs").load_data()
index = VectorStoreIndex.from_documents(documents)

# Pass the LLM at query time; from_documents does not take an llm argument
query_engine = index.as_query_engine(llm=llm)
response = query_engine.query("What is the main architecture pattern?")
print(response)
```
## Streaming

```python
from llama_index.core import Settings

# Make BrainstormRouter the default LLM for all LlamaIndex components
Settings.llm = llm

query_engine = index.as_query_engine(streaming=True)
response = query_engine.query("Summarize the key findings")
for token in response.response_gen:
    print(token, end="", flush=True)
```
## Memory-augmented RAG

Combine LlamaIndex's document retrieval with BrainstormRouter's persistent memory for a hybrid approach:

- LlamaIndex handles document chunking and vector retrieval
- BrainstormRouter memory stores cross-session user preferences and context
- Agentic mode lets the model access both sources
```python
import requests

# Store RAG-discovered insights in persistent memory
requests.post(
    "https://api.brainstormrouter.com/v1/memory/entries",
    headers={"Authorization": "Bearer br_live_..."},
    json={
        "fact": "Architecture uses event-driven microservices with Kafka",
        "block": "project",
    },
)
```
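To close the loop, stored facts can be pulled back out and prepended to a LlamaIndex query as context. The sketch below assumes a `GET` on the same `/v1/memory/entries` endpoint returning `{"entries": [{"fact": ...}]}` filtered by `block`; check the memory API reference for the actual response shape. `format_memory_context` and `query_with_memory` are illustrative helpers, not part of either SDK.

```python
import requests


def format_memory_context(entries):
    """Render memory entries as a context preamble for a RAG query."""
    facts = [e["fact"] for e in entries]
    if not facts:
        return ""
    return "Known project context:\n" + "\n".join(f"- {f}" for f in facts)


def query_with_memory(query_engine, question, api_key):
    # Assumed endpoint shape: GET the same path used for writes,
    # filtered by memory block.
    resp = requests.get(
        "https://api.brainstormrouter.com/v1/memory/entries",
        headers={"Authorization": f"Bearer {api_key}"},
        params={"block": "project"},
        timeout=10,
    )
    context = format_memory_context(resp.json().get("entries", []))
    prompt = f"{context}\n\n{question}" if context else question
    return query_engine.query(prompt)
```

In agentic mode this retrieval happens server-side; the manual version above is only needed when you want explicit control over which memory block feeds the pipeline.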
## Cost optimization

Route different pipeline stages through different models:

```python
# Expensive: synthesis and reasoning
synthesis_llm = OpenAILike(
    api_base="https://api.brainstormrouter.com/v1",
    api_key="br_live_...",
    model="anthropic/claude-sonnet-4:best",
    is_chat_model=True,
)

# Cheap: summarization and extraction
extraction_llm = OpenAILike(
    api_base="https://api.brainstormrouter.com/v1",
    api_key="br_live_...",
    model="openai/gpt-4o-mini:floor",
    is_chat_model=True,
)
```
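Since the two configurations differ only in the model string, a small factory keeps them consistent. `router_llm_kwargs` is an illustrative helper, not part of any SDK; it assumes the `:best` / `:floor` routing-suffix convention shown above.

```python
def router_llm_kwargs(model, tier=None, api_key="br_live_..."):
    """Build OpenAILike keyword arguments for a BrainstormRouter model.

    `tier` appends a routing suffix such as "best" or "floor" to the
    model string; omit it to use the default routing.
    """
    return {
        "api_base": "https://api.brainstormrouter.com/v1",
        "api_key": api_key,
        "model": f"{model}:{tier}" if tier else model,
        "is_chat_model": True,
    }


synthesis_kwargs = router_llm_kwargs("anthropic/claude-sonnet-4", tier="best")
extraction_kwargs = router_llm_kwargs("openai/gpt-4o-mini", tier="floor")
# synthesis_llm = OpenAILike(**synthesis_kwargs), and so on.
```

In a LlamaIndex pipeline, the cheap model fits ingestion-time work (for example, metadata extractors passed via `transformations=` when building the index), while the expensive model is best reserved for `index.as_query_engine(llm=...)`, where answer synthesis happens.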