BrainstormRouter vs Letta
Agent memory platform vs integrated gateway: where memory meets routing and cost optimization.
What Letta does well
Letta (formerly MemGPT) pioneered long-term memory for LLM agents:
- Rich memory hierarchy: core memory (working), archival memory (long-term), recall memory (conversation)
- Agent framework: full agent lifecycle with tool use
- Open source: Apache 2.0, large community
- Memory management: automatic summarization, tiered storage
- Developer-friendly: Python SDK with good abstractions
Letta's strength is memory architecture. If you're building a single-agent application that needs to remember things across sessions, Letta provides the best primitives.
Where the architectures diverge
Letta is an agent framework with memory. BrainstormRouter is an API gateway with memory, routing, and governance. They solve different layers of the stack — but the memory capability overlaps.
| Dimension | Letta | BrainstormRouter |
|---|---|---|
| Primary role | Agent framework | API gateway |
| Memory | Core + archival + recall | Core + archival (pgvector) |
| Routing | None (single provider) | Thompson Sampling across providers |
| Cost optimization | None | Guardian pre-request prediction + seatbelts |
| Multi-model | SDK configuration | Automatic selection + fallback chains |
| Circuit breakers | None | Dual-trigger per-endpoint state machine |
| Multi-tenancy | None | RLS + AsyncLocalStorage + ltree |
| Governance | None | Privacy modes, PII scanning, RBAC |
| Integration effort | New SDK required | base_url swap (OpenAI-compatible) |
| Background processing | Agent heartbeat | Memory extraction queue + heartbeat |
| Self-hosted | Yes | Yes |
The integration difference
Using Letta
Letta requires its own SDK and agent abstraction:
```python
# Letta's Python SDK (import paths vary across SDK versions)
from letta import create_client, LLMConfig
from letta.schemas.memory import ChatMemory

client = create_client()
agent = client.create_agent(
    name="my-agent",
    llm_config=LLMConfig(model="gpt-4o", ...),
    memory=ChatMemory(human="User info", persona="Agent persona"),
)
response = client.send_message(agent_id=agent.id, message="Hello")
```
This is a different API from OpenAI's. Adopting Letta means rewriting your LLM integration.
Using BrainstormRouter
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.brainstormrouter.com/v1",
    api_key="br_live_...",
)
response = client.chat.completions.create(
    model="anthropic/claude-sonnet-4",
    messages=[{"role": "user", "content": "Hello"}],
    extra_body={"mode": "agentic"},  # enables memory
)
```
Only one line changes: the base_url. The OpenAI SDK, LangChain, the Vercel AI SDK, or any other OpenAI-compatible framework works without modification. Memory is activated by the mode: "agentic" flag.
The memory architecture
Letta memory
Letta maintains three memory tiers per agent:
- Core memory: small, always in context (like a scratchpad)
- Archival memory: vector-indexed long-term store
- Recall memory: searchable conversation history
The agent manages its own memory through tool calls (core_memory_append, archival_memory_insert, archival_memory_search).
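The tiered design above can be sketched in miniature. This is an illustrative Python class, not Letta's actual implementation: core memory is a small scratchpad that rides along in every prompt, while archival memory is a long-term store searched on demand (the substring match stands in for real vector search).

```python
# Illustrative sketch of the tiered-memory pattern (not Letta's actual classes)
class TieredMemory:
    def __init__(self, core_limit: int = 2000):
        self.core: dict[str, str] = {}   # small, always in context
        self.archival: list[str] = []    # long-term, searched on demand
        self.core_limit = core_limit

    def core_append(self, key: str, value: str) -> None:
        self.core[key] = value
        # Evict oldest entries when the scratchpad outgrows its budget
        while sum(len(v) for v in self.core.values()) > self.core_limit:
            self.core.pop(next(iter(self.core)))

    def archival_insert(self, fact: str) -> None:
        self.archival.append(fact)

    def archival_search(self, query: str) -> list[str]:
        # Stand-in for vector search: naive substring match
        return [f for f in self.archival if query.lower() in f.lower()]

    def render_context(self) -> str:
        # Core memory is injected verbatim into every prompt
        return "\n".join(f"{k}: {v}" for k, v in self.core.items())


mem = TieredMemory()
mem.core_append("human", "Prefers concise answers")
mem.archival_insert("User's deploy target is Fly.io")
```

In Letta the equivalent operations are exposed to the agent as tools, so the model itself decides when to write and search.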
BrainstormRouter memory
BrainstormRouter uses two tiers — core and archival — with a key difference: memory extraction happens asynchronously via a Postgres durable queue.
From src/db/stores/memory-extraction-store.ts:69-84:
```sql
-- Concurrent workers claim jobs without blocking
SELECT id FROM memory_extraction_queue
WHERE status = 'pending' AND next_run_at <= $1
ORDER BY next_run_at ASC, created_at ASC
LIMIT $2
FOR UPDATE SKIP LOCKED
```
The extraction pipeline:
- Completion response arrives
- Gateway enqueues extraction job (returns response immediately)
- Background worker claims a batch with FOR UPDATE SKIP LOCKED
- Pi Runner processes each job, extracting memories
- Memories stored in pgvector for semantic retrieval
This means memory extraction never adds latency to the response.
What BrainstormRouter adds beyond memory
The key argument isn't that BrainstormRouter has better memory than Letta. It's that BrainstormRouter has memory plus everything else a production deployment needs:
Routing intelligence
Letta calls whichever single provider you configure. BrainstormRouter routes across all of them:
```typescript
// Thompson Sampling selects the best model from measured performance
const selection = bandit.select(arms);
// selection.method: "ucb1" | "thompson" | "random"
// selection.modelKey: "anthropic/claude-sonnet-4"
```
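A minimal Thompson Sampling loop looks like this. It is a sketch with simulated rewards and Beta posteriors per arm; BrainstormRouter's real arms, reward signal, and UCB1 fallback are more involved:

```python
import random

random.seed(0)  # deterministic demo

class ThompsonBandit:
    def __init__(self, arms: list[str]):
        # Beta(successes, failures) posterior per model, starting at Beta(1, 1)
        self.stats = {arm: [1, 1] for arm in arms}

    def select(self) -> str:
        # Sample a plausible success rate for each arm, pick the highest draw
        draws = {a: random.betavariate(s, f) for a, (s, f) in self.stats.items()}
        return max(draws, key=draws.get)

    def update(self, arm: str, success: bool) -> None:
        self.stats[arm][0 if success else 1] += 1

bandit = ThompsonBandit(["anthropic/claude-sonnet-4", "openai/gpt-4o"])
for _ in range(200):
    arm = bandit.select()
    # Simulated outcomes: one arm succeeds 90% of the time, the other 60%
    p = 0.9 if arm.startswith("anthropic") else 0.6
    bandit.update(arm, random.random() < p)
```

After a few hundred rounds the sampler concentrates traffic on the better-performing arm while still occasionally probing the other, which is the exploration/exploitation trade-off the gateway exploits.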
Cost governance
Letta has no cost awareness. BrainstormRouter predicts spend before the call and reports actuals in response headers:
```
X-BR-Actual-Cost: $0.028
X-BR-Efficiency: 0.87
X-BR-Guardian-Overhead-Ms: 1.2
```
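At its core, pre-request prediction is token estimation times per-model pricing. The sketch below uses hypothetical prices and function names, not BrainstormRouter's actual price table:

```python
# Hypothetical per-1M-token prices in USD (real price tables change often)
PRICES = {
    "anthropic/claude-sonnet-4": {"input": 3.00, "output": 15.00},
    "openai/gpt-4o-mini": {"input": 0.15, "output": 0.60},
}

def estimate_cost(model: str, prompt_tokens: int, max_output_tokens: int) -> float:
    # Worst-case estimate: assume the model uses its full output budget
    p = PRICES[model]
    return (prompt_tokens * p["input"] + max_output_tokens * p["output"]) / 1_000_000

def within_budget(model: str, prompt_tokens: int, max_output: int, budget: float) -> bool:
    # A "seatbelt": reject before the call if the worst case exceeds the budget
    return estimate_cost(model, prompt_tokens, max_output) <= budget

# 1000 * 3.00/1e6 + 1500 * 15.00/1e6 = 0.003 + 0.0225 = 0.0255
cost = estimate_cost("anthropic/claude-sonnet-4", 1_000, 1_500)
```

The efficiency score in the headers then compares predicted against actual spend after the response arrives.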
Tenant isolation
Letta is single-tenant. BrainstormRouter uses Postgres RLS for database-level isolation:
```typescript
// From src/db/schema/tenants.ts:100
const tenantRlsUsing = sql`tenant_id = current_setting('app.current_tenant')::uuid`;
```
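Python's contextvars module is the analogue of Node's AsyncLocalStorage. The sketch below (hypothetical helper names) shows the pattern: a per-request tenant ID stored in async-safe context, then surfaced to Postgres so RLS policies like the one above can read it. In production you would pass the value as a bound parameter or via set_config rather than string interpolation:

```python
from contextvars import ContextVar

# Each in-flight request carries its own tenant, even under concurrency
current_tenant: ContextVar[str] = ContextVar("current_tenant")

def set_tenant(tenant_id: str) -> None:
    current_tenant.set(tenant_id)

def rls_setup_sql() -> str:
    # Run at the start of each transaction so RLS policies can see the tenant
    return f"SET LOCAL app.current_tenant = '{current_tenant.get()}'"

set_tenant("11111111-1111-1111-1111-111111111111")
```

Because the setting is scoped with SET LOCAL, it evaporates at transaction end and cannot leak across pooled connections.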
Failure resilience
Letta fails if the provider fails. BrainstormRouter reroutes:
```typescript
// Circuit breaker removes failing endpoints; fallback takes over
if (circuit.failures >= this.config.failureThreshold) {
  circuit.state = "open";
}
```
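A minimal version of that state machine can be sketched as follows. Thresholds and the single failure trigger are illustrative; the real breaker is dual-trigger and tracked per endpoint:

```python
import time

class CircuitBreaker:
    def __init__(self, failure_threshold: int = 5, cooldown_s: float = 30.0):
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.failures = 0
        self.state = "closed"
        self.opened_at = 0.0

    def allow(self) -> bool:
        # After the cooldown, let one probe request through
        if self.state == "open" and time.monotonic() - self.opened_at >= self.cooldown_s:
            self.state = "half_open"
        return self.state != "open"

    def record(self, success: bool) -> None:
        if success:
            self.failures = 0
            self.state = "closed"
        else:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.state = "open"
                self.opened_at = time.monotonic()

cb = CircuitBreaker(failure_threshold=3)
for _ in range(3):
    cb.record(success=False)
# The breaker is now open; the router stops sending traffic to this
# endpoint and the fallback chain takes over
```

While the breaker is open, allow() returns False and the gateway reroutes; a half-open probe after the cooldown decides whether the endpoint rejoins the pool.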
When to choose Letta
- You're building a single-agent application
- Memory is the primary feature you need
- You want fine-grained control over memory management
- You're comfortable with a Letta-specific SDK
- You don't need multi-provider routing or cost optimization
When to choose BrainstormRouter
- You need memory AND routing AND governance
- You want OpenAI-compatible integration (no new SDK)
- You need multi-tenant isolation for your customers
- You want cost prediction and efficiency scoring per request
- You need circuit breakers and automatic failover
- You're running multi-model, multi-provider infrastructure