BrainstormRouter vs Letta

Agent memory platform vs integrated gateway: where memory meets routing and cost optimization.

What Letta does well

Letta (formerly MemGPT) pioneered long-term memory for LLM agents:

  • Rich memory hierarchy: core memory (working), archival memory (long-term), recall memory (conversation)
  • Agent framework: full agent lifecycle with tool use
  • Open source: Apache 2.0, large community
  • Memory management: automatic summarization, tiered storage
  • Developer-friendly: Python SDK with good abstractions

Letta's strength is memory architecture. If you're building a single-agent application that needs to remember things across sessions, Letta provides the best primitives.

Where the architectures diverge

Letta is an agent framework with memory. BrainstormRouter is an API gateway with memory, routing, and governance. They solve different layers of the stack — but the memory capability overlaps.

Dimension              Letta                     BrainstormRouter
---------------------  ------------------------  -------------------------------------------
Primary role           Agent framework           API gateway
Memory                 Core + archival + recall  Core + archival (pgvector)
Routing                None (single provider)    Thompson Sampling across providers
Cost optimization      None                      Guardian pre-request prediction + seatbelts
Multi-model            SDK configuration         Automatic selection + fallback chains
Circuit breakers       None                      Dual-trigger per-endpoint state machine
Multi-tenancy          None                      RLS + AsyncLocalStorage + ltree
Governance             None                      Privacy modes, PII scanning, RBAC
Integration effort     New SDK required          base_url swap (OpenAI-compatible)
Background processing  Agent heartbeat           Memory extraction queue + heartbeat
Self-hosted            Yes                       Yes

The integration difference

Using Letta

Letta requires its own SDK and agent abstraction:

from letta import ChatMemory, LLMConfig, create_client

client = create_client()
agent = client.create_agent(
    name="my-agent",
    llm_config=LLMConfig(model="gpt-4o", ...),
    memory=ChatMemory(human="User info", persona="Agent persona"),
)
response = client.send_message(agent_id=agent.id, role="user", message="Hello")

This is a different API from OpenAI. Adopting Letta means rewriting your LLM integration.

Using BrainstormRouter

from openai import OpenAI

client = OpenAI(
    base_url="https://api.brainstormrouter.com/v1",
    api_key="br_live_...",
)
response = client.chat.completions.create(
    model="anthropic/claude-sonnet-4",
    messages=[{"role": "user", "content": "Hello"}],
    extra_body={"mode": "agentic"},  # enables memory
)

Only one line changes: the base_url. The OpenAI SDK, LangChain, the Vercel AI SDK, or any other OpenAI-compatible framework works without modification. Memory is activated by the mode: "agentic" flag in extra_body.

The memory architecture

Letta memory

Letta maintains three memory tiers per agent:

  • Core memory: small, always in context (like a scratchpad)
  • Archival memory: vector-indexed long-term store
  • Recall memory: searchable conversation history

The agent manages its own memory through tool calls (core_memory_append, archival_memory_insert, archival_memory_search).

BrainstormRouter memory

BrainstormRouter uses two tiers — core and archival — with a key difference: memory extraction happens asynchronously via a Postgres durable queue.

From src/db/stores/memory-extraction-store.ts:69-84:

-- Concurrent workers claim jobs without blocking
SELECT id FROM memory_extraction_queue
WHERE status = 'pending' AND next_run_at <= $1
ORDER BY next_run_at ASC, created_at ASC
LIMIT $2
FOR UPDATE SKIP LOCKED

The extraction pipeline:

  1. Completion response arrives
  2. Gateway enqueues extraction job (returns response immediately)
  3. Background worker claims batch with FOR UPDATE SKIP LOCKED
  4. Pi Runner processes each job, extracting memories
  5. Memories stored in pgvector for semantic retrieval

This means memory extraction never adds latency to the response.
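The claim step can be sketched with an in-memory model; the job shape and function names below are illustrative, not the actual store API:

```typescript
// Illustrative in-memory model of the claim step; the real store issues
// the SQL above with FOR UPDATE SKIP LOCKED.
interface ExtractionJob {
  id: string;
  status: "pending" | "processing" | "done" | "failed";
  nextRunAt: number; // epoch ms
  createdAt: number;
}

// Claim only jobs that are due, ordered by next_run_at then created_at,
// and mark them so a concurrent worker skips them (the row-lock analogue).
function claimBatch(
  queue: ExtractionJob[],
  now: number,
  limit: number,
): ExtractionJob[] {
  const claimed = queue
    .filter((j) => j.status === "pending" && j.nextRunAt <= now)
    .sort((a, b) => a.nextRunAt - b.nextRunAt || a.createdAt - b.createdAt)
    .slice(0, limit);
  for (const job of claimed) job.status = "processing";
  return claimed;
}
```

FOR UPDATE SKIP LOCKED gives the same guarantee at the database level: two workers polling simultaneously never claim the same row, and neither blocks waiting on the other's lock.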

What BrainstormRouter adds beyond memory

The key argument isn't that BrainstormRouter has better memory than Letta. It's that BrainstormRouter has memory plus everything else a production deployment needs:

Routing intelligence

Letta uses one provider. BrainstormRouter routes across all of them:

// Thompson Sampling selects the best model from measured performance
const selection = bandit.select(arms);
// selection.method: "ucb1" | "thompson" | "random"
// selection.modelKey: "anthropic/claude-sonnet-4"
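The Thompson branch of that selection can be sketched as classic Beta-Bernoulli Thompson Sampling — a minimal sketch assuming each arm tracks success/failure counts; arm keys, stat fields, and function names here are illustrative, not the bandit's real interface:

```typescript
// One Beta(successes+1, failures+1) posterior per model key; sample each
// posterior and route the request to the highest draw.
type ArmStats = { successes: number; failures: number };

// Standard normal draw (Box-Muller), used by the gamma sampler.
function gaussian(): number {
  let u = 0;
  let v = 0;
  while (u === 0) u = Math.random();
  while (v === 0) v = Math.random();
  return Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * v);
}

// Marsaglia-Tsang gamma sampler, valid for shape >= 1 (always true here
// since both Beta parameters are counts + 1).
function randGamma(shape: number): number {
  const d = shape - 1 / 3;
  const c = 1 / Math.sqrt(9 * d);
  for (;;) {
    let x: number;
    let v: number;
    do {
      x = gaussian();
      v = 1 + c * x;
    } while (v <= 0);
    v = v * v * v;
    const u = Math.random();
    if (u < 1 - 0.0331 * x ** 4) return d * v;
    if (Math.log(u) < 0.5 * x * x + d * (1 - v + Math.log(v))) return d * v;
  }
}

// Beta(a, b) as a ratio of gamma draws.
function sampleBeta(a: number, b: number): number {
  const x = randGamma(a);
  return x / (x + randGamma(b));
}

// One posterior draw per arm; the highest draw wins the request. Arms
// with little data have wide posteriors, so they still get explored.
function thompsonSelect(arms: Record<string, ArmStats>): string {
  let best = "";
  let bestDraw = -1;
  for (const [key, s] of Object.entries(arms)) {
    const draw = sampleBeta(s.successes + 1, s.failures + 1);
    if (draw > bestDraw) {
      bestDraw = draw;
      best = key;
    }
  }
  return best;
}
```

The design property worth noting: exploration is free. A model with few observations has a wide posterior, so it occasionally wins a draw and gets fresh traffic, while a consistently strong model wins most of the time.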

Cost governance

Letta has no cost awareness. BrainstormRouter predicts cost before the call:

X-BR-Actual-Cost: $0.028
X-BR-Efficiency: 0.87
X-BR-Guardian-Overhead-Ms: 1.2
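Consuming those headers client-side can be sketched as follows; the helper name is ours, and it assumes lowercased header keys (as Node's fetch Headers normalization produces):

```typescript
// Hypothetical helper for reading the cost headers shown above off a
// response; only the header names come from the example.
function parseCostHeaders(headers: Record<string, string>) {
  return {
    actualCostUsd: Number((headers["x-br-actual-cost"] ?? "0").replace("$", "")),
    efficiency: Number(headers["x-br-efficiency"] ?? "0"),
    guardianOverheadMs: Number(headers["x-br-guardian-overhead-ms"] ?? "0"),
  };
}
```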

Tenant isolation

Letta is single-tenant. BrainstormRouter uses Postgres RLS for database-level isolation:

// From src/db/schema/tenants.ts:100
const tenantRlsUsing = sql`tenant_id = current_setting('app.current_tenant')::uuid`;
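The AsyncLocalStorage half of that isolation (from the comparison table) can be sketched as follows; the names are illustrative, not BrainstormRouter's actual module:

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

// Illustrative per-request tenant propagation: each request handler runs
// inside a store carrying its tenant id, with no explicit parameter
// threading through the call stack.
const tenantContext = new AsyncLocalStorage<{ tenantId: string }>();

function withTenant<T>(tenantId: string, fn: () => T): T {
  return tenantContext.run({ tenantId }, fn);
}

// The database layer would read this before each query and apply it to
// the connection (e.g. via set_config('app.current_tenant', ...)), so
// the RLS policy above filters rows for the right tenant automatically.
function currentTenant(): string {
  const store = tenantContext.getStore();
  if (!store) throw new Error("no tenant context set for this request");
  return store.tenantId;
}
```

Failing closed matters here: a query issued outside any tenant context throws instead of silently running without RLS scoping.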

Failure resilience

Letta fails if the provider fails. BrainstormRouter reroutes:

// Circuit breaker removes failing endpoints; fallback takes over
if (circuit.failures >= this.config.failureThreshold) {
  circuit.state = "open";
}

When to choose Letta

  • You're building a single-agent application
  • Memory is the primary feature you need
  • You want fine-grained control over memory management
  • You're comfortable with a Letta-specific SDK
  • You don't need multi-provider routing or cost optimization

When to choose BrainstormRouter

  • You need memory AND routing AND governance
  • You want OpenAI-compatible integration (no new SDK)
  • You need multi-tenant isolation for your customers
  • You want cost prediction and efficiency scoring per request
  • You need circuit breakers and automatic failover
  • You're running multi-model, multi-provider infrastructure