BrainstormRouter vs Letta

Agent memory platform vs integrated gateway: where memory meets routing and cost optimization.

What Letta does well

Letta (formerly MemGPT) pioneered long-term memory for LLM agents:

  • Rich memory hierarchy: core memory (working), archival memory (long-term), recall memory (conversation)
  • Agent framework: full agent lifecycle with tool use
  • Open source: Apache 2.0, large community
  • Memory management: automatic summarization, tiered storage
  • Developer-friendly: Python SDK with good abstractions

Letta's strength is memory architecture. If you're building a single-agent application that needs to remember things across sessions, Letta provides the best primitives.

Where the architectures diverge

Letta is an agent framework with memory. BrainstormRouter is an API gateway with memory, routing, and governance. They solve different layers of the stack — but the memory capability overlaps.

Dimension              Letta                     BrainstormRouter
---------------------  ------------------------  -------------------------------------------
Primary role           Agent framework           API gateway
Memory                 Core + archival + recall  Core + archival (pgvector)
Routing                None (single provider)    Thompson Sampling across providers
Cost optimization      None                      Guardian pre-request prediction + seatbelts
Multi-model            SDK configuration         Automatic selection + fallback chains
Circuit breakers       None                      Dual-trigger per-endpoint state machine
Multi-tenancy          None                      RLS + AsyncLocalStorage + ltree
Governance             None                      Privacy modes, PII scanning, RBAC
Integration effort     New SDK required          base_url swap (OpenAI-compatible)
Background processing  Agent heartbeat           Memory extraction queue + heartbeat
Self-hosted            Yes                       Yes

The integration difference

Using Letta

Letta requires its own SDK and agent abstraction:

from letta import ChatMemory, LLMConfig, create_client

client = create_client()
agent = client.create_agent(
    name="my-agent",
    llm_config=LLMConfig(model="gpt-4o", ...),
    memory=ChatMemory(human="User info", persona="Agent persona"),
)
response = client.send_message(agent_id=agent.id, role="user", message="Hello")

This is a different API from OpenAI. Adopting Letta means rewriting your LLM integration.

Using BrainstormRouter

from openai import OpenAI

client = OpenAI(
    base_url="https://api.brainstormrouter.com/v1",
    api_key="br_live_...",
)
response = client.chat.completions.create(
    model="anthropic/claude-sonnet-4",
    messages=[{"role": "user", "content": "Hello"}],
    extra_body={"mode": "agentic"},  # enables memory
)

Only one line changes: the base_url. The OpenAI SDK, LangChain, the Vercel AI SDK, or any other OpenAI-compatible framework works without modification. Memory is activated by the mode: "agentic" flag in extra_body.

The memory architecture

Letta memory

Letta maintains three memory tiers per agent:

  • Core memory: small, always in context (like a scratchpad)
  • Archival memory: vector-indexed long-term store
  • Recall memory: searchable conversation history

The agent manages its own memory through tool calls (core_memory_append, archival_memory_insert, archival_memory_search).

BrainstormRouter memory

BrainstormRouter uses two tiers — core and archival — with a key difference: memory extraction happens asynchronously via a Postgres durable queue.

From src/db/stores/memory-extraction-store.ts:69-84:

-- Concurrent workers claim jobs without blocking
SELECT id FROM memory_extraction_queue
WHERE status = 'pending' AND next_run_at <= $1
ORDER BY next_run_at ASC, created_at ASC
LIMIT $2
FOR UPDATE SKIP LOCKED

The extraction pipeline:

  1. Completion response arrives
  2. Gateway enqueues extraction job (returns response immediately)
  3. Background worker claims batch with FOR UPDATE SKIP LOCKED
  4. Pi Runner processes each job, extracting memories
  5. Memories stored in pgvector for semantic retrieval

This means memory extraction never adds latency to the response.
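The claim step can be sketched with an in-memory model; the job shape and function names below are illustrative, not the actual store API:

```typescript
// Illustrative in-memory model of the claim step; the real store issues
// the SQL above with FOR UPDATE SKIP LOCKED.
interface ExtractionJob {
  id: string;
  status: "pending" | "processing" | "done" | "failed";
  nextRunAt: number; // epoch ms
  createdAt: number;
}

// Claim only jobs that are due, ordered by next_run_at then created_at,
// and mark them so a concurrent worker skips them (the row-lock analogue).
function claimBatch(
  queue: ExtractionJob[],
  now: number,
  limit: number,
): ExtractionJob[] {
  const claimed = queue
    .filter((j) => j.status === "pending" && j.nextRunAt <= now)
    .sort((a, b) => a.nextRunAt - b.nextRunAt || a.createdAt - b.createdAt)
    .slice(0, limit);
  for (const job of claimed) job.status = "processing";
  return claimed;
}
```

FOR UPDATE SKIP LOCKED gives the same guarantee at the database level: two workers polling simultaneously never claim the same row, and neither blocks waiting on the other's lock.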

What BrainstormRouter adds beyond memory

The key argument isn't that BrainstormRouter has better memory than Letta. It's that BrainstormRouter has memory plus everything else a production deployment needs:

Routing intelligence

Letta uses one provider. BrainstormRouter routes across all of them:

// Thompson Sampling selects the best model from measured performance
const selection = bandit.select(arms);
// selection.method: "ucb1" | "thompson" | "random"
// selection.modelKey: "anthropic/claude-sonnet-4"
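The Thompson branch of that selection can be sketched as classic Beta-Bernoulli Thompson Sampling — a minimal sketch assuming each arm tracks success/failure counts; arm keys, stat fields, and function names here are illustrative, not the bandit's real interface:

```typescript
// One Beta(successes+1, failures+1) posterior per model key; sample each
// posterior and route the request to the highest draw.
type ArmStats = { successes: number; failures: number };

// Standard normal draw (Box-Muller), used by the gamma sampler.
function gaussian(): number {
  let u = 0;
  let v = 0;
  while (u === 0) u = Math.random();
  while (v === 0) v = Math.random();
  return Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * v);
}

// Marsaglia-Tsang gamma sampler, valid for shape >= 1 (always true here
// since both Beta parameters are counts + 1).
function randGamma(shape: number): number {
  const d = shape - 1 / 3;
  const c = 1 / Math.sqrt(9 * d);
  for (;;) {
    let x: number;
    let v: number;
    do {
      x = gaussian();
      v = 1 + c * x;
    } while (v <= 0);
    v = v * v * v;
    const u = Math.random();
    if (u < 1 - 0.0331 * x ** 4) return d * v;
    if (Math.log(u) < 0.5 * x * x + d * (1 - v + Math.log(v))) return d * v;
  }
}

// Beta(a, b) as a ratio of gamma draws.
function sampleBeta(a: number, b: number): number {
  const x = randGamma(a);
  return x / (x + randGamma(b));
}

// One posterior draw per arm; the highest draw wins the request. Arms
// with little data have wide posteriors, so they still get explored.
function thompsonSelect(arms: Record<string, ArmStats>): string {
  let best = "";
  let bestDraw = -1;
  for (const [key, s] of Object.entries(arms)) {
    const draw = sampleBeta(s.successes + 1, s.failures + 1);
    if (draw > bestDraw) {
      bestDraw = draw;
      best = key;
    }
  }
  return best;
}
```

The design property worth noting: exploration is free. A model with few observations has a wide posterior, so it occasionally wins a draw and gets fresh traffic, while a consistently strong model wins most of the time.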

Cost governance

Letta has no cost awareness. BrainstormRouter predicts cost before the call:

X-BR-Actual-Cost: $0.028
X-BR-Efficiency: 0.87
X-BR-Guardian-Overhead-Ms: 1.2
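Consuming those headers client-side can be sketched as follows; the helper name is ours, and it assumes lowercased header keys (as Node's fetch Headers normalization produces):

```typescript
// Hypothetical helper for reading the cost headers shown above off a
// response; only the header names come from the example.
function parseCostHeaders(headers: Record<string, string>) {
  return {
    actualCostUsd: Number((headers["x-br-actual-cost"] ?? "0").replace("$", "")),
    efficiency: Number(headers["x-br-efficiency"] ?? "0"),
    guardianOverheadMs: Number(headers["x-br-guardian-overhead-ms"] ?? "0"),
  };
}
```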

Tenant isolation

Letta is single-tenant. BrainstormRouter uses Postgres RLS for database-level isolation:

// From src/db/schema/tenants.ts:100
const tenantRlsUsing = sql`tenant_id = current_setting('app.current_tenant')::uuid`;
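The AsyncLocalStorage half of that isolation (from the comparison table) can be sketched as follows; the names are illustrative, not BrainstormRouter's actual module:

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

// Illustrative per-request tenant propagation: each request handler runs
// inside a store carrying its tenant id, with no explicit parameter
// threading through the call stack.
const tenantContext = new AsyncLocalStorage<{ tenantId: string }>();

function withTenant<T>(tenantId: string, fn: () => T): T {
  return tenantContext.run({ tenantId }, fn);
}

// The database layer would read this before each query and apply it to
// the connection (e.g. via set_config('app.current_tenant', ...)), so
// the RLS policy above filters rows for the right tenant automatically.
function currentTenant(): string {
  const store = tenantContext.getStore();
  if (!store) throw new Error("no tenant context set for this request");
  return store.tenantId;
}
```

Failing closed matters here: a query issued outside any tenant context throws instead of silently running without RLS scoping.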

Failure resilience

Letta fails if the provider fails. BrainstormRouter reroutes:

// Circuit breaker removes failing endpoints; fallback takes over
if (circuit.failures >= this.config.failureThreshold) {
  circuit.state = "open";
}

When to choose Letta

  • You're building a single-agent application
  • Memory is the primary feature you need
  • You want fine-grained control over memory management
  • You're comfortable with a Letta-specific SDK
  • You don't need multi-provider routing or cost optimization

When to choose BrainstormRouter

  • You need memory AND routing AND governance
  • You want OpenAI-compatible integration (no new SDK)
  • You need multi-tenant isolation for your customers
  • You want cost prediction and efficiency scoring per request
  • You need circuit breakers and automatic failover
  • You're running multi-model, multi-provider infrastructure