Agent Memory (RMM)

Persistent, tenant-wide memory with 4-block core memory, archival storage, and keyword search.

Overview

BrainstormRouter provides tenant-wide persistent memory using a Relational Memory Model (RMM) inspired by the Letta architecture. Every agentic request in a tenant shares the same memory, so you don't need conversation IDs or history arrays.

Memory is split across two systems:

  • Core memory: A small, always-loaded scratchpad of key facts organized into 4 blocks, stored in per-tenant SQLite databases
  • Archival memory: A vector-indexed long-term store backed by Postgres pgvector

Core memory blocks

Core memory is divided into four blocks, each with a default entry limit:

| Block | Default limit | Purpose |
|---|---|---|
| human | 15 | User preferences, identity, communication style |
| system | 10 | Infrastructure, deployment, environment details |
| project | 15 | Project name, tech stack, architecture decisions |
| general | 15 | Anything else worth remembering |

Entries can be pinned to prevent eviction during memory compaction.
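To make the block limits and pinning concrete, here is a minimal in-memory sketch. The limits come from the table above; the eviction policy (drop the oldest unpinned entry once a block exceeds its limit) and all class and method names are assumptions for illustration, not BrainstormRouter's actual compaction algorithm.

```python
from dataclasses import dataclass, field
from itertools import count

# Default per-block entry limits, as listed in the table above.
BLOCK_LIMITS = {"human": 15, "system": 10, "project": 15, "general": 15}

_ids = count(1)


@dataclass
class Entry:
    fact: str
    pinned: bool = False
    id: str = field(default_factory=lambda: f"entry-{next(_ids)}")


class CoreMemory:
    """Hypothetical model of 4-block core memory with pin-aware compaction."""

    def __init__(self, limits=BLOCK_LIMITS):
        self.limits = dict(limits)
        self.blocks = {name: [] for name in self.limits}

    def append(self, block, fact, pinned=False):
        if block not in self.blocks:
            raise ValueError(f"unknown block: {block}")
        entry = Entry(fact, pinned)
        self.blocks[block].append(entry)  # entries are kept oldest-first
        self._compact(block)
        return entry

    def _compact(self, block):
        # Assumed policy: evict the oldest unpinned entries until within the limit.
        entries = self.blocks[block]
        while len(entries) > self.limits[block]:
            victim = next((e for e in entries if not e.pinned), None)
            if victim is None:
                break  # everything is pinned, so nothing can be evicted
            entries.remove(victim)
```

Pinning the region fact keeps it in the system block even after later appends push older unpinned facts out.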

How agents use memory

When a request runs in agentic mode, The Soul has access to 7 built-in memory tools:

| Tool | Description |
|---|---|
| core_memory_append | Add a fact to a specific block |
| core_memory_replace | Update an existing fact by ID |
| core_memory_delete | Remove a fact by ID |
| core_memory_pin | Pin a fact (prevents eviction) |
| core_memory_unpin | Unpin a previously pinned fact |
| archival_memory_insert | Store text in the vector archive |
| archival_memory_search | Semantic search across archived entries |

The agent decides autonomously when to read and write memory. These tools are called by the model as part of the agentic response, not by your application code.
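The gateway executes these tool calls, not your code, but a small dispatcher shows how the seven tool names map onto core and archival memory operations. The argument names (block, fact, entry_id, text, query), the ID format, and the case-insensitive substring match standing in for semantic search are all hypothetical; real archival search is vector-based.

```python
import itertools

TOOL_NAMES = {
    "core_memory_append", "core_memory_replace", "core_memory_delete",
    "core_memory_pin", "core_memory_unpin",
    "archival_memory_insert", "archival_memory_search",
}


class MemoryTools:
    """Hypothetical in-memory executor for the 7 agent memory tools."""

    def __init__(self):
        self.core = {}     # entry_id -> {"block", "fact", "pinned"}
        self.archive = []  # archived text passages
        self._ids = itertools.count(1)

    def call(self, name, args):
        # Dispatch a tool call by name, as an agent runtime might.
        if name not in TOOL_NAMES:
            raise ValueError(f"unknown tool: {name}")
        return getattr(self, name)(**args)

    def core_memory_append(self, block, fact):
        entry_id = f"entry-{next(self._ids)}"
        self.core[entry_id] = {"block": block, "fact": fact, "pinned": False}
        return entry_id

    def core_memory_replace(self, entry_id, fact):
        self.core[entry_id]["fact"] = fact

    def core_memory_delete(self, entry_id):
        del self.core[entry_id]

    def core_memory_pin(self, entry_id):
        self.core[entry_id]["pinned"] = True

    def core_memory_unpin(self, entry_id):
        self.core[entry_id]["pinned"] = False

    def archival_memory_insert(self, text):
        self.archive.append(text)

    def archival_memory_search(self, query):
        # Stand-in for semantic search: case-insensitive substring match.
        q = query.lower()
        return [t for t in self.archive if q in t.lower()]
```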

Tenant-wide scoping

Memory is scoped to your tenant, not to individual conversations or API keys:

# Assumes `client` is an OpenAI-compatible client configured for BrainstormRouter.

# Request 1: teach it something
client.chat.completions.create(
    model="anthropic/claude-sonnet-4-5",
    messages=[{"role": "user", "content": "Our deploy target is us-east-1 on ECS Fargate."}],
    extra_body={"mode": "agentic"},  # agentic mode enables the memory tools
)

# Request 2 (hours later, no shared history): it remembers
response = client.chat.completions.create(
    model="anthropic/claude-sonnet-4-5",
    messages=[{"role": "user", "content": "Where do we deploy?"}],
    extra_body={"mode": "agentic"},
)
print(response.choices[0].message.content)
# e.g. "You deploy to us-east-1 on ECS Fargate."

Session isolation

When BR_MEMORY_SESSION_ISOLATION=1 is set, agent-JWT callers must include a sid (session ID) claim. Memory operations are then scoped to that session within the tenant. API-key callers are unaffected and always see the full tenant memory.
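The scoping rule can be expressed as a small function. This is a sketch of the documented behavior only: the function name, the caller-type strings, and returning a (tenant, session) tuple are assumptions, and a real gateway would verify the JWT before any of this runs.

```python
import os


class MissingSessionClaim(Exception):
    """Raised when session isolation requires a sid claim that is absent."""


def memory_scope(tenant_id, caller, claims=None, env=None):
    """Return (tenant_id, session_id) for memory operations (hypothetical helper).

    caller is "api_key" or "agent_jwt"; a session_id of None means full tenant memory.
    """
    env = os.environ if env is None else env
    isolation_on = env.get("BR_MEMORY_SESSION_ISOLATION") == "1"
    if caller == "api_key" or not isolation_on:
        return (tenant_id, None)  # API keys always see the whole tenant
    sid = (claims or {}).get("sid")
    if not sid:
        raise MissingSessionClaim("agent JWT must include a 'sid' claim")
    return (tenant_id, sid)
```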

REST API

All endpoints require API key authentication (Authorization: Bearer br_live_...).

Blocks

| Method | Endpoint | Description |
|---|---|---|
| GET | /v1/memory/blocks | List all blocks with entry counts |
| GET | /v1/memory/blocks/:block | List entries in a specific block |

Entries

| Method | Endpoint | Description |
|---|---|---|
| GET | /v1/memory/entries | List all core memory entries |
| POST | /v1/memory/entries | Append a new entry (fact, block, pinned) |
| PUT | /v1/memory/entries/:id | Update entry text and/or pin status |
| DELETE | /v1/memory/entries/:id | Delete an entry |

Query

| Method | Endpoint | Description |
|---|---|---|
| POST | /v1/memory/query | Keyword search across entries |
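The docs do not specify how keyword relevance is scored. As a rough illustration of what a score such as 0.85 can mean, this sketch ranks entries by the fraction of query terms that appear in each fact. The scoring formula is an assumption, not the endpoint's algorithm; the result shape mirrors the entry/score objects shown in the TypeScript SDK example.

```python
import re


def tokenize(text):
    """Lowercase alphanumeric word tokens; punctuation is ignored."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_query(entries, query, limit=5):
    """Rank entries by the share of query terms found in each fact (hypothetical scoring)."""
    terms = tokenize(query)
    if not terms:
        return []
    results = []
    for entry in entries:
        overlap = len(terms & tokenize(entry["fact"]))
        if overlap:
            results.append({"entry": entry, "score": overlap / len(terms)})
    results.sort(key=lambda r: r["score"], reverse=True)
    return results[:limit]
```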

Init

| Method | Endpoint | Description |
|---|---|---|
| POST | /v1/memory/init | Bootstrap memory from context docs |
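For callers not using an SDK, the endpoints are plain HTTPS with JSON bodies. This sketch builds (but does not send) requests with Python's standard library. BASE_URL is a placeholder, not the documented host; the entry body fields follow the fact/block/pinned parameters listed for POST /v1/memory/entries, while the "query" field name for POST /v1/memory/query is an assumption.

```python
import json
import urllib.parse
import urllib.request

# Placeholder host (assumption): substitute your BrainstormRouter base URL.
BASE_URL = "https://api.example.com"


def memory_request(method, path, api_key, body=None):
    """Build (but do not send) an authenticated /v1/memory request."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    req = urllib.request.Request(BASE_URL + path, data=data, method=method)
    req.add_header("Authorization", f"Bearer {api_key}")
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


def append_entry(api_key, fact, block, pinned=False):
    # Body fields follow the (fact, block, pinned) parameters of POST /v1/memory/entries.
    return memory_request("POST", "/v1/memory/entries", api_key,
                          {"fact": fact, "block": block, "pinned": pinned})


def update_entry(api_key, entry_id, **changes):
    # changes: fact=... and/or pinned=...
    path = "/v1/memory/entries/" + urllib.parse.quote(entry_id, safe="")
    return memory_request("PUT", path, api_key, changes)


def query_entries(api_key, query):
    # The "query" field name is an assumption.
    return memory_request("POST", "/v1/memory/query", api_key, {"query": query})

# To send a request: with urllib.request.urlopen(req) as resp: result = json.load(resp)
```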

SDK usage

TypeScript

import BrainstormRouter from "brainstormrouter";

const client = new BrainstormRouter({ apiKey: "br_live_..." });

// List blocks
const { blocks } = await client.memory.blocks();

// List all entries
const { entries } = await client.memory.entries();

// Append a fact
await client.memory.append("Deploy target is us-east-1 on ECS Fargate", {
  block: "system",
});

// Update a fact
await client.memory.update("entry-id", {
  fact: "Deploy target is us-west-2 on ECS Fargate",
});

// Pin an entry
await client.memory.update("entry-id", { pinned: true });

// Search by keyword
const { results } = await client.memory.query("deploy");
// results: [{ entry: { id, fact, block, pinned }, score: 0.85 }]

// Delete
await client.memory.remove("entry-id");

Python

from brainstormrouter import BrainstormRouter

client = BrainstormRouter(api_key="br_live_...")

# List blocks
blocks = client.memory.blocks()

# List all entries
entries = client.memory.entries()

# Append a fact
client.memory.append(fact="Deploy target is ECS Fargate", block="system")

# Update a fact
client.memory.update("entry-id", fact="Deploy target is us-west-2")

# Pin an entry
client.memory.update("entry-id", pinned=True)

# Search by keyword
results = client.memory.query("deploy")

# Delete
client.memory.remove("entry-id")
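A common pattern is to pull only strong matches into your own prompt or UI. The helper below assumes you have a list of result dicts shaped like the TypeScript comment above ({"entry": {...}, "score": ...}); that shape for the Python SDK, the helper name, and the 0.5 threshold are assumptions, so adapt it to the objects your SDK version actually returns.

```python
def top_facts(results, min_score=0.5, pinned_first=True):
    """Return fact strings from query results, best matches first (hypothetical helper)."""
    kept = [r for r in results if r["score"] >= min_score]
    # Sort pinned entries first (if requested), then by descending score.
    kept.sort(key=lambda r: (not (pinned_first and r["entry"]["pinned"]), -r["score"]))
    return [r["entry"]["fact"] for r in kept]

# Usage (adapt if the SDK wraps results in an object):
# facts = top_facts(results)
```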

MCP tools

The MCP server exposes 5 memory tools for AI agents connecting via the Model Context Protocol:

| Tool | Permission | Description |
|---|---|---|
| br_memory_list | memory.read | List all core memory entries |
| br_memory_store | memory.write | Store a new memory entry |
| br_memory_query | memory.read | Search entries by keyword relevance |
| br_memory_delete | memory.write | Delete an entry by ID |
| br_memory_update | memory.write | Update entry text and/or pin status |
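The permission column maps directly to a lookup table. A client or proxy could use a check like this to decide which memory tools to expose to an agent, given the permissions it was granted. The mapping is copied from the table above; the function names and the idea of filtering tools client-side are illustrative assumptions.

```python
# Required permission per MCP memory tool (from the table above).
TOOL_PERMISSIONS = {
    "br_memory_list": "memory.read",
    "br_memory_store": "memory.write",
    "br_memory_query": "memory.read",
    "br_memory_delete": "memory.write",
    "br_memory_update": "memory.write",
}


def allowed_tools(granted):
    """Return the memory tools callable with the given set of permissions."""
    return sorted(tool for tool, perm in TOOL_PERMISSIONS.items() if perm in granted)


def can_call(tool, granted):
    """True if the tool exists and its required permission was granted."""
    perm = TOOL_PERMISSIONS.get(tool)
    return perm is not None and perm in granted
```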

Asynchronous extraction

Memory extraction does not happen inline, because it would add too much latency to each response. After a completion returns, the gateway enqueues an extraction job in a durable, Postgres-backed job queue. Workers claim jobs with FOR UPDATE SKIP LOCKED, so several workers can run concurrently without blocking on, or double-processing, a job another worker has already locked. The response is returned immediately; background workers claim and process jobs asynchronously.
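FOR UPDATE SKIP LOCKED lets each worker lock one pending row while skipping rows other workers already hold. The sketch below shows a typical claim query as a comment (table and column names are hypothetical) and then simulates the same skip-locked behavior in-process with per-job non-blocking locks, since running Postgres is out of scope here. It is an analogy, not the gateway's worker code.

```python
import threading
from dataclasses import dataclass, field

# A typical Postgres claim statement (hypothetical table/columns):
#   UPDATE extraction_jobs SET status = 'running'
#   WHERE id = (SELECT id FROM extraction_jobs
#               WHERE status = 'pending'
#               ORDER BY created_at
#               LIMIT 1
#               FOR UPDATE SKIP LOCKED)
#   RETURNING id, payload;


@dataclass
class Job:
    job_id: int
    payload: str
    done: bool = False
    lock: threading.Lock = field(default_factory=threading.Lock, repr=False)


def claim_next(jobs):
    """Lock and return the first pending job, skipping jobs locked by other workers."""
    for job in jobs:
        if job.done:
            continue
        if job.lock.acquire(blocking=False):  # the SKIP LOCKED step
            if not job.done:
                return job
            job.lock.release()  # finished by another worker before we got the lock
    return None


def worker(jobs, processed, processed_lock):
    while (job := claim_next(jobs)) is not None:
        try:
            with processed_lock:  # stand-in for running the extraction
                processed.append(job.job_id)
            job.done = True
        finally:
            job.lock.release()


def run_queue(n_jobs=50, n_workers=4):
    jobs = [Job(i, f"completion-{i}") for i in range(n_jobs)]
    processed, processed_lock = [], threading.Lock()
    threads = [threading.Thread(target=worker, args=(jobs, processed, processed_lock))
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return processed
```

In this simulation the lock is held while the job runs. With a claim query like the one above, the row lock lasts only until the claiming transaction commits, and the status column then keeps other workers away from the job.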

Storage and isolation

Core memory is stored in per-tenant SQLite databases at /tenants//memory/rmm.db. Archival memory uses Postgres with pgvector for vector indexing.

Tenant isolation is enforced at the filesystem level (separate SQLite files per tenant) for core memory, and by Row-Level Security policies for archival memory in Postgres. There is no cross-tenant data access.
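Filesystem isolation means each tenant's core memory lives in its own SQLite file, so a query on one tenant's connection cannot see another tenant's rows. This sketch assumes a directory per tenant ID under a configurable root and a minimal entries table; the root, directory layout, schema, and ID validation rule are all hypothetical, not the gateway's actual layout.

```python
import re
import sqlite3
from pathlib import Path

# Hypothetical tenant ID rule; rejecting "/" and "." blocks path traversal like "../x".
TENANT_ID = re.compile(r"[A-Za-z0-9_-]{1,64}")


def open_core_memory(root, tenant_id):
    """Open (creating if needed) one tenant's core-memory database (hypothetical layout)."""
    if not TENANT_ID.fullmatch(tenant_id):
        raise ValueError(f"invalid tenant id: {tenant_id!r}")
    db_path = Path(root) / "tenants" / tenant_id / "memory" / "rmm.db"
    db_path.parent.mkdir(parents=True, exist_ok=True)
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS entries ("
        "id INTEGER PRIMARY KEY, block TEXT NOT NULL, fact TEXT NOT NULL,"
        " pinned INTEGER NOT NULL DEFAULT 0)"
    )
    return conn
```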

What's shipped vs. planned

Shipped:

  • Core memory CRUD (append, replace, delete, pin/unpin) via API, SDKs, MCP, and dashboard
  • Keyword search/query via API, SDKs, MCP, and dashboard
  • 7 agent tools for autonomous memory management
  • Memory init (bootstrap from context documents)
  • Per-tenant SQLite isolation
  • Session isolation flag (BR_MEMORY_SESSION_ISOLATION)
  • Async extraction queue

Planned:

  • Archival memory search as a REST endpoint (currently agent-only via tool)
  • Cross-system unification (core RMM + Postgres chunks into a single query surface)
  • Memory provenance and mutation audit trail