Agent Memory (RMM)

Persistent, tenant-wide memory with 4-block core memory, archival storage, and keyword search.

Overview

BrainstormRouter provides tenant-wide persistent memory using a Relational Memory Model (RMM) inspired by the Letta architecture. Every agentic request shares the same memory, so no conversation IDs or history arrays are needed.

Memory is split across two systems, each with its own storage backend:

Core memory: A small, always-loaded scratchpad of key facts organized into 4 blocks, stored in per-tenant SQLite databases
Archival memory: A vector-indexed long-term store backed by Postgres pgvector

Core memory blocks

Core memory is divided into four blocks, each with a default entry limit:

Block	Default Limit	Purpose
`human`	15	User preferences, identity, communication style
`system`	10	Infrastructure, deployment, environment details
`project`	15	Project name, tech stack, architecture decisions
`general`	15	Anything else worth remembering

Entries can be pinned to prevent eviction during memory compaction.
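
To make the interaction between block limits and pinning concrete, here is a minimal in-memory sketch. It is not the real storage layer: the `CoreMemory` and `Entry` classes are hypothetical, and the eviction policy (drop the oldest unpinned entry first) is an assumption, since this page does not specify how compaction chooses what to evict.

```python
from dataclasses import dataclass, field
from itertools import count

# Default per-block entry limits, taken from the table above.
DEFAULT_LIMITS = {"human": 15, "system": 10, "project": 15, "general": 15}

_ids = count(1)


@dataclass
class Entry:
    fact: str
    pinned: bool = False
    id: int = field(default_factory=lambda: next(_ids))


class CoreMemory:
    """Toy model of the four core memory blocks (hypothetical, for illustration)."""

    def __init__(self, limits=DEFAULT_LIMITS):
        self.limits = dict(limits)
        self.blocks = {name: [] for name in self.limits}

    def append(self, block, fact, pinned=False):
        entry = Entry(fact, pinned)
        self.blocks[block].append(entry)
        self.compact(block)
        return entry

    def compact(self, block):
        """Assumed policy: evict the oldest unpinned entries until within the limit."""
        entries = self.blocks[block]
        evicted = []
        while len(entries) > self.limits[block]:
            victim = next((e for e in entries if not e.pinned), None)
            if victim is None:  # every entry is pinned, so nothing can be evicted
                break
            entries.remove(victim)
            evicted.append(victim)
        return evicted
```

With a limit of 2 on a block, appending a pinned fact followed by two unpinned facts leaves the pinned fact and the newest unpinned fact in place under this policy.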

How agents use memory

When a request runs in agentic mode, The Soul has access to 7 built-in memory tools:

Tool	Description
`core_memory_append`	Add a fact to a specific block
`core_memory_replace`	Update an existing fact by ID
`core_memory_delete`	Remove a fact by ID
`core_memory_pin`	Pin a fact (prevents eviction)
`core_memory_unpin`	Unpin a previously pinned fact
`archival_memory_insert`	Store text in the vector archive
`archival_memory_search`	Semantic search across archived entries

The agent decides autonomously when to read and write memory. These tools are called as part of the agentic response, not by your application code.

Tenant-wide scoping

Memory is scoped to your tenant, not to individual conversations or API keys:

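# Assumes `client` is an OpenAI-compatible client already configured for BrainstormRouter (setup not shown).
# Request 1: teach it something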
client.chat.completions.create(
    model="anthropic/claude-sonnet-4-5",
    messages=[{"role": "user", "content": "Our deploy target is us-east-1 on ECS Fargate."}],
    extra_body={"mode": "agentic"},
)

# Request 2 (hours later): it remembers
client.chat.completions.create(
    model="anthropic/claude-sonnet-4-5",
    messages=[{"role": "user", "content": "Where do we deploy?"}],
    extra_body={"mode": "agentic"},
)
# → "You deploy to us-east-1 on ECS Fargate."

Session isolation

When BR_MEMORY_SESSION_ISOLATION=1 is set, agent-JWT callers must include a sid (session ID) claim. Memory operations are then scoped to that session within the tenant. API-key callers are unaffected and always see the full tenant memory.
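
The scoping rule above can be sketched as a small decision function. This is only an illustration: the `memory_scope` name, the `auth_kind` labels, and the choice to raise `PermissionError` for a missing `sid` are all assumptions, as is the behavior when the flag is off (agent-JWT callers are assumed to see tenant-wide memory).

```python
from typing import Optional


def memory_scope(tenant_id: str, auth_kind: str, claims: Optional[dict],
                 session_isolation: bool) -> tuple:
    """Return the (tenant, session) pair that memory operations are scoped to.

    `auth_kind` is a hypothetical label: "api_key" or "agent_jwt".
    A session of None means the full tenant memory.
    """
    # API-key callers are unaffected by the flag and always see tenant memory.
    if auth_kind == "api_key" or not session_isolation:
        return (tenant_id, None)
    sid = (claims or {}).get("sid")
    if not sid:
        # Assumed failure mode: reject agent JWTs that lack the sid claim.
        raise PermissionError("agent JWT is missing the required 'sid' claim")
    return (tenant_id, sid)
```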

REST API

All endpoints require API key authentication (Authorization: Bearer br_live_...).

Blocks

Method	Endpoint	Description
GET	`/v1/memory/blocks`	List all blocks with entry counts
GET	`/v1/memory/blocks/:block`	List entries in a specific block

Entries

Method	Endpoint	Description
GET	`/v1/memory/entries`	List all core memory entries
POST	`/v1/memory/entries`	Append a new entry (`fact`, `block`, `pinned`)
PUT	`/v1/memory/entries/:id`	Update entry text and/or pin status
DELETE	`/v1/memory/entries/:id`	Delete an entry
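
For callers not using an SDK, the append endpoint can be called directly. The sketch below only builds the request with the Python standard library. The host in `BASE_URL` is a placeholder, because the real base URL is not given in this section, and sending the fields as a JSON body is an assumption based on the field names listed above.

```python
import json
import urllib.request

# Placeholder host: substitute the real BrainstormRouter base URL.
BASE_URL = "https://api.example.com"


def build_append_request(api_key: str, fact: str, block: str,
                         pinned: bool = False) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/memory/entries request."""
    body = json.dumps({"fact": fact, "block": block, "pinned": pinned})
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/memory/entries",
        data=body.encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; the response format is not covered here.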

Query

Method	Endpoint	Description
POST	`/v1/memory/query`	Keyword search across entries
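
This page does not describe how keyword relevance is computed. As a rough mental model only, the sketch below scores each entry by the fraction of query terms it contains and returns results as entry/score pairs. The scoring formula and both function names are assumptions, not the service's actual ranking.

```python
import re


def _terms(text: str) -> set:
    """Lowercase alphanumeric tokens of a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_score(query: str, fact: str) -> float:
    """Fraction of distinct query terms that appear in the fact (0.0 to 1.0)."""
    terms = _terms(query)
    if not terms:
        return 0.0
    return len(terms & _terms(fact)) / len(terms)


def query_entries(query: str, entries: list) -> list:
    """Rank entries by score, dropping those with no matching term."""
    scored = [{"entry": e, "score": keyword_score(query, e["fact"])} for e in entries]
    hits = [r for r in scored if r["score"] > 0]
    return sorted(hits, key=lambda r: r["score"], reverse=True)
```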

Init

Method	Endpoint	Description
POST	`/v1/memory/init`	Bootstrap memory from context docs

SDK usage

TypeScript

import BrainstormRouter from "brainstormrouter";

const client = new BrainstormRouter({ apiKey: "br_live_..." });

// List blocks
const { blocks } = await client.memory.blocks();

// List all entries
const { entries } = await client.memory.entries();

// Append a fact
await client.memory.append("Deploy target is us-east-1 on ECS Fargate", {
  block: "system",
});

// Update a fact
await client.memory.update("entry-id", {
  fact: "Deploy target is us-west-2 on ECS Fargate",
});

// Pin an entry
await client.memory.update("entry-id", { pinned: true });

// Search by keyword
const { results } = await client.memory.query("deploy");
// results: [{ entry: { id, fact, block, pinned }, score: 0.85 }]

// Delete
await client.memory.remove("entry-id");

Python

from brainstormrouter import BrainstormRouter

client = BrainstormRouter(api_key="br_live_...")

# List blocks
blocks = client.memory.blocks()

# List all entries
entries = client.memory.entries()

# Append a fact
client.memory.append(fact="Deploy target is ECS Fargate", block="system")

# Update a fact
client.memory.update("entry-id", fact="Deploy target is us-west-2")

# Pin an entry
client.memory.update("entry-id", pinned=True)

# Search by keyword
results = client.memory.query("deploy")

# Delete
client.memory.remove("entry-id")

MCP tools

The MCP server exposes 5 memory tools for AI agents connecting via the Model Context Protocol:

Tool	Permission	Description
`br_memory_list`	`memory.read`	List all core memory entries
`br_memory_store`	`memory.write`	Store a new memory entry
`br_memory_query`	`memory.read`	Search entries by keyword relevance
`br_memory_delete`	`memory.write`	Delete an entry by ID
`br_memory_update`	`memory.write`	Update entry text and/or pin status
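
The permission column can be read as a simple lookup: a caller may use a tool only if it holds that tool's permission. The sketch below encodes the table and filters tools by a set of granted permissions. How permissions are actually granted and checked by the MCP server is not described here, so `allowed_tools` is a hypothetical helper.

```python
# Tool-to-permission mapping copied from the table above.
REQUIRED_PERMISSION = {
    "br_memory_list": "memory.read",
    "br_memory_store": "memory.write",
    "br_memory_query": "memory.read",
    "br_memory_delete": "memory.write",
    "br_memory_update": "memory.write",
}


def allowed_tools(granted: set) -> list:
    """Return the memory tools usable by a caller holding `granted` permissions."""
    return sorted(
        tool for tool, permission in REQUIRED_PERMISSION.items()
        if permission in granted
    )
```

A read-only caller would therefore be limited to `br_memory_list` and `br_memory_query`.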

Asynchronous extraction

Memory extraction does not run inline, because that would add too much latency. When a completion finishes, the gateway enqueues an extraction job on a Postgres-backed durable job queue and returns the response without waiting for extraction. Background workers claim jobs with FOR UPDATE SKIP LOCKED, which lets several workers poll the queue concurrently without blocking on, or double-claiming, a job that another worker has already locked.
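
The claim step of such a queue is commonly written as a single statement. The sketch below shows one way it might look. The `extraction_jobs` table, its columns, and the `claim_next_job` helper are hypothetical, since the actual schema is not documented here.

```python
# Hypothetical schema: extraction_jobs(id, status, payload, enqueued_at).
CLAIM_SQL = """
UPDATE extraction_jobs
SET status = 'running'
WHERE id = (
    SELECT id
    FROM extraction_jobs
    WHERE status = 'queued'
    ORDER BY enqueued_at
    LIMIT 1
    FOR UPDATE SKIP LOCKED
)
RETURNING id, payload
"""


def claim_next_job(cursor):
    """Claim one queued job, or return None if none is available.

    `cursor` is any DB-API cursor on a Postgres connection; the caller
    is responsible for committing the surrounding transaction.
    """
    cursor.execute(CLAIM_SQL)
    row = cursor.fetchone()
    if row is None:
        return None
    job_id, payload = row
    return {"id": job_id, "payload": payload}
```

Because the inner SELECT skips rows that another transaction has locked, two workers running this at the same moment pick different jobs instead of waiting on each other.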

Storage and isolation

Core memory is stored in per-tenant SQLite databases at /tenants//memory/rmm.db. Archival memory uses Postgres with pgvector for vector indexing.

Tenant isolation is enforced at the filesystem level (separate SQLite files per tenant) for core memory, and by Row-Level Security policies for archival memory in Postgres. There is no cross-tenant data access.

What's shipped vs. planned

Shipped:

Core memory CRUD (append, replace, delete, pin/unpin) via API, SDKs, MCP, and dashboard
Keyword search/query via API, SDKs, MCP, and dashboard
7 agent tools for autonomous memory management
Memory init (bootstrap from context documents)
Per-tenant SQLite isolation
Session isolation flag (BR_MEMORY_SESSION_ISOLATION)
Async extraction queue

Planned:

Archival memory search as a REST endpoint (currently agent-only via tool)
Cross-system unification (core RMM + Postgres chunks into a single query surface)
Memory provenance and mutation audit trail