Relational Memory Manager: 4-Block Structured Context with Relevance Scoring
2026-02-19
LOCKSTEP TRACEABILITY MATRIX

---
api_endpoints:
  - "GET /v1/memory/blocks"
  - "GET /v1/memory/blocks/:block"
  - "GET /v1/memory/entries"
  - "POST /v1/memory/entries"
  - "PUT /v1/memory/entries/:id"
  - "DELETE /v1/memory/entries/:id"
  - "POST /v1/memory/query"
sdk_methods_updated:
  - "client.memory.blocks()"
  - "client.memory.entries()"
  - "client.memory.create()"
  - "client.memory.update()"
  - "client.memory.delete()"
  - "client.memory.query()"
mcp_tools_updated:
  - "none"
---
What We Built
The Relational Memory Manager (RMM) gives every agent and tenant a structured, persistent memory divided into four semantic blocks: human (user preferences, identity, communication style), project (project name, tech stack, architecture), system (infrastructure, deployment, environment), and general (everything else). Memory entries can be pinned for permanent retention, queried with relevance scoring, and bootstrapped from context documents.
The Memory API (memory.ts, 484 LOC) exposes 7 REST endpoints for full CRUD on memory entries, block inspection, relevance-scored search, and bulk initialization. It supports session isolation for multi-agent scenarios — when an agent authenticates via JWT with a sid claim, memory operations are scoped to that session.
The query endpoint (POST /v1/memory/query) scores all memory entries against a natural language query and returns the top 10 by relevance, enabling agents to pull in the most contextually relevant memories when constructing their prompts. This is how BrainstormRouter agents maintain context across conversations without stuffing the entire memory into every request.
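The actual scorer in memory.ts isn't shown here; as an illustration only, a naive token-overlap scorer that honors the same contract (return at most 10 entries, sorted by relevance, score > 0 only) could be sketched like this. The `queryMemory` function and `MemoryEntry` shape are hypothetical names, not the real implementation:

```typescript
interface MemoryEntry {
  id: string;
  fact: string;
  block: "human" | "project" | "system" | "general";
  pinned: boolean;
}

// Naive token-overlap relevance: count query tokens that appear in the fact.
// The production scorer likely uses embeddings; this sketch only mirrors the
// endpoint contract: top 10, sorted descending, entries with score > 0 only.
function queryMemory(
  entries: MemoryEntry[],
  query: string,
): { entry: MemoryEntry; score: number }[] {
  const tokens = query.toLowerCase().split(/\W+/).filter(Boolean);
  return entries
    .map((entry) => {
      const fact = entry.fact.toLowerCase();
      const score = tokens.filter((t) => fact.includes(t)).length;
      return { entry, score };
    })
    .filter((r) => r.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, 10);
}
```

An agent would run a query like this before each request and splice only the returned facts into its prompt.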
Why It Matters
LLMs are stateless. Every request starts from scratch unless the application manages context. Most developers solve this with a growing messages array that eventually exceeds the context window and gets truncated — losing critical information.
BrainstormRouter's RMM gives developers a structured alternative: store facts in semantic blocks, query for relevant memories before each request, and page context in and out of the messages array. This is not RAG (retrieval-augmented generation) over documents — it is structured agent memory that persists across sessions and respects privacy modes.
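The "page context in and out" pattern can be shown with a small helper. This is a sketch of one reasonable approach, not code from the product — `buildMessages` and the message shapes are assumptions for illustration:

```typescript
interface ScoredMemory {
  fact: string;
  score: number;
}

interface ChatMessage {
  role: "system" | "user";
  content: string;
}

// Page relevant memories into the prompt as one system message instead of
// replaying the entire conversation history on every request.
function buildMessages(memories: ScoredMemory[], userMessage: string): ChatMessage[] {
  const context = memories.map((m) => `- ${m.fact}`).join("\n");
  return [
    { role: "system", content: `Relevant memory:\n${context}` },
    { role: "user", content: userMessage },
  ];
}
```

The memories passed in would come from a prior POST /v1/memory/query call, keeping the messages array small and bounded.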
How It Works
Memory Blocks:
human — "User prefers concise responses", "Name: Justin"
project — "Stack: TypeScript + Hono", "Deploys to AWS ECS Fargate"
system — "API endpoint: api.brainstormrouter.com", "DB: Postgres 16"
general — "Last discussed: pricing optimization", "Favorite model: Claude Sonnet"
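The four block names above can be captured as a union type with a runtime guard, so writes can't land in an unknown block. The identifiers below (`MEMORY_BLOCKS`, `isMemoryBlock`) are illustrative, not taken from memory.ts:

```typescript
// The four semantic blocks: human, project, system, general.
const MEMORY_BLOCKS = ["human", "project", "system", "general"] as const;
type MemoryBlock = (typeof MEMORY_BLOCKS)[number];

// Type guard: narrows an arbitrary string to MemoryBlock at runtime,
// e.g. for validating the "block" field on POST /v1/memory/entries.
function isMemoryBlock(value: string): value is MemoryBlock {
  return (MEMORY_BLOCKS as readonly string[]).includes(value);
}
```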
CRUD Operations:
// Create entry
POST /v1/memory/entries
{ "fact": "User prefers TypeScript", "block": "human", "pinned": false }
// Query with relevance scoring
POST /v1/memory/query
{ "query": "What tech stack does the user prefer?" }
// Returns top 10 entries sorted by relevance score
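The CRUD semantics can be mocked with a minimal in-memory stand-in. Everything below is hypothetical scaffolding, not the memory.ts implementation; in particular, refusing to delete pinned entries is an assumption based on "pinned for permanent retention" above:

```typescript
interface StoreEntry {
  id: string;
  fact: string;
  block: string;
  pinned: boolean;
}

// Minimal in-memory stand-in for the /v1/memory/entries CRUD surface.
class MemoryStore {
  private entries = new Map<string, StoreEntry>();
  private nextId = 1;

  create(fact: string, block: string, pinned = false): StoreEntry {
    const entry = { id: String(this.nextId++), fact, block, pinned };
    this.entries.set(entry.id, entry);
    return entry;
  }

  update(id: string, patch: Partial<Omit<StoreEntry, "id">>): StoreEntry | undefined {
    const current = this.entries.get(id);
    if (!current) return undefined;
    const next = { ...current, ...patch };
    this.entries.set(id, next);
    return next;
  }

  delete(id: string): boolean {
    // Assumption: pinned entries are permanently retained, so deletion is refused.
    const entry = this.entries.get(id);
    if (!entry || entry.pinned) return false;
    return this.entries.delete(id);
  }

  list(block?: string): StoreEntry[] {
    const all = [...this.entries.values()];
    return block ? all.filter((e) => e.block === block) : all;
  }
}
```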
Memory Bootstrap (Admin-only):
{
"documents": [
{ "content": "Company uses React + TypeScript...", "source": "tech-stack.md" },
{ "content": "Deploy to AWS us-east-1...", "source": "infra.md" }
]
}
The init endpoint sends all documents to an embedded agent that extracts structured facts and stores them via core_memory_append tools. Returns entry count, usage stats, and duration.
Session Isolation:
- Agent-JWT callers with sid claim: Memory scoped to that session
- API key callers: No session scoping (full tenant memory access)
- Memory init: Admin-only, blocks agent-JWT callers
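The scoping rule for reads can be sketched as a filter over entries. This is an assumption about the behavior (in particular, that a session-scoped caller sees only entries tagged with its own session), and all names here are hypothetical:

```typescript
interface Caller {
  kind: "agent-jwt" | "api-key";
  sid?: string; // session id claim, present on agent JWTs
}

interface SessionedEntry {
  fact: string;
  sessionId?: string;
}

// Assumed scoping rule: agent-JWT callers with a sid claim see only entries
// tagged with that session; API-key callers see the full tenant memory.
function visibleEntries(caller: Caller, entries: SessionedEntry[]): SessionedEntry[] {
  if (caller.kind === "agent-jwt" && caller.sid) {
    return entries.filter((e) => e.sessionId === caller.sid);
  }
  return entries;
}
```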
Privacy:
Memory respects Guardian's privacy modes. In Standard mode, embeddings and content are stored for query scoring. In Strict mode, only metadata is stored — no content, no embeddings.
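A write path honoring the two modes could branch like this. The shapes and the `toStoredEntry` helper are illustrative sketches of the stated behavior, not Guardian's actual code:

```typescript
type PrivacyMode = "standard" | "strict";

interface StoredEntry {
  id: string;
  block: string;
  createdAt: string;
  content?: string;     // absent in Strict mode
  embedding?: number[]; // absent in Strict mode
}

// Illustrative write path: Strict mode persists metadata only (no content,
// no embedding); Standard mode stores both so queries can be scored.
function toStoredEntry(
  mode: PrivacyMode,
  id: string,
  block: string,
  content: string,
  embed: (text: string) => number[],
): StoredEntry {
  const base = { id, block, createdAt: new Date().toISOString() };
  if (mode === "strict") return base;
  return { ...base, content, embedding: embed(content) };
}
```

One consequence of this split is that relevance-scored queries can only rank Standard-mode entries, since Strict-mode entries carry no content or embeddings to score against.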
The Numbers
- 4 semantic memory blocks — human, project, system, general
- 7 REST endpoints for full CRUD + query + bootstrap
- Top-10 relevance scoring on query (entries with score > 0 only)
- Session isolation via JWT sid claim for multi-agent scenarios
- 2 privacy modes — Standard (full content) and Strict (metadata only)
- Backward-compatible dual response keys — score/relevance and context/fact
Competitive Edge
OpenRouter has no memory system. Portkey has no memory system. Letta has memory but with no structured blocks, no relevance scoring, and no privacy modes. BrainstormRouter's RMM is the only AI gateway memory that combines semantic block structure, per-session isolation, relevance-scored retrieval, document bootstrapping, and privacy-mode awareness. Developers using BrainstormRouter get persistent, queryable agent memory as a platform feature — not a library they have to build and host themselves.
Lockstep Checklist
- [x] API Routes: Full CRUD at /v1/memory/* including blocks, entries, query, and init.
- [x] TS SDK: client.memory.blocks(), .entries(), .create(), .update(), .delete(), .query() in packages/sdk-ts/src/resources/memory.ts.
- [x] Python SDK: Matching implementation in packages/sdk-py/src/brainstormrouter/resources/memory.py.
- [ ] MCP Schemas: Not applicable.
- [x] Master Record: Listed under "Memory & Context" in master-capability-record.md.