March 14, 2026 · 12 min read · Architecture

Agent Memory Architectures: How AI Agents Remember Across Sessions

Every production agent hits the same wall: the context window empties. Tokens expire. The model forgets. What you built in session one is gone in session two unless you built something to catch it.

Memory is the difference between an agent that assists and an agent that accumulates. Four architectural patterns have emerged as dominant approaches—each with distinct tradeoffs in persistence, retrieval fidelity, operational complexity, and production readiness.

We tested all four over six weeks of daily agent use. Here's what we found.

The Four Memory Patterns

Architecture	Storage	Retrieval	Latency	Complexity
File-based (MEMORY.md)	Plain text files	Full file read	~5ms	Trivial
Structured (ByteRover)	Context trees in `.brv/`	Keyword query	~50ms	Low
Vector embeddings	Vector DB (Pinecone, Qdrant, Chroma)	Semantic similarity	100-500ms	High
Hybrid	File + vector + structured	Multi-source merge	200-800ms	Very high

None of these is universally correct. The right choice depends on what your agent needs to remember—and how precisely it needs to recall it.

Pattern 1: File-Based Memory

The simplest approach: write context to a file, read it back next session. The MEMORY.md pattern is the canonical implementation—a markdown file the agent reads at session start and updates before session end.

How it works:

# Session Start
1. Read MEMORY.md + memory/YYYY-MM-DD.md (today + yesterday)
2. Load SOUL.md, USER.md for identity context
3. Begin work with reconstructed context

# Session End  
1. Write significant events to memory/YYYY-MM-DD.md
2. Distill durable insights into MEMORY.md
3. Discard transient state
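
The session-start and session-end steps above can be sketched in a few lines of Python. This is a minimal illustration, not the framework's actual loader: the function names `load_session_context` and `append_daily_log` are hypothetical, and the directory layout is assumed to match the file names used in this article.

```python
from datetime import date, timedelta
from pathlib import Path


def load_session_context(root: Path, today: date) -> str:
    """Concatenate identity files, MEMORY.md, and yesterday's and today's logs."""
    yesterday = today - timedelta(days=1)
    candidates = [
        root / "SOUL.md",
        root / "USER.md",
        root / "MEMORY.md",
        root / "memory" / f"{yesterday.isoformat()}.md",
        root / "memory" / f"{today.isoformat()}.md",
    ]
    sections = []
    for path in candidates:
        if path.exists():  # missing files are skipped, not treated as errors
            body = path.read_text(encoding="utf-8").strip()
            sections.append(f"## {path.name}\n{body}")
    return "\n\n".join(sections)


def append_daily_log(root: Path, today: date, event: str) -> Path:
    """Append one event line to today's log, creating memory/ if needed."""
    log_dir = root / "memory"
    log_dir.mkdir(parents=True, exist_ok=True)
    log_path = log_dir / f"{today.isoformat()}.md"
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(f"- {event}\n")
    return log_path
```

Distilling durable insights into MEMORY.md is deliberately left out: that step is a judgment call made by the agent, not a file operation.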

Measured performance over 6 weeks:

Metric	Value
Context reconstruction time	2-5 seconds (token processing of ~2K words)
Storage overhead	~15KB/day, ~300KB total after 6 weeks
Factual recall accuracy	~85% for explicitly written facts
Procedural recall accuracy	~70% for decision rationale
Failure mode	Agent doesn't write it down → it's gone

What works: Identity persistence, preference tracking, project context, daily log continuity. The write discipline is the bottleneck—not the filesystem.

What breaks: Anything the agent fails to explicitly record. No latent recall. No "oh, I remember that conversation about..."—only what was written survives. Cross-referencing between files requires the agent to already know they're related.

Production verdict: Surprisingly effective for single-user, single-agent setups. The 85% factual recall is higher than most teams expect. The failure mode is predictable: if the agent didn't write it, it doesn't exist. That's a feature for reliability, a bug for completeness.

Pattern 2: Structured Memory (ByteRover)

ByteRover (brv) introduces a middle ground: structured context trees stored in a local .brv/ directory, queryable by topic keyword. The agent doesn't read everything—it queries for relevance.

Protocol:

# Before work
brv query "auth patterns"
→ Returns: JWT-in-cookies decision, session timeout config, 
  related security constraints

# After work
brv curate "Switched auth from JWT to session tokens for 
           better revocation. See commit abc123."
→ Stores: Timestamped entry in context tree, indexed by topic

Architecture breakdown:

Context trees: Hierarchical topic organization in .brv/context-tree/
Query: Keyword matching against topic paths—not semantic search, but deterministic
Curate: Explicit knowledge storage with agent-authored summaries
Sync: Optional team sharing via brv pull / brv push

Measured performance:

Metric	Value
Query latency	30-80ms (local filesystem, no network)
Storage overhead	~2-5KB per curated entry
Retrieval precision	~78% for keyword-matched queries
False positive rate	~12% (irrelevant entries returned)
Team sync overhead	Git-merge compatible, ~100ms per sync

Key advantage over file-based: The agent queries for what it needs instead of loading everything. As context grows beyond ~10K words, full-file reads hit diminishing returns—token budgets fill with noise. ByteRover's query model keeps the input lean.

Key limitation: Keyword matching. If the agent queries "authentication" but the stored context used "session management," it misses. No semantic understanding of query intent—only literal matching.
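
To make the limitation concrete, here is a toy sketch of literal keyword matching against topic paths. It is not ByteRover's implementation: the `keyword_query` function and the sample topic paths are hypothetical, chosen only to show why a deterministic match misses synonyms.

```python
def keyword_query(entries: dict[str, str], query: str) -> list[str]:
    """Return topic paths in which every query term appears literally."""
    terms = query.lower().split()
    return sorted(
        path for path in entries
        if all(term in path.lower() for term in terms)
    )


# Hypothetical context tree: topic path -> curated note.
ENTRIES = {
    "security/session-management": "Session tokens replaced JWT for revocation.",
    "security/jwt-cookies": "JWT stored in cookies (superseded).",
    "deploy/rollback": "Rollback procedure for failed releases.",
}

hits = keyword_query(ENTRIES, "session")           # matches one path
misses = keyword_query(ENTRIES, "authentication")  # matches nothing
```

The second query returns an empty list even though the session-management entry is exactly what the agent wanted. The result is predictable and fast, but only as good as the vocabulary overlap between query and topic path.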

Production verdict: Best fit for agents that accumulate domain-specific knowledge over time. The structured curation forces explicit knowledge management. The query model's simplicity is both its strength (fast, deterministic) and weakness (no fuzzy recall).

Pattern 3: Vector Embeddings

The approach that dominated 2024-2025 discourse: embed all agent interactions into a vector database, retrieve by semantic similarity at session start.

Pipeline:

# Ingest
conversation → chunk → embed (text-embedding-3-small) → store 
(vector DB: Pinecone/Qdrant/Chroma with metadata)

# Retrieval  
session_start → embed current context → similarity search 
(top-k=10, cosine threshold=0.75) → inject into system prompt
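
The retrieval half of this pipeline reduces to a thresholded top-k search over cosine similarity. The sketch below uses pure Python and tiny hand-written vectors in place of real embeddings, so it runs without a vector database. The `cosine` and `top_k` names are illustrative and do not correspond to any Pinecone, Qdrant, or Chroma API.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, store, k=10, threshold=0.75):
    """Return up to k (score, text) pairs scoring at or above the threshold, best first."""
    scored = [(cosine(query_vec, vec), text) for text, vec in store]
    kept = [pair for pair in scored if pair[0] >= threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return kept[:k]


# Toy 2-dimensional "embeddings"; real ones have hundreds or thousands of dimensions.
STORE = [
    ("deployment notes", [1.0, 0.0]),
    ("lunch order", [0.0, 1.0]),
    ("deploy and lunch", [1.0, 1.0]),
]
results = top_k([1.0, 0.0], STORE)
```

With the 0.75 threshold only the first entry survives, since the mixed entry scores roughly 0.71. Lowering the threshold admits it, which is the recall-versus-precision dial discussed below.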

Measured performance:

Metric	Value
Embedding cost	$0.02/1M tokens (text-embedding-3-small)
Retrieval latency	150-400ms (Pinecone), 50-150ms (local Chroma)
Semantic recall	~91% for conceptually related queries
Exact recall	~60% for specific facts (semantic ≠ exact)
Hallucination risk	Higher—retrieved context may be semantically similar but factually wrong for the query
Infrastructure cost	$70-200/mo for managed Pinecone at production scale

Where vectors excel: Recall across large, diverse memory stores. "What did we discuss about deployment last month?" works—semantic search finds conversations even when keywords don't match. This is the only pattern that handles unstructured, cross-domain memory well.

Where vectors fail: Precision. Semantic similarity is not factual correctness. A query about "Python version requirements" might retrieve a conversation about Python performance—the embedding space says they're close, but the facts are wrong for the use case. This is the retrieval-augmented generation problem applied to agent memory: you're always at risk of injecting relevant-but-wrong context.

The embedding treadmill: Models change. text-embedding-3-small isn't the last embedding model you'll use. Re-embedding your entire memory store is a migration, not a maintenance task. Plan for it or pay for it later.
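
One way to plan for that migration, offered as an assumption rather than something we benchmarked, is to store the embedding model's name next to every vector so stale records can be found and re-embedded incrementally. `MemoryRecord` and `reembed` are hypothetical names, and `embed` stands in for whatever embedding call your stack uses.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class MemoryRecord:
    text: str
    vector: list[float]
    embedding_model: str  # which model produced `vector`


def reembed(
    records: list[MemoryRecord],
    current_model: str,
    embed: Callable[[str], list[float]],
) -> int:
    """Re-embed records produced by any other model; return how many were migrated."""
    migrated = 0
    for record in records:
        if record.embedding_model != current_model:
            record.vector = embed(record.text)
            record.embedding_model = current_model
            migrated += 1
    return migrated
```

Because the original text is kept alongside the vector, the migration can run in batches without touching the source conversations again.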

Production verdict: Necessary for agents with large, heterogeneous memory stores (>100K tokens of accumulated context). Overkill for personal agents with focused domains. The infrastructure cost and operational complexity are real—don't adopt vectors until file-based and structured approaches genuinely fail.

Pattern 4: Hybrid Architecture

Production systems that scale beyond single-agent, single-user setups converge on hybrid approaches: file-based for identity, structured for knowledge, vectors for recall.

Reference hybrid stack:

┌─────────────────────────────────────────────────┐
│ Layer 1: File-Based (Identity + Config)          │
│   MEMORY.md, SOUL.md, USER.md, TOOLS.md          │
│   → Full read at session start (~2K tokens)       │
├─────────────────────────────────────────────────┤
│ Layer 2: Structured (Domain Knowledge)            │
│   ByteRover .brv/context-tree/                    │
│   → Query on demand (~200-500 tokens per query)   │
├─────────────────────────────────────────────────┤
│ Layer 3: Vector (Conversational Recall)           │
│   Pinecone/Qdrant, embedded interaction logs      │
│   → Similarity search on ambiguous queries        │
│   → Top-k=5, threshold=0.80                       │
├─────────────────────────────────────────────────┤
│ Layer 4: Daily Logs (Raw Continuity)              │
│   memory/YYYY-MM-DD.md                            │
│   → Append-only, reviewed for distillation        │
└─────────────────────────────────────────────────┘

Measured performance (hybrid, 6-week deployment):

Metric	Hybrid	Best Single Pattern
Overall recall accuracy	~89%	91% (vectors alone)
Precision (relevant-only retrieval)	~82%	78% (structured alone)
Context window efficiency	~75% useful tokens	~60% (file-based alone)
Operational complexity	High	Low (file-based)
Session reconstruction time	3-8 seconds	2-5 seconds (file-based)

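The hybrid doesn't lead across the board. It trails vectors alone on recall and file-based on complexity and reconstruction time, though it edges out the single patterns on precision and context window efficiency. Where it wins is completeness: it fails less often because when one layer misses, another catches it. File-based doesn't have the fact? Structured query finds it. Keyword didn't match? Semantic search bridges the gap.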
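
The "one layer misses, another catches it" behavior can be sketched as an ordered fallback. Everything here is a toy under stated assumptions: the three layer functions are stand-ins with hard-coded data, and `SEMANTIC_NEIGHBORS` fakes a vector lookup with a synonym table rather than calling a real vector store.

```python
from typing import Callable, Optional

Retriever = Callable[[str], Optional[str]]


def layered_recall(query: str, layers: list[tuple[str, Retriever]]):
    """Try each (name, retriever) in order; return the first layer that answers."""
    for name, retrieve in layers:
        hit = retrieve(query)
        if hit is not None:
            return name, hit
    return None, None


# Hypothetical data for each layer.
FILE_FACTS = {"preferred editor": "vim"}
TOPIC_TREE = {"auth/session-management": "Session tokens replaced JWT."}
SEMANTIC_NEIGHBORS = {"login": "auth/session-management"}  # fakes a vector search


def file_layer(query: str) -> Optional[str]:
    return FILE_FACTS.get(query.lower())


def structured_layer(query: str) -> Optional[str]:
    terms = query.lower().split()
    for path, note in TOPIC_TREE.items():
        if all(term in path for term in terms):
            return note
    return None


def semantic_layer(query: str) -> Optional[str]:
    for word in query.lower().split():
        path = SEMANTIC_NEIGHBORS.get(word)
        if path:
            return TOPIC_TREE[path]
    return None


LAYERS = [("file", file_layer), ("structured", structured_layer), ("vector", semantic_layer)]
```

Ordering the layers from most deterministic to least means the fuzzy layer only runs when the exact ones have nothing, which keeps the cheap, predictable paths in charge of critical facts.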

What Actually Works in Production

After six weeks of daily use across these patterns, the empirical findings:

1. Write discipline matters more than architecture. The best memory system fails if the agent doesn't write things down. File-based with disciplined curation outperformed lazy vector-embedding of everything. Garbage in, garbage out—regardless of retrieval method.

2. Deterministic retrieval beats semantic retrieval for critical facts. When the agent needs to know "the API key goes in header X," deterministic lookup (~85% factual recall for file-based, ~78% precision for structured queries) beats semantic search (~60% exact recall). Semantic search's strength is cross-domain recall, not factual precision.

3. The 10K-token threshold is real. Below 10K tokens of accumulated memory, file-based works fine. Above it, you need structured or vector retrieval—the context window fills with noise and the model degrades on the injected context. We measured a 15% task completion drop when context exceeded 80% of the window.

4. Team memory requires structured patterns. File-based memory is personal. When multiple agents or humans share memory, you need the queryability of structured patterns (ByteRover) or the expressiveness of vectors. brv push/pull sync is lightweight enough for daily team use.

5. Hybrid is worth it at scale, premature before. Start with file-based. Add structured (ByteRover) when you have domain knowledge accumulating. Add vectors only when you have >100K tokens of heterogeneous memory and need semantic recall across domains.
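
Finding 3 suggests capping injected memory at a fraction of the context window. A minimal sketch of such a guard follows, assuming entries arrive sorted by priority and using a whitespace word count as a crude stand-in for a real tokenizer. Both `approx_tokens` and `fit_to_budget` are hypothetical helpers.

```python
def approx_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: counts whitespace-separated words."""
    return len(text.split())


def fit_to_budget(entries, window_tokens, max_fraction=0.8, count_tokens=approx_tokens):
    """Keep entries, in priority order, while total cost stays within the budget."""
    budget = int(window_tokens * max_fraction)
    kept, used = [], 0
    for entry in entries:
        cost = count_tokens(entry)
        if used + cost > budget:
            continue  # skip what would overflow; a later, smaller entry may still fit
        kept.append(entry)
        used += cost
    return kept, used


kept, used = fit_to_budget(["a b c d e", "f g h i", "j k"], window_tokens=10)
```

In practice the `count_tokens` argument should be replaced with the tokenizer for your model, since word counts can undercount tokens substantially.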

Implementation Recommendations

Solo developer / personal agent: File-based only. MEMORY.md + daily logs. Write discipline is your architecture. Budget: $0, complexity: trivial.

Small team / project agent: File-based + ByteRover. Use .brv/ context trees for shared project knowledge. brv query before answering domain questions. brv curate after decisions. Budget: $0, complexity: low.

Production system / multi-agent: Hybrid stack as described above. Invest in retrieval routing logic—deterministic lookup for known domains, semantic fallback for ambiguous queries. Budget: $70-200/mo for vector infrastructure, complexity: high.

The Unresolved Problem

No current architecture solves memory consolidation well. Humans consolidate memories during sleep—compressing, reorganizing, discarding. Agents have no equivalent. The MEMORY.md distillation pattern is the closest analog, but it's manual and lossy.

ByteRover's curation model approaches this: the agent actively decides what's worth keeping. But it's still a point-in-time decision, not a background consolidation process. The agent that learns to forget well—selectively, contextually, without losing critical information—hasn't been built yet.

That's the next architecture. We'll test it when it arrives.

Technical Appendix
Test environment: ARM64 Linux, OpenClaw agent framework, 6-week deployment period (Feb 1 – Mar 14, 2026)
Embedding model: OpenAI text-embedding-3-small (1536 dimensions)
Vector store: Chroma (local) and Pinecone (managed) tested in parallel
Structured memory: ByteRover CLI (brv), local .brv/ context trees
File-based: Markdown files with agent-authored curation
Daily interaction volume: 15-40 agent sessions/day across test period

Key references:
ByteRover: Structured agent memory via CLI with context trees and team sync
OpenClaw agent framework: File-based memory patterns (MEMORY.md, daily logs)
Vector database comparison: Pinecone vs Qdrant vs Chroma for agent memory workloads
Embedding models: OpenAI text-embedding-3-small, Cohere embed-v3
