Agent Memory Architectures: How AI Agents Remember Across Sessions
Every production agent hits the same wall: the context window empties. Tokens expire. The model forgets. What you built in session one is gone in session two unless you built something to catch it.
Memory is the difference between an agent that assists and an agent that accumulates. Four architectural patterns have emerged as dominant approaches—each with distinct tradeoffs in persistence, retrieval fidelity, operational complexity, and production readiness.
We tested all four over six weeks of daily agent use. Here's what we found.
The Four Memory Patterns
| Architecture | Storage | Retrieval | Latency | Complexity |
|---|---|---|---|---|
| File-based (MEMORY.md) | Plain text files | Full file read | ~5ms | Trivial |
| Structured (ByteRover) | Context trees in .brv/ | Keyword query | ~50ms | Low |
| Vector embeddings | Vector DB (Pinecone, Qdrant, Chroma) | Semantic similarity | 100-500ms | High |
| Hybrid | File + vector + structured | Multi-source merge | 200-800ms | Very high |
None of these is universally correct. The right choice depends on what your agent needs to remember—and how precisely it needs to recall it.
Pattern 1: File-Based Memory
The simplest approach: write context to a file, read it back next session. The MEMORY.md pattern is the canonical implementation—a markdown file the agent reads at session start and updates before session end.
How it works:
# Session Start
1. Read MEMORY.md + memory/YYYY-MM-DD.md (today + yesterday)
2. Load SOUL.md, USER.md for identity context
3. Begin work with reconstructed context
# Session End
1. Write significant events to memory/YYYY-MM-DD.md
2. Distill durable insights into MEMORY.md
3. Discard transient state
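The session-start and session-end steps above can be sketched in a few lines of Python. This is a minimal illustration, not the framework's actual loader: the function names `load_session_context` and `append_daily_log` are hypothetical, and the directory layout is assumed to match the file names listed in the steps.

```python
from datetime import date, timedelta
from pathlib import Path

# Identity and long-term files read in full at session start (names from the steps above).
CORE_FILES = ["SOUL.md", "USER.md", "MEMORY.md"]


def load_session_context(root: Path, today: date) -> str:
    """Concatenate core files plus yesterday's and today's daily logs, skipping missing files."""
    yesterday = today - timedelta(days=1)
    candidates = [root / name for name in CORE_FILES]
    candidates += [
        root / "memory" / f"{yesterday.isoformat()}.md",
        root / "memory" / f"{today.isoformat()}.md",
    ]
    parts = [p.read_text(encoding="utf-8") for p in candidates if p.exists()]
    return "\n\n".join(parts)


def append_daily_log(root: Path, today: date, event: str) -> Path:
    """Append one significant event to today's log (append-only, never rewritten)."""
    log = root / "memory" / f"{today.isoformat()}.md"
    log.parent.mkdir(parents=True, exist_ok=True)
    with log.open("a", encoding="utf-8") as fh:
        fh.write(f"- {event}\n")
    return log
```

Distilling durable insights into MEMORY.md is left out on purpose: that step is a judgment call made by the agent, not a file operation.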
Measured performance over 6 weeks:
| Metric | Value |
|---|---|
| Context reconstruction time | 2-5 seconds (token processing of ~2K words) |
| Storage overhead | ~15KB/day, ~300KB total after 6 weeks |
| Factual recall accuracy | ~85% for explicitly written facts |
| Procedural recall accuracy | ~70% for decision rationale |
| Failure mode | Agent doesn't write it down → it's gone |
What works: Identity persistence, preference tracking, project context, daily log continuity. The write discipline is the bottleneck—not the filesystem.
What breaks: Anything the agent fails to explicitly record. No latent recall. No "oh, I remember that conversation about..."—only what was written survives. Cross-referencing between files requires the agent to already know they're related.
Production verdict: Surprisingly effective for single-user, single-agent setups. The 85% factual recall is higher than most teams expect. The failure mode is predictable: if the agent didn't write it, it doesn't exist. That's a feature for reliability, a bug for completeness.
Pattern 2: Structured Memory (ByteRover)
ByteRover (brv) introduces a middle ground: structured context trees stored in a local .brv/ directory, queryable by topic keyword. The agent doesn't read everything—it queries for relevance.
Protocol:
# Before work
brv query "auth patterns"
→ Returns: JWT-in-cookies decision, session timeout config, related security constraints
# After work
brv curate "Switched auth from JWT to session tokens for better revocation. See commit abc123."
→ Stores: Timestamped entry in context tree, indexed by topic
Architecture breakdown:
- Context trees: Hierarchical topic organization in .brv/context-tree/
- Query: Keyword matching against topic paths (deterministic, not semantic search)
- Curate: Explicit knowledge storage with agent-authored summaries
- Sync: Optional team sharing via brv pull / brv push
Measured performance:
| Metric | Value |
|---|---|
| Query latency | 30-80ms (local filesystem, no network) |
| Storage overhead | ~2-5KB per curated entry |
| Retrieval precision | ~78% for keyword-matched queries |
| False positive rate | ~12% (irrelevant entries returned) |
| Team sync overhead | Git-merge compatible, ~100ms per sync |
Key advantage over file-based: The agent queries for what it needs instead of loading everything. As context grows beyond ~10K words, full-file reads hit diminishing returns—token budgets fill with noise. ByteRover's query model keeps the input lean.
Key limitation: Keyword matching. If the agent queries "authentication" but the stored context used "session management," it misses. No semantic understanding of query intent—only literal matching.
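To make the limitation concrete, here is a toy model of literal keyword matching against topic paths. It is not ByteRover's implementation: the in-memory dict, the topic paths, and the `query_topics` function are all assumptions made for illustration.

```python
def query_topics(tree: dict[str, list[str]], query: str) -> list[str]:
    """Return entries whose topic path contains every query keyword as a literal substring."""
    keywords = query.lower().split()
    hits: list[str] = []
    for topic_path in sorted(tree):  # sorted so results are deterministic
        path = topic_path.lower()
        if all(word in path for word in keywords):
            hits.extend(tree[topic_path])
    return hits


# Hypothetical context tree: topic path -> curated entries.
example_tree = {
    "security/session-management": [
        "Switched auth from JWT to session tokens for better revocation."
    ],
    "deploy/rollback": ["Roll back by redeploying the previous tag."],
}

found = query_topics(example_tree, "session management")  # matches the first path
missed = query_topics(example_tree, "authentication")     # no path contains this word
```

The second query returns nothing even though the stored entry is exactly what the agent wanted. The lookup is fast and repeatable, but it only knows the words that were used at curation time.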
Production verdict: Best fit for agents that accumulate domain-specific knowledge over time. The structured curation forces explicit knowledge management. The query model's simplicity is both its strength (fast, deterministic) and weakness (no fuzzy recall).
Pattern 3: Vector Embeddings
The approach that dominated 2024-2025 discourse: embed all agent interactions into a vector database, retrieve by semantic similarity at session start.
Pipeline:
# Ingest
conversation → chunk → embed (text-embedding-3-small) → store
(vector DB: Pinecone/Qdrant/Chroma with metadata)
# Retrieval
session_start → embed current context → similarity search
(top-k=10, cosine threshold=0.75) → inject into system prompt
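The retrieval half of this pipeline reduces to a cosine-similarity ranking with a cutoff. The sketch below uses plain Python lists in place of a vector database and assumes the embeddings already exist; `cosine` and `retrieve` are illustrative names, and only the top-k and threshold defaults come from the pipeline above.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(
    query_vec: list[float],
    store: list[tuple[str, list[float]]],
    top_k: int = 10,
    threshold: float = 0.75,
) -> list[tuple[float, str]]:
    """Score every stored chunk, drop those below the threshold, return the best top_k."""
    scored = [(cosine(query_vec, vec), text) for text, vec in store]
    kept = [pair for pair in scored if pair[0] >= threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return kept[:top_k]
```

A real vector store replaces the linear scan with an approximate nearest-neighbor index, but the threshold and top-k parameters play the same role: the threshold trades recall for precision, and top-k caps how many tokens get injected into the prompt.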
Measured performance:
| Metric | Value |
|---|---|
| Embedding cost | $0.02/1M tokens (text-embedding-3-small) |
| Retrieval latency | 150-400ms (Pinecone), 50-150ms (local Chroma) |
| Semantic recall | ~91% for conceptually related queries |
| Exact recall | ~60% for specific facts (semantic ≠ exact) |
| Hallucination risk | Higher—retrieved context may be semantically similar but factually wrong for the query |
| Infrastructure cost | $70-200/mo for managed Pinecone at production scale |
Where vectors excel: Recall across large, diverse memory stores. "What did we discuss about deployment last month?" works—semantic search finds conversations even when keywords don't match. This is the only pattern that handles unstructured, cross-domain memory well.
Where vectors fail: Precision. Semantic similarity is not factual correctness. A query about "Python version requirements" might retrieve a conversation about Python performance—the embedding space says they're close, but the facts are wrong for the use case. This is the retrieval-augmented generation problem applied to agent memory: you're always at risk of injecting relevant-but-wrong context.
The embedding treadmill: Models change. text-embedding-3-small isn't the last embedding model you'll use. Re-embedding your entire memory store is a migration, not a maintenance task. Plan for it or pay for it later.
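One way to plan for that migration is to record which model produced each vector, since vectors from different embedding models are not comparable. The following is a sketch under that assumption; `MemoryRecord`, `stale_records`, and `reembed` are hypothetical names, and `embed` stands in for whatever embedding call you use.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class MemoryRecord:
    text: str
    vector: list[float]
    model: str  # name of the embedding model that produced `vector`


def stale_records(records: list[MemoryRecord], current_model: str) -> list[MemoryRecord]:
    """Records embedded with any model other than the current one."""
    return [r for r in records if r.model != current_model]


def reembed(
    records: list[MemoryRecord],
    current_model: str,
    embed: Callable[[str], list[float]],
) -> list[MemoryRecord]:
    """Return a new list where stale records are re-embedded and up-to-date ones are kept as-is."""
    return [
        r if r.model == current_model
        else MemoryRecord(r.text, embed(r.text), current_model)
        for r in records
    ]
```

Keeping the original text alongside the vector is what makes this possible at all. A store that holds only vectors cannot be migrated.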
Production verdict: Necessary for agents with large, heterogeneous memory stores (>100K tokens of accumulated context). Overkill for personal agents with focused domains. The infrastructure cost and operational complexity are real—don't adopt vectors until file-based and structured approaches genuinely fail.
Pattern 4: Hybrid Architecture
Production systems that scale beyond single-agent, single-user setups converge on hybrid approaches: file-based for identity, structured for knowledge, vectors for recall.
Reference hybrid stack:
┌─────────────────────────────────────────────────┐
│ Layer 1: File-Based (Identity + Config) │
│ MEMORY.md, SOUL.md, USER.md, TOOLS.md │
│ → Full read at session start (~2K tokens) │
├─────────────────────────────────────────────────┤
│ Layer 2: Structured (Domain Knowledge) │
│ ByteRover .brv/context-tree/ │
│ → Query on demand (~200-500 tokens per query) │
├─────────────────────────────────────────────────┤
│ Layer 3: Vector (Conversational Recall) │
│ Pinecone/Qdrant, embedded interaction logs │
│ → Similarity search on ambiguous queries │
│ → Top-k=5, threshold=0.80 │
├─────────────────────────────────────────────────┤
│ Layer 4: Daily Logs (Raw Continuity) │
│ memory/YYYY-MM-DD.md │
│ → Append-only, reviewed for distillation │
└─────────────────────────────────────────────────┘
Measured performance (hybrid, 6-week deployment):
| Metric | Hybrid | Best Single Pattern |
|---|---|---|
| Overall recall accuracy | ~89% | 91% (vectors alone) |
| Precision (relevant-only retrieval) | ~82% | 78% (structured alone) |
| Context window efficiency | ~75% useful tokens | ~60% (file-based alone) |
| Operational complexity | High | Low (file-based) |
| Session reconstruction time | 3-8 seconds | 2-5 seconds (file-based) |
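The hybrid doesn't win every metric: it trails vectors alone on recall (~89% vs 91%) and file-based on complexity and reconstruction time, though it leads on precision and context window efficiency. Where it wins is completeness. It fails less often because when one layer misses, another catches it. File-based doesn't have the fact? Structured query finds it. Keyword didn't match? Semantic search bridges the gap.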
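The "one layer misses, another catches it" behavior can be expressed as an ordered fallback chain. This is a simplified sketch, not the routing logic of the tested deployment: each layer is assumed to be a callable that returns a list of snippets, and `layered_recall` is an illustrative name.

```python
from typing import Callable, Optional

# A retriever takes a query string and returns matching snippets (possibly empty).
Retriever = Callable[[str], list[str]]


def layered_recall(
    query: str, layers: list[tuple[str, Retriever]]
) -> tuple[Optional[str], list[str]]:
    """Try each layer in order; return the name and results of the first layer that hits."""
    for name, retrieve in layers:
        results = retrieve(query)
        if results:
            return name, results
    return None, []
```

Ordering the layers from most deterministic to most fuzzy means a semantic match is only used when no exact source has an answer, which keeps relevant-but-wrong context out of the prompt when a precise fact is available.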
What Actually Works in Production
After six weeks of daily use across these patterns, the empirical findings:
1. Write discipline matters more than architecture. The best memory system fails if the agent doesn't write things down. File-based with disciplined curation outperformed lazy vector-embedding of everything. Garbage in, garbage out—regardless of retrieval method.
2. Deterministic retrieval beats semantic retrieval for critical facts. When the agent needs to know "the API key goes in header X," explicit lookup (~85% factual recall for file-based, ~78% precision for structured queries) beats semantic search (~60% exact recall). Semantic search's strength is cross-domain recall, not factual precision.
3. The 10K-token threshold is real. Below 10K tokens of accumulated memory, file-based works fine. Above it, you need structured or vector retrieval—the context window fills with noise and the model degrades on the injected context. We measured a 15% task completion drop when context exceeded 80% of the window.
4. Team memory requires structured patterns. File-based memory is personal. When multiple agents or humans share memory, you need the queryability of structured patterns (ByteRover) or the expressiveness of vectors. brv push/pull sync is lightweight enough for daily team use.
5. Hybrid is worth it at scale, premature before. Start with file-based. Add structured (ByteRover) when you have domain knowledge accumulating. Add vectors only when you have >100K tokens of heterogeneous memory and need semantic recall across domains.
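Finding 3 suggests a simple guard: cap injected memory at a fraction of the context window. The sketch below assumes snippets arrive already sorted by relevance and approximates token counts by whitespace-separated words unless a real tokenizer is passed in. `fit_to_budget` is a hypothetical helper, and the 0.8 default simply mirrors the 80% figure above.

```python
from typing import Callable, Optional


def fit_to_budget(
    snippets: list[str],
    window_tokens: int,
    max_fraction: float = 0.8,
    count_tokens: Optional[Callable[[str], int]] = None,
) -> tuple[list[str], int]:
    """Keep snippets in order until the next one would exceed the budget.

    Returns the kept snippets and the number of tokens they use.
    """
    # Assumption: a word count is a rough stand-in for a tokenizer.
    count = count_tokens or (lambda s: len(s.split()))
    budget = int(window_tokens * max_fraction)
    kept: list[str] = []
    used = 0
    for snippet in snippets:
        cost = count(snippet)
        if used + cost > budget:
            break  # stop at the first snippet that does not fit
        kept.append(snippet)
        used += cost
    return kept, used
```

Stopping at the first snippet that does not fit, rather than skipping it and trying smaller ones, preserves the relevance order at the cost of leaving some budget unused.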
Implementation Recommendations
Solo developer / personal agent: File-based only. MEMORY.md + daily logs. Write discipline is your architecture. Budget: $0, complexity: trivial.
Small team / project agent: File-based + ByteRover. Use .brv/ context trees for shared project knowledge. brv query before answering domain questions. brv curate after decisions. Budget: $0, complexity: low.
Production system / multi-agent: Hybrid stack as described above. Invest in retrieval routing logic—deterministic lookup for known domains, semantic fallback for ambiguous queries. Budget: $70-200/mo for vector infrastructure, complexity: high.
The Unresolved Problem
No current architecture solves memory consolidation well. Humans consolidate memories during sleep—compressing, reorganizing, discarding. Agents have no equivalent. The MEMORY.md distillation pattern is the closest analog, but it's manual and lossy.
ByteRover's curation model approaches this: the agent actively decides what's worth keeping. But it's still a point-in-time decision, not a background consolidation process. The agent that learns to forget well—selectively, contextually, without losing critical information—hasn't been built yet.
That's the next architecture. We'll test it when it arrives.
Technical Appendix
Test environment: ARM64 Linux, OpenClaw agent framework, 6-week deployment period (Feb 1 – Mar 14, 2026)
Embedding model: OpenAI text-embedding-3-small (1536 dimensions)
Vector store: Chroma (local) and Pinecone (managed) tested in parallel
Structured memory: ByteRover CLI (brv), local .brv/ context trees
File-based: Markdown files with agent-authored curation
Daily interaction volume: 15-40 agent sessions/day across test period
Key references:
ByteRover: Structured agent memory via CLI with context trees and team sync
OpenClaw agent framework: File-based memory patterns (MEMORY.md, daily logs)
Vector database comparison: Pinecone vs Qdrant vs Chroma for agent memory workloads
Embedding models: OpenAI text-embedding-3-small, Cohere embed-v3