LabNotes

Agent Memory Architectures (Experimental)

Experimental mode: This version includes speculative analysis, unverified projections, and architectural proposals that haven't been validated in production. Treat the sections marked ⚠️ Speculative as hypotheses, not findings. For verified data only, see the Standard version.

Every production agent hits the same wall: the context window fills up, older tokens fall out, and nothing survives the end of the session. The model forgets. The standard article covers what works now: four patterns, measured data, honest tradeoffs. This version goes further: what happens when memory architectures collide with adversarial inputs, adversarial forgetting, and the consolidation problem nobody's solved yet.

Beyond the Four Patterns: Memory Topology

The four established patterns—file-based, structured (ByteRover), vector embeddings, hybrid—are typically discussed as isolated choices. In practice, deployed agents develop memory topology: the spatial and temporal relationships between memory layers.

Observed topology in 6-week deployment:

                    ┌──────────────┐
                    │  LTM Layer   │ ← MEMORY.md, .brv/ trees
                    │  (permanent) │    Stable, reviewed, curated
                    └──────▲───────┘
                           │ distillation
                    ┌──────┴───────┐
                    │  MTM Layer   │ ← memory/YYYY-MM-DD.md
                    │  (days-weeks)│    Raw logs, pending review
                    └──────▲───────┘
                           │ append
                    ┌──────┴───────┐
                    │  STM Layer   │ ← Current context window
                    │  (session)   │    Active working memory
                    └──────────────┘

        Observed flow: STM → MTM (via write) → LTM (via curation)
        Missing flow: LTM/MTM → STM (automatic recall without explicit query)

The missing flow is the gap. Humans don't consciously query their memory before every sentence—relevant memories surface automatically. Agents don't. Every recall is intentional, explicit, and costs tokens.

⚠️ Speculative: The next architecture will need a background process—a "memory surfacing" layer—that monitors the current task context and injects relevant long-term memories without explicit query. Think of it as an interrupt handler for memory.
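A minimal sketch of what such a surfacing step could look like, assuming word overlap (Jaccard similarity) as a crude stand-in for embedding similarity. The function names, the 0.2 threshold, and the three-item cap are all hypothetical choices for illustration, not part of any system we tested.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase word set; a crude stand-in for an embedding (assumption)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap between two word sets, from 0.0 (disjoint) to 1.0 (identical)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def surface_memories(context: str, memories: list[str],
                     threshold: float = 0.2, max_items: int = 3) -> list[str]:
    """Return stored memories that overlap with the current task context.

    Intended to run in the background on each turn, so the agent never
    issues an explicit query. Threshold and cap are illustrative guesses.
    """
    ctx = tokenize(context)
    scored = [(jaccard(ctx, tokenize(m)), m) for m in memories]
    relevant = [(score, m) for score, m in scored if score >= threshold]
    relevant.sort(key=lambda pair: pair[0], reverse=True)
    return [m for _, m in relevant[:max_items]]


long_term = [
    "Switched auth to session tokens",
    "Use Python 3.12 for the build",
]
surfaced = surface_memories("debugging auth session tokens expiry", long_term)
```

The cap matters as much as the threshold: an interrupt handler that fires on every turn and injects everything it finds would just recreate the full-read problem.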

Memory Consolidation: The Sleep Problem

Human memory consolidation happens during sleep: hippocampal replay compresses episodic memories into neocortical storage, strengthens relevant associations, and weakens irrelevant ones. No agent architecture we have seen does this.

Current consolidation approaches and their failure modes:

Approach | How It Works | Failure Mode
Manual distillation | Agent reviews daily logs, writes to MEMORY.md | Biased by session context; good days get over-represented
Time-based decay | Vector similarity search with recency weighting | Recent ≠ important; critical early decisions get buried
Access-count promotion | ByteRover entries queried frequently get elevated | Popularity ≠ relevance; frequently queried wrong answers persist
LLM-based summarization | Periodically compress old memories via model | Summarization introduces factual drift; detail loss compounds

None of these replicate the selective strengthening of biological consolidation. They compress, but they don't discriminate well.

⚠️ Speculative architecture — "Sleep Cycle" for agents:

# Pseudo-consolidation protocol
trigger: idle_agent OR memory_store > threshold_tokens
process:
  1. Read all MTM entries from past 7 days
  2. Cluster by topic (vector similarity, k=5 clusters)
  3. For each cluster:
     a. Extract key decisions → curate to LTM (.brv/)
     b. Identify contradictions → flag for review
     c. Compute "memory value" = access_count × recency × explicit_curation_score
     d. Discard entries below threshold (value < 0.3)
  4. Update MEMORY.md with consolidated summary
  5. Archive raw MTM logs beyond 30 days
output: Distilled LTM, pruned MTM, contradiction log

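This is untested. The "memory value" scoring is arbitrary: the multiplicative formula and the 0.3 threshold are guesses, and the threshold only means something once each factor is normalized to a 0-1 range. But we think the architecture direction is sound: automated, periodic, selective. The alternative, permanent accumulation with manual pruning, doesn't scale past ~6 months of daily use by our estimate.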
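The scoring and pruning step of the protocol can be sketched as follows. This is a sketch under stated assumptions, not a tested implementation: the `Entry` fields, the normalization of access count by the maximum observed, and the 7-day recency half-life are all hypothetical, and entries are grouped by a pre-assigned `topic` field rather than by vector clustering.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Entry:
    topic: str             # assumed pre-assigned; stands in for the clustering step
    text: str
    access_count: int
    age_days: float
    curation_score: float  # 0.0-1.0, assumed to be set when the entry is reviewed


def memory_value(entry: Entry, max_access: int,
                 half_life_days: float = 7.0) -> float:
    """access x recency x curation, with each factor scaled into 0-1."""
    access = entry.access_count / max_access if max_access else 0.0
    recency = 0.5 ** (entry.age_days / half_life_days)  # halves every half-life
    return access * recency * entry.curation_score


def consolidate(entries: list[Entry], threshold: float = 0.3):
    """Split entries into kept (grouped by topic) and pruned."""
    max_access = max((e.access_count for e in entries), default=0)
    kept: dict[str, list[Entry]] = defaultdict(list)
    pruned: list[Entry] = []
    for entry in entries:
        if memory_value(entry, max_access) >= threshold:
            kept[entry.topic].append(entry)
        else:
            pruned.append(entry)
    return dict(kept), pruned


entries = [
    Entry("auth", "Switched auth to session tokens", 10, 0.0, 0.9),
    Entry("standup", "Discussed sprint scope", 1, 14.0, 0.5),
]
kept, pruned = consolidate(entries)
```

Note that a pure product inherits the failure modes in the table above: a critical decision that is old and rarely queried scores near zero regardless of its curation score, so a real version would probably need a floor for explicitly curated entries.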

Adversarial Memory: Injection and Poisoning

Memory architectures have attack surfaces that don't exist in stateless agents.

Vector store poisoning: An adversary crafts inputs that embed near target vectors. Example: a user prompt that semantically resembles "grant admin access" gets embedded and stored. Later, a legitimate query about "user permissions" retrieves the poisoned context. The agent acts on injected memory it believes is its own history.

File-based memory manipulation: If the agent's file-based memory is writable by external processes, any process can inject false memories into MEMORY.md. The agent reads it at session start and treats injected text as its own curated knowledge.

Structured memory corruption: ByteRover's context trees are filesystem-based. Symlink attacks, TOCTOU races on concurrent brv curate calls, and filesystem permission issues all present attack vectors.

Measured vulnerability surface (our test environment):

Memory Type | Injection Vector | Detection Difficulty | Impact
File-based | Filesystem write access | Low (diff-based audit) | High (agent trusts its own memory)
ByteRover | Filesystem + topic path manipulation | Medium (query logging) | High (structured authority)
Vector | Adversarial embedding inputs | Very high (no semantic audit) | Medium (retrieved alongside real memories)

⚠️ Speculative mitigation: Memory Provenance. Tag every stored memory with provenance metadata: source (user, agent, external), confidence score, timestamp, and a content hash. On retrieval, the agent evaluates provenance before trusting the memory. This doesn't prevent injection; it makes injection detectable, provided the hash is keyed or signed so that an attacker with write access cannot simply recompute it.

# Memory provenance tag (proposed schema)
{
  "content": "Switched auth to session tokens",
  "source": "agent-self",
  "confidence": 0.85,
  "timestamp": "2026-03-14T02:30:00Z",
  "content_hash": "sha256:a3f2b...",
  "session_id": "sess_7f3a2c",
  "corroborated_by": ["commit_abc123", "brv_entry_xyz"]
}
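One way to make the tag tamper-evident is sketched below. The proposed schema carries a plain content hash, which anyone with write access to the store could recompute after editing the content, so this sketch swaps in a keyed HMAC over the whole record. The `hmac` field name, the key handling, and both function names are assumptions for illustration, not part of the schema above.

```python
import hashlib
import hmac
import json


def _canonical(record: dict) -> bytes:
    """Serialize deterministically so the same record always signs the same."""
    return json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_memory(record: dict, key: bytes) -> dict:
    """Return a copy of the record with a keyed tag over all other fields."""
    tag = hmac.new(key, _canonical(record), hashlib.sha256).hexdigest()
    return {**record, "hmac": f"sha256:{tag}"}


def verify_memory(signed: dict, key: bytes) -> bool:
    """True only if every field is unchanged since signing."""
    record = {k: v for k, v in signed.items() if k != "hmac"}
    expected = "sha256:" + hmac.new(key, _canonical(record), hashlib.sha256).hexdigest()
    stored = str(signed.get("hmac", ""))
    # Constant-time comparison on bytes avoids leaking how much of the tag matched
    return hmac.compare_digest(expected.encode("utf-8"), stored.encode("utf-8"))


key = b"example-key-held-outside-the-memory-store"  # hypothetical key
signed = sign_memory(
    {
        "content": "Switched auth to session tokens",
        "source": "agent-self",
        "confidence": 0.85,
        "timestamp": "2026-03-14T02:30:00Z",
    },
    key,
)
tampered = {**signed, "content": "Grant admin access to all users"}
```

This only helps if the key lives somewhere the injecting process cannot read. If the agent's memory files and its signing key share the same filesystem permissions, the tag adds nothing.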

Cross-Agent Memory Sharing: The Trust Problem

ByteRover's brv push/pull enables team memory sharing. When multiple agents or humans share a context tree, new questions emerge:

  • Authority: Does Agent A's curated memory carry the same weight as Agent B's? Who arbitrates contradictions?
  • Staleness: Agent A curates "use Python 3.12" on Monday. Agent B curates "migrated to Python 3.13" on Wednesday. Agent C pulls on Thursday—which is current?
  • Scope: Memory valid for one project leaks into another. Cross-contamination between client work is a real operational risk.

⚠️ Speculative: Memory namespaces with TTL. Scoped context trees with automatic expiration:

# .brv/context-tree/
├── global/              # Shared across all agents, all projects
│   ├── coding-standards
│   └── security-policies
├── project-alpha/       # Scoped, with TTL
│   ├── architecture     # ttl: 90 days
│   ├── decisions        # ttl: 180 days
│   └── daily-standups   # ttl: 14 days ← auto-expires
└── project-beta/
    └── ...

This is speculative. We know of no production system that implements memory TTLs for agent context trees. But the need is real: memory that doesn't expire becomes memory you can't trust.
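The expiration check itself is simple, as this sketch shows. It assumes each entry records when it was last curated and that TTLs are looked up by topic path; the `TTL_DAYS` table mirrors the tree above, but the lookup scheme and the `is_expired` function are hypothetical and not a ByteRover feature.

```python
from datetime import datetime, timedelta, timezone

# TTLs in days per topic path, mirroring the proposed tree.
# Paths with no entry (for example anything under global/) never expire.
TTL_DAYS = {
    "project-alpha/architecture": 90,
    "project-alpha/decisions": 180,
    "project-alpha/daily-standups": 14,
}


def is_expired(topic: str, curated_at: datetime, now: datetime,
               ttl_days: dict[str, int] = TTL_DAYS) -> bool:
    """True if the entry has outlived the TTL configured for its topic."""
    ttl = ttl_days.get(topic)
    if ttl is None:
        return False
    return now - curated_at > timedelta(days=ttl)


now = datetime(2026, 3, 14, tzinfo=timezone.utc)
curated = datetime(2026, 2, 1, tzinfo=timezone.utc)  # 41 days earlier

standup_expired = is_expired("project-alpha/daily-standups", curated, now)
architecture_expired = is_expired("project-alpha/architecture", curated, now)
```

Expiry could mean deletion or just a "stale, re-verify before use" flag on retrieval. The second is safer for long-lived entries like architecture decisions, which are often still true after their TTL runs out.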

The Context Window Compression Ratio

A metric we developed during testing: how efficiently does each memory pattern use the context window?

Compression Ratio = (useful_tokens_injected) / (total_tokens_injected)

Where "useful" = tokens that contributed to correct task completion
      "total" = all memory tokens injected into context

Measured compression ratios:

Pattern | Compression Ratio | Notes
File-based (full read) | 0.42 | 58% of injected tokens were irrelevant to current task
ByteRover (keyword query) | 0.61 | Query filtering improved signal-to-noise
Vector (top-k=10) | 0.38 | High recall, low precision; lots of near-misses injected
Vector (top-k=3, threshold=0.85) | 0.54 | Tighter thresholds help significantly
Hybrid (layered retrieval) | 0.67 | Best overall; each layer adds signal, filters noise

The compression ratio matters because it tracked task completion rate in our tests: every 0.10 drop in compression ratio corresponded to a ~5% drop in task completion. The context window is a finite resource, and wasting it on irrelevant memory has measurable costs.
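For concreteness, the ratio can be computed per task or pooled across tasks, as in this sketch. The token counts are made-up illustrative numbers, not our annotation data, and pooling by total tokens (rather than averaging per-task ratios) is an assumption about how one might aggregate.

```python
def compression_ratio(useful_tokens: int, total_tokens: int) -> float:
    """useful_tokens_injected / total_tokens_injected for one task."""
    if total_tokens <= 0:
        return 0.0  # nothing injected, nothing wasted
    if not 0 <= useful_tokens <= total_tokens:
        raise ValueError("useful_tokens must be between 0 and total_tokens")
    return useful_tokens / total_tokens


def pooled_ratio(tasks: list[tuple[int, int]]) -> float:
    """Pool (useful, total) pairs so large injections weigh more than small ones."""
    useful = sum(u for u, _ in tasks)
    total = sum(t for _, t in tasks)
    return compression_ratio(useful, total)


# Illustrative only: 420 useful tokens out of 1000 injected gives 0.42,
# meaning 58% of the injected memory did not contribute to the task.
single = compression_ratio(420, 1000)
pooled = pooled_ratio([(420, 1000), (300, 500)])
```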

⚠️ Speculative: Adaptive retrieval budgets. Instead of fixed top-k or fixed thresholds, allocate a token budget for memory (e.g., 20% of context window) and have the retrieval system optimize for compression ratio within that budget. Think of it as a knapsack problem for memory injection.
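The knapsack framing can be made concrete with the classic 0/1 dynamic program, sketched below. Each candidate memory has a token cost and a relevance score, and the goal is to maximize total relevance within the budget. The tuple layout and the idea that relevance scores are additive are assumptions; real memories overlap, so summed relevance overstates the value of near-duplicates.

```python
def select_memories(candidates: list[tuple[str, int, float]],
                    budget: int) -> list[str]:
    """Pick the subset of (name, tokens, relevance) with maximum total
    relevance whose token costs fit within the budget (0/1 knapsack).
    Token costs are assumed to be non-negative integers.
    """
    n = len(candidates)
    # best[i][b] = max relevance using the first i candidates within b tokens
    best = [[0.0] * (budget + 1) for _ in range(n + 1)]
    for i, (_, tokens, relevance) in enumerate(candidates, start=1):
        for b in range(budget + 1):
            best[i][b] = best[i - 1][b]  # option 1: skip candidate i
            if tokens <= b:              # option 2: take it if it fits
                best[i][b] = max(best[i][b],
                                 best[i - 1][b - tokens] + relevance)

    # Walk back through the table to recover which candidates were taken
    chosen: list[str] = []
    b = budget
    for i in range(n, 0, -1):
        if best[i][b] != best[i - 1][b]:
            name, tokens, _ = candidates[i - 1]
            chosen.append(name)
            b -= tokens
    return chosen[::-1]


candidates = [("a", 60, 0.9), ("b", 50, 0.6), ("c", 50, 0.5)]
picked = select_memories(candidates, budget=100)
```

In this example a fixed top-k by relevance would take "a" first and then have no room for anything else (total 0.9), while the budgeted selection takes "b" and "c" (total 1.1). The table costs O(n × budget) memory, so for budgets in the tens of thousands of tokens a real version would bucket token counts or use a greedy relevance-per-token heuristic.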

Emerging Pattern: Memory-as-API

The most interesting architectural shift we're tracking: memory systems that expose APIs rather than file-system interfaces.

Instead of brv query "topic" (CLI), imagine POST /memory/search with semantic + keyword fusion, provenance tracking, and namespace scoping. Instead of writing MEMORY.md (filesystem), imagine POST /memory/curate with structured metadata.

⚠️ Speculative: Memory API layer

# Proposed memory API surface
POST   /memory/search       → Hybrid retrieval (keyword + semantic)
POST   /memory/curate       → Structured storage with provenance
GET    /memory/provenance   → Audit trail for stored memories
POST   /memory/consolidate  → Trigger consolidation cycle
DELETE /memory/expire       → TTL-based cleanup
GET    /memory/topology     → Visualize memory graph structure

This would unify the four patterns behind a single interface, so the agent would not need to know whether a memory came from a file, a context tree, or a vector store. The routing happens at the API layer, not in the agent's logic.

To our knowledge, no one has shipped this yet. But the convergence pressure is clear: agents with multiple memory backends are already building ad-hoc versions of this internally.
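The routing idea behind the search endpoint can be sketched in-process, without any HTTP layer. Everything here is hypothetical: the `MemoryBackend` protocol, the `KeywordBackend` stand-in, and the `MemoryRouter` class are illustrative names, and the keyword overlap score substitutes for whatever each real backend would compute.

```python
from typing import Protocol


class MemoryBackend(Protocol):
    """Anything that can answer a search with (score, text) pairs."""
    name: str

    def search(self, query: str) -> list[tuple[float, str]]: ...


class KeywordBackend:
    """Toy backend: scores entries by the fraction of query words they contain."""

    def __init__(self, name: str, entries: list[str]) -> None:
        self.name = name
        self.entries = entries

    def search(self, query: str) -> list[tuple[float, str]]:
        words = set(query.lower().split())
        hits = []
        for entry in self.entries:
            overlap = len(words & set(entry.lower().split()))
            if overlap:
                hits.append((overlap / len(words), entry))
        return hits


class MemoryRouter:
    """Fans a query out to every backend and merges results by score."""

    def __init__(self, backends: list[MemoryBackend]) -> None:
        self.backends = backends

    def search(self, query: str, limit: int = 5) -> list[dict]:
        results = []
        for backend in self.backends:
            for score, text in backend.search(query):
                # Keep the source so provenance survives the merge
                results.append({"score": score, "text": text,
                                "source": backend.name})
        results.sort(key=lambda r: r["score"], reverse=True)
        return results[:limit]


router = MemoryRouter([
    KeywordBackend("file", ["Switched auth to session tokens"]),
    KeywordBackend("context-tree", ["auth uses OAuth"]),
])
results = router.search("auth tokens")
```

The agent calls `router.search` and never learns which backend answered, except through the `source` field. Merging raw scores across backends is the weak point: a keyword overlap of 0.5 and a cosine similarity of 0.5 do not mean the same thing, so a real router would need per-backend score calibration.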

What We're Testing Next

  1. Memory value scoring: Automated importance classification of new memories before storage (is this worth keeping?)
  2. Contradiction detection: Cross-referencing new memories against existing LTM for conflicts
  3. Provenance-aware retrieval: Weighting retrieval results by source trust level
  4. Sleep-cycle consolidation: Automated MTM→LTM distillation during agent idle periods
  5. Compression ratio optimization: Token-budget-constrained retrieval that maximizes useful tokens per injection

We'll publish results as they come in. The standard article has what works today. This one has what we're trying to make work tomorrow.


Technical Appendix
Test environment: ARM64 Linux, OpenClaw agent framework, 6-week deployment period (Feb 1 – Mar 14, 2026)
All speculative sections marked with ⚠️. Unmarked data is from direct measurement.
Compression ratio methodology: Manual annotation of 200 task completions across 4 memory patterns.
Two annotators, inter-annotator agreement: κ = 0.73 (substantial).

Key references:
Memory consolidation: Stickgold, R. (2005). "Sleep-dependent memory consolidation." Nature.
Vector store security: Carlini, N. et al. (2024). "Poisoning Web-Scale Training Datasets is Practical." IEEE S&P.
ByteRover: Structured agent memory via CLI with context trees and team sync
Knapsack optimization: Classic dynamic programming applied to token-budget-constrained retrieval