# Context Engineering

**Purpose:** Optimize agent performance through strategic context management, filesystem overflow patterns, and multi-agent architecture decisions.

**Source:** Synthesized from BrowseComp research, production agent telemetry, and context window behavior analysis.

---

## When to Activate

Use this skill when:
- Context window feels "full" or performance degrades mid-session
- Tool outputs are bloating the conversation (web searches, terminal output, large reads)
- Planning multi-agent coordination (subagents, parallel tasks)
- Long-running tasks need state persistence across context boundaries
- Summarizing session work for memory files

---

## The Four Context Failure Modes

Diagnose problems against these modes — each needs a different fix:

| Failure Mode | Symptom | Fix |
|--------------|---------|-----|
| **Missing** | Information was never captured | Persist tool outputs to files immediately |
| **Under-retrieved** | Have the file, didn't load right content | Use grep-friendly formats, clear headers, line-range reads |
| **Over-retrieved** | Loaded too much, wasting tokens | Offload bulk content, return compact references |
| **Buried** | Information scattered across many files | Combine glob + grep for structural search |

---

## Pattern 1: Filesystem Scratch Pad (Critical)

**Problem:** Single web search or terminal command can dump 3000+ tokens into context where they persist forever.

**Solution:** Large outputs → file → compact summary → reference

```
Tool Output (3000 tokens)
    ↓
Write to scratch/ directory
    ↓
Extract 200-token summary
    ↓
Context gets: "[Output in scratch/web_20240323.txt. Key findings: X, Y, Z]"
    ↓
Need details? grep or read with line ranges
```

### Threshold Guidelines

| Output Type | Auto-file Threshold | Storage |
|-------------|---------------------|---------|
| Web search | >2000 chars | `scratch/web_[topic]_[date].md` |
| Terminal output | >1500 chars | `scratch/term_[cmd]_[date].txt` |
| Large file read | Always file | Original location or `scratch/` |
| Multi-file operations | Always summarize | `memory/YYYY-MM-DD.md` |

### Implementation

**Write with structured headers for grepability:**
```markdown
# Web Search: AI Agent Frameworks
## Source: https://...
## Timestamp: 2026-03-23 14:00 UTC

### Key Finding 1: BrowseComp Results
[content]

### Key Finding 2: Token Economics
[content]
```

**Retrieve selectively:**
```bash
# Find specific section
grep -A 5 "Token Economics" scratch/web_*.md

# Read specific lines
read scratch/web_20240323.md offset=45 limit=20
```

---

## Pattern 2: Anchored Iterative Summarization

**Problem:** Session summaries lose critical details (specific files, decisions, identifiers).

**Solution:** Structured sections with explicit artifact tracking.

### Required Sections

```markdown
## Session: [Brief descriptor]
**Time:** YYYY-MM-DD HH:MM UTC
**Intent:** [One-line goal]

### Artifacts Created
- `labnotes/skills/ui-design-systems/SKILL.md` — Design system capture protocol
- `scratch/web_agent_research_20240323.md` — BrowseComp findings

### Artifacts Modified
- `memory/2026-03-23.md` — Appended session review (lines 45-89)
- `HEARTBEAT.md` — Added landing page sync check

### Decisions Made
- DECISION: Use filesystem scratch pad for outputs >2000 chars
- DECISION: Token budget awareness for multi-agent (15x baseline cost)

### Identifiers Preserved
- Commit: `4c910bf`
- URL: `https://github.com/muratcankoylan/Agent-Skills-for-Context-Engineering`
- File refs: `e40`, `e45` (browser snapshot)

### Next Steps
- [ ] Implement scratch pad on next large tool output
- [ ] Review context compression on sessions >50 turns
```

### Scoring Checklist

Rate your summary 1-5 on each dimension:

| Dimension | 1 (Poor) | 5 (Excellent) |
|-----------|----------|---------------|
| **Files** | "worked on files" | "Created `skills/ui-design-systems/SKILL.md`, modified `memory/2026-03-23.md` lines 45-89" |
| **Decisions** | "made choices" | "DECISION: Token budget awareness (15x multi-agent cost)" |
| **Identifiers** | "some commit" | "Commit `4c910bf`, URL `github.com/...`" |
| **Next Steps** | "continue work" | "[ ] Run ByteRover curate on patterns" |
| **Intent** | "did stuff" | "Created context engineering skill from research" |

**Target:** 4.0+ average score. Below 3.0 = rewrite with more specifics.

---

## Pattern 3: Multi-Agent Architecture Decisions

**Critical Reality:** Multi-agent systems cost ~15x tokens vs single-agent chat.

| Architecture | Token Multiplier | When to Use |
|--------------|------------------|-------------|
| Single agent | 1x baseline | Simple queries, <20 turns |
| Single + tools | ~4x baseline | Tool-using tasks, moderate complexity |
| Multi-agent | ~15x baseline | Parallelizable subtasks, context isolation critical |

**Key Insight:** Token usage explains 80% of performance variance (BrowseComp research). Model quality upgrades often outperform raw token increases.

### Three Patterns

**1. Supervisor/Orchestrator**
- One coordinator delegates to specialists
- Use when: Clear decomposition, human oversight matters, results need synthesis
- Pattern: `sessions_spawn` with `mode="run"` for one-shot tasks

**2. Peer-to-Peer / Swarm**
- Any agent can handoff to any other
- Use when: Flexible exploration, rigid planning counterproductive
- Pattern: Agents write shared state to files, pick up where others left off

**3. Hierarchical**
- Strategy → Planning → Execution layers
- Use when: Large-scale projects with layered abstraction
- Pattern: Each layer operates at different detail level with own context

### When NOT to Use Multi-Agent

- Task fits in single context window (<100k tokens total)
- Sequential processing sufficient (no parallelization gains)
- Coordination overhead exceeds computation gains
- Budget constraints (15x cost is significant)

---

## Pattern 4: Context Compression Strategies

**Problem:** Aggressive compression → re-fetching → costs more overall.

**Optimize for:** tokens-per-task, not tokens-per-request

### Tiered Compression

```
Tier 1: Full content (first encounter)
  └─ Keep complete for immediate use

Tier 2: Anchored summary (after first use)
  └─ Key findings + file reference + line ranges
  └─ "[Full content in scratch/file.md, lines 45-89: finding X]"

Tier 3: Reference only (long-term storage)
  └─ File path + one-line descriptor
  └─ "Research on X: scratch/file.md"

Tier 4: Curated knowledge (ByteRover)
  └─ Distilled patterns: brv curate "[pattern]"
```

### Compression Checklist

Before compressing, ensure:

- [ ] Critical identifiers preserved (commit hashes, specific URLs, UUIDs)
- [ ] File paths documented with line numbers if relevant
- [ ] Decisions explicitly stated (not implied)
- [ ] Next steps actionable (not vague "continue")
- [ ] Source attribution maintained (don't lose where info came from)

---

## Token Economics Guidelines

### Budget Awareness

| Operation | Approximate Cost | Budget Impact |
|-----------|------------------|---------------|
| Web search (10 results) | 2,000-4,000 tokens | Medium |
| Large file read (>500 lines) | 3,000-8,000 tokens | High — file it |
| Subagent spawn (simple task) | 5,000-15,000 tokens | High |
| Subagent spawn (complex) | 20,000-50,000 tokens | Very High |
| Image analysis | 1,000-3,000 tokens | Low-Medium |
| Full context window | 100,000-200,000 tokens | Session limit |

### Cost-Reduction Strategies

1. **Batch similar operations** — One web search with 10 queries vs 10 separate searches
2. **Use line ranges** — `read file.md offset=100 limit=50` vs full file
3. **Prefer grep over read** — Search before loading entire files
4. **File large outputs immediately** — Don't let them sit in context
5. **Model selection matters** — Better model > more tokens (BrowseComp finding)

---

## Dynamic vs Static Context Trade-off

### Static Context (Loaded Every Turn)

**Include statically:**
- Safety/correctness constraints (NEVER rules)
- Critical tool definitions
- Session identity (SOUL.md, USER.md)

**Cost:** Consumes tokens on every turn regardless of relevance.

### Dynamic Context (Loaded On-Demand)

**Load dynamically:**
- Project-specific patterns (query ByteRover)
- Historical session details (query memory_search)
- Large documentation (read when needed)

**Benefit:** Token-efficient, reduces contradictory information.
**Risk:** Model must recognize when to load. Frontier models handle this well.

### Decision Rule

When in doubt:
- Safety rules → Static (never skip)
- How-to guides → Dynamic (query when needed)
- Project history → Dynamic (search selectively)

---

## Integration with Existing Skills

| This Pattern | Integrates With | Usage |
|--------------|-----------------|-------|
| Scratch pad | `web_search`, `exec`, `read` | Auto-file outputs >threshold |
| Anchored summary | `memory/YYYY-MM-DD.md` | Structured session logs |
| Multi-agent | `sessions_spawn` | Spawn with context isolation |
| Token economics | Subagent decisions | 15x cost awareness |
| ByteRover curate | `brv` command | Long-term pattern storage |

---

## Quick Reference

### Diagnostic Questions

Context feels degraded? Ask:
1. **Missing?** Did I file that tool output?
2. **Under-retrieved?** Did I grep for the right section?
3. **Over-retrieved?** Did I load a huge file unnecessarily?
4. **Buried?** Is the info scattered across too many files?

### Action Triggers

| Situation | Action |
|-----------|--------|
| Web search returns >2000 chars | Auto-file to scratch/ |
| Session >30 turns | Write anchored summary to memory |
| Planning subagents | Calculate 15x cost, verify necessity |
| Large output in context | File it, replace with reference |
| End of complex task | Score summary (target 4.0+), curate patterns |

### Commands

```bash
# Quick scratch file
echo "[content]" > scratch/[type]_[desc]_$(date +%Y%m%d).md

# Search scratch
grep -r "pattern" scratch/

# Curate pattern
brv curate "[Concise pattern description with context]"

# Memory search (dynamic context)
memory_search "[query]"
```

---

## Metrics to Track

- **Context efficiency:** Tokens per completed task (lower is better)
- **Retrieval accuracy:** Correct info found on first try (higher is better)
- **Re-fetch rate:** How often you re-load filed content (lower is better)
- **Summary quality:** Self-rated anchored summary scores (target 4.0+)

---

**Maintained by:** PromptEngines Lab  
**Source Research:** BrowseComp evaluation, muratcankoylan/Agent-Skills-for-Context-Engineering  
**Last Updated:** March 23, 2026