🔬 Today's Focus: Harness Engineering

Three major research efforts from 2024-2026 demonstrate that interface design, not model capability, determines agent performance.

Key Findings

SWE-agent (Princeton NLP): 64% improvement from interface changes alone. Capped search (50 results), stateful viewer (100 lines), linter integration, context compression.

Anthropic: Two-agent architecture spans context windows. Initializer creates scaffolding; coding agent executes. Feature lists prevent fake-done errors.

OpenAI Codex team: 1M lines, 1,500 PRs, 3 engineers. Repository as system of record. Mechanical enforcement via linters. 3.5 PRs per engineer per day.

📋 The 7-Layer Taxonomy

Human Oversight → Spec Tools → Lifecycle Platforms → Task Runners → Orchestrators → Frameworks/Runtimes → Coding Agents. The bottom layer is a commodity; layers 1-6 determine effectiveness.