Quick Reference March 18, 2026

Harness Engineering: The Fast Version

Three findings, seven layers, five patterns. No fluff.

🔬 Three Research Findings

1. SWE-Agent: 64% from Interface Design

Same model, same task, different interface = 64% performance improvement.

The winning interface:

Capped search (max 50 results) → forces specificity
Stateful viewer (100 lines, numbered) → removes cognitive load
Linter at edit time → catches errors immediately
Context compression → old turns summarized

Result: 3.97% → 12.47% issue resolution. Same GPT-4.

2. Anthropic: Two-Agent Architecture

Problem: Most projects exceed any context window.

Solution:

Initializer: Creates init.sh, feature_list.json, progress.txt, git commit
Coding: One feature at a time, clean handoff, documented state

Key: Feature list is ground truth. 200+ specific, end-to-end, all marked failing. Agents cannot fake completion.

3. OpenAI: 1M Lines, Zero Manual Code

Aug 2025 - Feb 2026. Three engineers. 1,500 PRs. 3.5 PRs/engineer/day.

Architecture:

Repository = system of record (no Slack, no Docs)
Progressive disclosure (short entry → deep docs)
Mechanical enforcement (linters, not human review)
Application legibility (browser automation, observability)

🥞 Seven Layers (Top to Bottom)

1. Human Oversight	Approval, review, steering
2. Spec Tools	Structured requirements, task DAGs
3. Lifecycle Platforms	End-to-end management
4. Task Runners	Issue tracker → agent → PR
5. Orchestrators	Git worktree isolation
6. Frameworks/Runtimes	Primitives vs. persistent infra
7. Coding Agents	Claude Code, Codex (commodity)

Layers 1-6 determine layer 7's effectiveness.

🔁 Five Patterns

Progressive Disclosure	Minimal entry + pointers. Context is finite.
Worktree Isolation	One agent, one branch. Parallel without collision.
Spec First	Machine-readable requirements in repo. Feature lists prevent fake-done.
Mechanical Enforcement	Linters, not code review. Invariants, not implementations.
Tight Feedback	Close gap between action and consequence. Catch early.

⚡ Minimal Effective Harness

Don't need OpenAI's observability stack. Start with:

Progress file: Read at start, write at end. Prevents "declare victory early."
Feature list: Specific, verifiable, end-to-end. All marked failing initially.
Descriptive commits: Every session ends with git commit + progress update.
Browser automation: If web app — agents must see what users see.

🩺 Diagnostic Questions

When systems underperform:

What information does the agent need that it cannot access?
What feedback loop is missing that would catch mistakes?
Where is context getting polluted?
What constraints need mechanical enforcement?

Each answer → specific harness improvement.

💡 Core Finding

The harness is the durable advantage. Model capability is a commodity. Organizations investing in environment design — scaffolding, feedback loops, spec tooling, orchestration — outperform those focused on model selection.

Published March 18, 2026 — Prompt Engines Lab