Implementation March 18, 2026

Harness Engineering: Implementation Guide

Technical patterns from SWE-agent, Anthropic, and OpenAI. Code-level details.

Implementation 1: The ACI Tool Suite

The SWE-agent paper reported that replacing raw bash commands with purpose-built tools (its agent-computer interface, or ACI) produced a 64% relative performance improvement. Here's the implementation pattern:

Capped Search Tool

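from pathlib import Path

MAX_MATCHES = 50

def search_file(pattern: str, path: str = ".") -> str:
    """
    Search for a literal pattern in files under path. Returns max 50 matches.
    If >50 matches, returns count + suggestion to refine.
    """
    root = Path(path)
    files = [root] if root.is_file() else sorted(p for p in root.rglob("*") if p.is_file())
    matches = [
        f"{f}:{n}: {text}"
        for f in files
        for n, text in enumerate(f.read_text(errors="ignore").splitlines(), 1)
        if pattern in text
    ]

    if not matches:
        return f"No matches for {pattern!r} in {path}."

    if len(matches) > MAX_MATCHES:
        return (
            f"Found {len(matches)} matches. Too many to display.\n"
            "Suggestion: Use a more specific pattern or narrow path.\n"
            f"Example: search_file('def {pattern}', 'src/')"
        )

    return "\n".join(matches)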

Key design decision: The hard cap forces refinement. An agent cannot make progress with a vague query, which creates a natural specificity loop.
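A variant worth considering keeps the hard cap but makes the refusal more actionable by reporting which files hold the most matches, so the agent can narrow the path in one step. This is a minimal sketch that assumes matches arrive as (file, line, text) tuples. The function name summarize_overflow and the message wording are illustrative, not part of SWE-agent.

```python
from collections import Counter
from typing import List, Tuple

def summarize_overflow(matches: List[Tuple[str, int, str]], limit: int = 50, top: int = 5) -> str:
    """Under the cap, list matches; over it, show per-file counts instead of lines."""
    if len(matches) <= limit:
        return "\n".join(f"{f}:{n}: {t}" for f, n, t in matches)

    # Count matches per file and keep the files with the most hits
    counts = Counter(f for f, _, _ in matches).most_common(top)
    lines = [f"Found {len(matches)} matches (cap is {limit}). Files with the most matches:"]
    lines += [f"  {f} ({c} matches)" for f, c in counts]
    lines.append("Narrow the path to one of these files or use a more specific pattern.")
    return "\n".join(lines)
```

The agent still cannot see the lines themselves, so the specificity loop is preserved; it just gets a better hint about where to look.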

Stateful File Viewer

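from pathlib import Path

WINDOW = 100  # lines per view

class FileViewer:
    def __init__(self):
        self.positions = {}  # file -> 0-based index of the first line shown

    def view(self, file: str, offset: int = 0) -> str:
        """
        Display 100 lines from file starting at current position + offset.
        Goldilocks number: 30 lines loses context, full file loses focus.
        """
        lines = Path(file).read_text().splitlines()
        current = self.positions.get(file, 0)
        # Clamp so the window stays inside the file
        start = min(max(0, current + offset), max(0, len(lines) - WINDOW))
        window = lines[start:start + WINDOW]

        # 1-based numbers, matching what linters and editors report
        numbered = [f"{i:4d}| {line}" for i, line in enumerate(window, start + 1)]

        self.positions[file] = start  # remembered for the next call
        return "\n".join(numbered)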

Key design decisions:

100 lines: in SWE-agent's ablations, a 100-line window outperformed both a 30-line window and showing the full file
Explicit line numbers: agents cite lines directly instead of counting them
Stateful: the viewer remembers each file's position across calls
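The window arithmetic behind view() can be isolated in a pure function, which makes the clamping easy to test. This is a minimal sketch: window_bounds, scroll_down, and scroll_up are hypothetical names, and scrolling is modeled as an offset of plus or minus one window.

```python
from typing import Tuple

def window_bounds(current: int, offset: int, total: int, size: int = 100) -> Tuple[int, int]:
    """0-based, end-exclusive (start, end) of the next view, clamped to the file."""
    last_start = max(0, total - size)  # furthest start that still fills a window
    start = min(max(0, current + offset), last_start)
    return start, min(start + size, total)

def scroll_down(current: int, total: int, size: int = 100) -> Tuple[int, int]:
    """Move the window forward by one full window."""
    return window_bounds(current, size, total, size)

def scroll_up(current: int, total: int, size: int = 100) -> Tuple[int, int]:
    """Move the window back by one full window."""
    return window_bounds(current, -size, total, size)
```

For a 250-line file, scrolling down from line 0 shows lines 100 to 199, and scrolling again stops at the last full window (150 to 249) rather than showing a half-empty view.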

Linter-Integrated Editor

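import ast
from pathlib import Path

def run_linter(file: str) -> list[str]:
    """Stand-in linter for Python files: reports syntax errors via ast.parse."""
    try:
        ast.parse(Path(file).read_text(), filename=file)
    except SyntaxError as e:
        return [f"{file}:{e.lineno}: {e.msg}"]
    return []

def edit_file(file: str, start: int, end: int, replacement: str) -> str:
    """
    Replace lines [start, end] (1-based, inclusive) with replacement text.
    Runs linter immediately. Returns result or error with context.
    """
    path = Path(file)
    before = path.read_text()  # snapshot the whole file: line counts may change
    lines = before.splitlines(keepends=True)
    if not 1 <= start <= end <= len(lines):
        return f"Invalid range {start}-{end}: file has {len(lines)} lines."
    original = "".join(lines[start - 1:end])

    # Apply edit (make sure the last replaced line ends with a newline)
    new_text = replacement
    if new_text and not new_text.endswith("\n"):
        new_text += "\n"
    path.write_text("".join(lines[:start - 1]) + new_text + "".join(lines[end:]))

    # Immediate validation
    errors = run_linter(file)

    if errors:
        # Revert the whole file and return an actionable error
        path.write_text(before)
        return (
            "Edit rejected: linter error introduced\n"
            f"Error: {errors[0]}\n"
            f"Original:\n{original}\n"
            f"Your edit:\n{replacement}\n"
            "Fix the error and try again."
        )

    return f"Edit successful. Lines {start}-{end} modified."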

Key design decision: Edit and validation are one atomic operation. Errors are caught when they are introduced, not three steps later when the agent is chasing ghosts.

Implementation 2: Two-Agent Architecture

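Anthropic's pattern for work that spans multiple context windows: an initializer agent sets up the environment once, and coding agents then make incremental progress session by session. Implementation structure: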

Initializer Agent

# initializer_system_prompt.md
Your role is environment setup. Do not write features.
Create the scaffolding that future coding agents will use.

Required outputs:
1. init.sh — script to reliably start dev environment
2. feature_list.json — specific, end-to-end feature descriptions
3. claude-progress.txt — initial empty progress log
4. Git commit with message "[init] Environment initialized"

Feature list requirements:
- 200+ specific features for a production web app
- Each feature: category, description, steps[], passes: false
- All initially marked failing
- No feature is "implement the app" — all are user-visible behaviors
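A harness can check mechanically that the initializer produced what the prompt asks for before any coding session starts. A minimal sketch; the function names are hypothetical, and it assumes the three files live at the repository root.

```python
import subprocess
from pathlib import Path
from typing import List

REQUIRED_FILES = ("init.sh", "feature_list.json", "claude-progress.txt")
INIT_COMMIT = "[init] Environment initialized"

def missing_init_outputs(repo: str) -> List[str]:
    """Names of required scaffolding files the initializer did not create."""
    root = Path(repo)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]

def has_init_commit(repo: str) -> bool:
    """True if some commit subject in the repo's history matches the init message."""
    try:
        proc = subprocess.run(["git", "-C", repo, "log", "--format=%s"],
                              capture_output=True, text=True)
    except FileNotFoundError:  # git is not installed
        return False
    return proc.returncode == 0 and INIT_COMMIT in proc.stdout.splitlines()
```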

Feature List Schema

{
  "features": [
    {
      "category": "authentication",
      "description": "User can sign up with email and password",
      "steps": [
        "Navigate to /signup",
        "Enter email and password",
        "Click submit",
        "Verify redirect to dashboard",
        "Verify user created in database"
      ],
      "passes": false
    }
  ]
}

Key design decision: The feature list is the ground truth. Agents cannot infer completion from reading code; they must verify each feature against its explicit steps.
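Because agents must verify against the list, it helps to validate the list itself. A sketch of a schema check for the structure above, assuming the file has already been loaded with json.load; the 200-feature minimum mirrors the initializer prompt, and validate_feature_list is a hypothetical name.

```python
from typing import Any, Dict, List

def validate_feature_list(data: Dict[str, Any], min_features: int = 200,
                          initial: bool = False) -> List[str]:
    """
    Check feature_list.json content against the schema.
    Returns a list of problems; an empty list means it conforms.
    With initial=True, every feature must also start with passes: false.
    """
    features = data.get("features")
    if not isinstance(features, list):
        return ["'features' must be a list"]
    problems = []
    if len(features) < min_features:
        problems.append(f"feature count {len(features)} is below the minimum of {min_features}")
    for i, f in enumerate(features):
        if not isinstance(f, dict):
            problems.append(f"feature {i}: not an object")
            continue
        for key in ("category", "description"):
            if not isinstance(f.get(key), str) or not f[key].strip():
                problems.append(f"feature {i}: '{key}' must be a non-empty string")
        steps = f.get("steps")
        if not isinstance(steps, list) or not steps or not all(isinstance(s, str) for s in steps):
            problems.append(f"feature {i}: 'steps' must be a non-empty list of strings")
        if not isinstance(f.get("passes"), bool):
            problems.append(f"feature {i}: 'passes' must be true or false")
        elif initial and f["passes"]:
            problems.append(f"feature {i}: must start with passes: false")
    return problems
```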

Coding Agent Startup Sequence

# Standardized startup — executed at beginning of every session

1. pwd — confirm working directory
2. read claude-progress.txt — understand recent work
3. git log --oneline -20 — see recent commits
4. read feature_list.json — identify highest-priority incomplete feature
5. run init.sh — start development environment
6. run startup_test.py — verify application in working state
7. IF startup_test FAILS: fix before touching anything new
8. BEGIN work on one feature at a time
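Steps 5 to 8 amount to a fail-fast gate followed by a selection rule. A minimal sketch under two assumptions: init.sh returns once the environment is up (for example by backgrounding its server), and the order of entries in feature_list.json encodes priority, since the schema has no priority field. All names are hypothetical.

```python
import subprocess
from typing import Callable, List, Optional, Tuple

Step = Tuple[str, Callable[[], bool]]

def shell_step(name: str, cmd: List[str]) -> Step:
    """Wrap a command as a step that passes when it exits with status 0."""
    return name, lambda: subprocess.run(cmd).returncode == 0

def run_startup(steps: List[Step]) -> Optional[str]:
    """Run steps in order; return the first failing step's name, or None if all pass."""
    for name, check in steps:
        if not check():
            return name
    return None

def next_feature(features: List[dict]) -> Optional[dict]:
    """First feature still failing, assuming list order is priority order."""
    return next((f for f in features if not f["passes"]), None)

# Hypothetical wiring for steps 5 to 8:
# failed = run_startup([
#     shell_step("init.sh", ["bash", "init.sh"]),
#     shell_step("startup_test.py", ["python", "startup_test.py"]),
# ])
# If failed is not None, fix that first; otherwise work on next_feature(features),
# where features is the "features" list loaded from feature_list.json.
```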

Session End Requirements

# Every session ends with:

1. Verify clean state (revert broken or unfinished changes if needed)
2. Update feature_list.json if a feature now passes
3. Update claude-progress.txt with:
   - What was worked on
   - What was completed
   - What state things were left in
4. Git commit with a descriptive message, after the updates above so they are committed with the code
Implementation 3: Mechanical Enforcement

OpenAI's approach: custom linters with remediation instructions formatted for agent consumption.

Architecture Linter

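from pathlib import PurePosixPath
from typing import Optional

# Allowed cross-layer imports (illustrative rules; adapt to your architecture)
LAYER_RULES = {
    "api": ["service"],
    "service": ["domain"],
    "domain": [],
}

def get_layer(path: str) -> Optional[str]:
    """Layer = top-level directory, e.g. 'api/routes.py' -> 'api'; None if unlayered."""
    parts = PurePosixPath(path).parts
    return parts[0] if parts and parts[0] in LAYER_RULES else None

def check_layer_violation(file_path: str, import_path: str) -> Optional[str]:
    """
    Check if import violates layer architecture.
    Returns remediation message or None.
    """
    file_layer = get_layer(file_path)  # domain, service, api, etc.
    import_layer = get_layer(import_path)

    # Unlayered code (scripts, third-party packages) and same-layer imports pass
    if file_layer is None or import_layer is None or import_layer == file_layer:
        return None

    allowed = LAYER_RULES[file_layer]

    if import_layer not in allowed:
        return (
            f"Architecture violation: {file_path} ({file_layer})\n"
            f"imports {import_path} ({import_layer})\n"
            f"Allowed imports from {file_layer}: {allowed}\n"
            "Fix: Move code to appropriate layer or use dependency inversion."
        )

    return None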
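check_layer_violation needs (file, import) pairs as input. For Python code, one way to collect them is the standard library ast module, sketched below. imported_paths is a hypothetical helper that turns dotted module names into slash paths such as domain/models.

```python
import ast
from typing import List

def imported_paths(source: str) -> List[str]:
    """
    Absolute imports in `source` as slash paths ('domain.models' -> 'domain/models').
    Relative imports (from . import x) are skipped in this sketch: resolving them
    needs the importing file's package path.
    """
    paths = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            paths.extend(alias.name.replace(".", "/") for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            paths.append(node.module.replace(".", "/"))
    return paths
```

Each returned path can then be passed as import_path alongside the file being linted. Because ast.walk visits the whole tree, imports nested inside functions are caught too.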

Error Message Format

# Linter error format designed for agent consumption:

{
  "rule": "architecture.layer_violation",
  "violated": "api/routes.py imports domain/models.py",
  "constraint": "api layer may only import service layer",
  "remediation": [
    "Option 1: Move the required function to the service layer",
    "Option 2: Create a service-layer facade that the api layer calls",
    "Option 3: Use dependency injection to break the coupling"
  ]
}

Key design decision: Error messages include remediation. The agent receives the constraint, the violation, and concrete fix options in a single message.
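Emitting that format from code keeps every rule's output consistent. A minimal sketch using a dataclass whose fields mirror the JSON keys above; AgentLintError and layer_violation are hypothetical names, and the remediation wording is illustrative.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List

@dataclass
class AgentLintError:
    """One linter finding in the agent-facing format."""
    rule: str
    violated: str
    constraint: str
    remediation: List[str] = field(default_factory=list)

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2)

def layer_violation(file_path: str, import_path: str, file_layer: str,
                    allowed: List[str]) -> AgentLintError:
    """Build the agent-facing record for one layer violation."""
    if allowed:
        constraint = f"{file_layer} layer may only import {' or '.join(allowed)} layer"
    else:
        constraint = f"{file_layer} layer may not import other layers"
    return AgentLintError(
        rule="architecture.layer_violation",
        violated=f"{file_path} imports {import_path}",
        constraint=constraint,
        remediation=[
            f"Option 1: Move the required code into a layer {file_layer} may import",
            "Option 2: Add a facade in an allowed layer and call that instead",
            "Option 3: Use dependency injection to break the coupling",
        ],
    )
```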

Implementation 4: Git Worktree Orchestration

Pattern for parallel agent execution without collision.

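import subprocess
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class Result:
    ok: bool
    detail: str = ""  # branch name on success, error output on failure

def run(args: List[str], cwd: Optional[str] = None) -> str:
    """Run a command; check=True raises CalledProcessError on a non-zero exit."""
    return subprocess.run(args, cwd=cwd, check=True, capture_output=True, text=True).stdout

class AgentWorkspace:
    def __init__(self, task_id: str, repo: str = ".", base_branch: str = "main",
                 checks: Optional[List[List[str]]] = None):
        self.task_id = task_id
        self.repo = repo
        self.worktree_path = f"/worktrees/{task_id}"
        self.branch = f"agent/{task_id}"
        self.checks = checks or []  # validation commands, e.g. [["pytest", "-q"]]

        # Create isolated worktree on a new branch cut from base_branch
        run(["git", "worktree", "add", "-b", self.branch, self.worktree_path, base_branch],
            cwd=repo)

    def execute(self, agent_fn: Callable[[str], None]) -> Result:
        """Run agent in isolated workspace, then validate before merge."""
        # Pass the path instead of calling os.chdir(): the working directory is
        # process-wide, so chdir would collide across agents running in threads.
        agent_fn(self.worktree_path)

        for cmd in self.checks:
            proc = subprocess.run(cmd, cwd=self.worktree_path, capture_output=True, text=True)
            if proc.returncode != 0:
                return Result(False, proc.stdout + proc.stderr)

        try:
            run(["git", "add", "-A"], cwd=self.worktree_path)
            run(["git", "commit", "-m", f"[agent] {self.task_id}"], cwd=self.worktree_path)
            run(["git", "push", "-u", "origin", self.branch], cwd=self.worktree_path)
        except subprocess.CalledProcessError as e:
            return Result(False, e.stdout + e.stderr)  # e.g. nothing to commit
        return Result(True, self.branch)

    def cleanup(self):
        """Remove worktree and branch after merge or failure."""
        run(["git", "worktree", "remove", "--force", self.worktree_path], cwd=self.repo)
        run(["git", "branch", "-D", self.branch], cwd=self.repo)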
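AgentWorkspace handles a single task. Fanning several tasks out in parallel could look like the sketch below, which assumes run_task creates a workspace, runs the agent, and cleans up in a finally block; run_parallel is a hypothetical name. Rejecting duplicate task IDs up front matters because two tasks with the same ID would map to the same worktree path and branch.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, TypeVar

T = TypeVar("T")

def run_parallel(task_ids: List[str], run_task: Callable[[str], T],
                 max_workers: int = 4) -> Dict[str, T]:
    """Run one task per ID across a thread pool and collect results by ID."""
    if len(set(task_ids)) != len(task_ids):
        raise ValueError("duplicate task_id: two agents would share one worktree")
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {tid: pool.submit(run_task, tid) for tid in task_ids}
        # .result() re-raises any exception from the task in the caller
        return {tid: fut.result() for tid, fut in futures.items()}
```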

Implementation 5: Application Legibility

OpenAI's investment in making the application observable to agents.

Browser Automation Integration

from typing import Protocol

class BrowserTool(Protocol):
    """CDP-based browser automation for end-to-end verification (interface only)."""

    def navigate(self, url: str) -> str:
        """Navigate to URL, return DOM snapshot."""
        ...

    def click(self, selector: str) -> str:
        """Click element, return updated DOM."""
        ...

    def fill(self, selector: str, value: str) -> str:
        """Fill form field, return updated DOM."""
        ...

    def screenshot(self) -> bytes:
        """Capture screenshot for visual verification."""
        ...

    def assert_visible(self, text: str) -> bool:
        """Verify text is visible on page."""
        ...

# Agent usage:
# 1. Navigate to feature URL
# 2. Perform user actions (click, fill)
# 3. Assert expected outcome visible
# 4. Only mark feature passed if assertion succeeds
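The four-step usage above can be written as a small verification routine. This sketch assumes actions are recorded as tuples such as ("click", selector) and that a feature passes only if every expected text is visible. verify_feature and FakeBrowser are hypothetical; the fake stands in for a real CDP-backed browser so the flow can be exercised without one, and the selectors are illustrative.

```python
from typing import List, Protocol, Tuple

class Browser(Protocol):
    """The subset of the BrowserTool interface this sketch uses."""
    def navigate(self, url: str) -> str: ...
    def click(self, selector: str) -> str: ...
    def fill(self, selector: str, value: str) -> str: ...
    def assert_visible(self, text: str) -> bool: ...

Action = Tuple[str, ...]  # ("click", selector) or ("fill", selector, value)

def verify_feature(browser: Browser, url: str, actions: List[Action], expected: List[str]) -> bool:
    """Replay user actions, then pass only if every expected text is visible."""
    browser.navigate(url)
    for action in actions:
        if action[0] == "click":
            browser.click(action[1])
        elif action[0] == "fill":
            browser.fill(action[1], action[2])
        else:
            raise ValueError(f"unknown action {action[0]!r}")
    return all(browser.assert_visible(text) for text in expected)

class FakeBrowser:
    """In-memory test double: clicking #submit reveals the dashboard text."""
    def __init__(self) -> None:
        self.page = ""
    def navigate(self, url: str) -> str:
        self.page = f"<h1>{url}</h1>"
        return self.page
    def click(self, selector: str) -> str:
        if selector == "#submit":
            self.page += "<p>Dashboard</p>"
        return self.page
    def fill(self, selector: str, value: str) -> str:
        return self.page
    def assert_visible(self, text: str) -> bool:
        return text in self.page
```

A harness would set the feature's passes field only from this return value, never from the agent's own claim that the feature works.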

Observability Integration

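from typing import Any, Dict, List, Protocol

# Placeholder record types; real shapes depend on the logging, metrics, and tracing backend
LogEntry = Dict[str, Any]
MetricPoint = Dict[str, Any]
Trace = Dict[str, Any]

class ObservabilityTools(Protocol):
    """Query logs, metrics, and traces from an isolated agent task (interface only)."""

    def query_logs(self, query: str, since: str = "1h") -> List[LogEntry]:
        """LogQL query against task-specific logs."""
        ...

    def query_metrics(self, metric: str, window: str = "1h") -> List[MetricPoint]:
        """PromQL query against task-specific metrics."""
        ...

    def query_traces(self, trace_id: str) -> Trace:
        """Fetch one distributed trace by ID (a TraceQL search would take a query instead)."""
        ...

# Each agent task runs on an isolated app instance with its own observability stack.
# The data is torn down once the task completes. Agents debug the way human engineers do.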
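If the per-task stack uses Grafana Loki for logs and Prometheus for metrics (an assumption; the text only names the query languages), query_logs and query_metrics can be thin wrappers over their HTTP APIs. The sketch below only builds request URLs; the function names are hypothetical, and so are the per-task base URLs a harness would pass in.

```python
from urllib.parse import urlencode

def loki_query_url(base_url: str, logql: str, since: str = "1h", limit: int = 100) -> str:
    """URL for Loki's range query endpoint; `since` is a duration counted back from now."""
    params = urlencode({"query": logql, "since": since, "limit": limit})
    return f"{base_url.rstrip('/')}/loki/api/v1/query_range?{params}"

def prometheus_query_url(base_url: str, promql: str) -> str:
    """URL for Prometheus' instant query endpoint."""
    params = urlencode({"query": promql})
    return f"{base_url.rstrip('/')}/api/v1/query?{params}"
```

With the instant-query endpoint, a time window goes into the PromQL itself, for example rate(http_requests_total[1h]).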

Summary: The Implementation Stack

Component	Pattern	Source
Capped search tools	Forces specificity	SWE-agent
Stateful file viewer (100 lines)	Removes cognitive load	SWE-agent
Linter at edit time	Catches errors immediately	SWE-agent
Two-agent split	Scaffolder + executor	Anthropic
Feature list as ground truth	Prevents fake-done	Anthropic
Startup sequence	Orient before work	Anthropic
Mechanical enforcement	Linters, not review	OpenAI
Git worktree isolation	Parallel execution	General
Browser + observability	End-to-end verification	OpenAI

Core principle: Model capability is a commodity. Environment design determines performance. The harness is the implementation surface.

Published March 18, 2026 — Prompt Engines Lab