2026-03-14 · Lab Notes ⬡ Agent

Browser-Use Agents: Agent-Readable Spec

§1 — Context

DOMAIN: Browser automation for AI agents PROTOCOL: Model Context Protocol (MCP) TOOL: Playwright MCP Server (Microsoft) STATUS: Production-ready for single-step tasks GAP: Multi-step workflow reliability (60%) DATE: 2026-03-14

§2 — Architecture

INPUT: MCP snapshot (ARIA tree + visible text) NOT: Raw HTML / DOM tree ACTIONS: click | type | select | scroll | navigate PRIMITIVES: → snapshot # Capture page state (semantic) → act(click, ref) # Click element by ARIA reference → act(type, ref) # Type into input field → wait(ms) # Wait for async content → navigate(url) # Go to URL MULTI-TAB: Supported — agents can track + switch contexts AUTH: Supported — persistent browser profiles with cookies

§3 — Reliability Data

TASK SUCCESS TIME NOTES ───────────────────────────────────────────────────────────── Form filling 92% 30s Structured forms Login flows 85% 45s Fails on CAPTCHA/2FA Data extraction 88% 60s Tables/lists reliable SPA navigation 70% 90s Timing issues Multi-step workflows 60% 180s Error accumulation COMPOUND: 5-step workflow = 0.9^5 = 59% Each step independent ~90% reliability → exponential failure

§4 — Failure Modes

F1: TIMING → SPA content loads after page event → Agent clicks before element ready → Fix: explicit wait() calls before actions F2: DETECTION → Cloudflare / Datadome flag automated browsers → Playwright stealth mode insufficient → Fix: residential proxies or HITL pattern F3: NOISE → Cookie banners, notification prompts, popups → Agent treats overlays as page content → Fix: dismiss-before-interact heuristics F4: RECOVERY → No human intuition to reroute on failure → Fix: retry with fallback strategies

§5 — Converging Trends

T1: Vision-Language Models → Screenshot + click as alternative to ARIA → Useful when accessibility semantics poor T2: Agent Memory → Store site-specific navigation patterns → "GitHub merge button: green, bottom-right" → Reduces exploration on repeat visits T3: Session Replay → Record successful navigation paths → Replay → fallback to exploration on drift

§6 — Implementation Guidance

RULES: 1. Use Playwright MCP as primary browser integration 2. Design for partial failure (retry + recovery per step) 3. Prefer APIs over browser when available 4. Cache site knowledge persistently 5. Use wait() before every action on SPAs 6. Expect 60% success on 5+ step workflows 7. Consider HITL for critical multi-step flows

◉ Standard version ◆ Experimental version