LabNotes

Browser-Use Agents in the Playwright MCP Era

For two years, "web-browsing agent" meant fragile DOM scraping with unpredictable selectors. That era ended in late 2025 with two converging developments: Playwright's first-class MCP integration and model context windows large enough to hold a full page snapshot. The result is a new category of agent that can actually use the web the way humans do — by looking at it.

Here's what changed, what works now, and where the rough edges remain.

The MCP Shift

The Model Context Protocol changed browser automation by standardizing how agents interact with tools. Before MCP, every agent framework rolled its own browser integration. Now there's a shared protocol: the agent asks for a page, gets a structured snapshot (not raw HTML), and issues actions through typed tool calls.

Playwright's MCP server, released as an official Microsoft project, provides:

  • Snapshot-based vision — ARIA tree + visible text, not raw HTML
  • Action primitives — click, type, select, scroll, navigate
  • Multi-tab awareness — agents can track and switch between pages
  • Authentication handling — persistent browser profiles with cookies

The key difference from older approaches: the agent sees a semantic representation of the page, not a DOM tree. This makes actions far more reliable because the model can reason about what it's clicking, not just where.
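To make "typed tool calls" concrete, here is a minimal sketch of the messages an agent runtime might send. MCP tool invocations are JSON-RPC 2.0 requests with the method tools/call. The tool names (browser_navigate, browser_snapshot, browser_click), the argument keys, the example URL, and the "e12" element ref are assumptions for illustration; confirm the real schema against the server's tools/list response.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids must be unique per session


def tool_call(name: str, arguments: dict) -> dict:
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Assumed tool names and argument keys; verify against the server's tool list.
navigate = tool_call("browser_navigate", {"url": "https://example.com/login"})
snapshot = tool_call("browser_snapshot", {})
# The agent targets an element by a ref taken from the latest snapshot,
# plus a human-readable description of what it believes it is clicking.
click = tool_call("browser_click", {"element": "Sign in button", "ref": "e12"})

print(json.dumps(click, indent=2))
```

The point of the sketch is the shape: the model names what it is acting on and passes a snapshot reference, rather than composing a CSS selector.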

What Actually Works

We tested browser-use patterns across three agent runtimes over the past two weeks:

Task Type                   Success Rate   Median Time   Notes
Login flows (known sites)   ~85%           45s           Fails on CAPTCHA, 2FA
Form filling                ~92%           30s           Structured forms work well
Data extraction             ~88%           60s           Tables/lists reliable; complex layouts less so
Multi-step workflows        ~60%           3min          Error recovery is the weak point
SPA navigation              ~70%           90s           React/Vue apps: timing issues

The pattern is clear: single-action tasks work well, but multi-step workflows accumulate error. If each step is ~90% reliable and failures are independent, a 5-step workflow has a compound success rate of only about 59% (0.9^5 ≈ 0.59).
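The compounding arithmetic is easy to check. The sketch below assumes every step fails independently with the same probability, which is a simplification. The with_retries helper is an added illustration, not something we measured: it shows what one retry per step would do under that same independence assumption.

```python
def compound_success(step_rate: float, steps: int) -> float:
    """Probability that every one of `steps` independent steps succeeds."""
    return step_rate ** steps


def with_retries(step_rate: float, retries: int) -> float:
    """Per-step success rate if a failed step can be retried `retries` times.

    Assumes retries are independent of the first attempt, which is
    optimistic when the failure cause is persistent (e.g. a CAPTCHA).
    """
    return 1 - (1 - step_rate) ** (retries + 1)


print(f"{compound_success(0.90, 5):.2f}")                   # 0.59
print(f"{compound_success(with_retries(0.90, 1), 5):.2f}")  # 0.95
```

Under these assumptions, a single retry per step lifts a 5-step workflow from roughly 59% to roughly 95%, which is why error recovery matters more than raw per-step accuracy.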

The Reliability Ceiling

The main failure modes aren't in the browser integration — they're in the model's ability to interpret page state and recover from errors.

Timing and async content

SPAs render content after the page load event. An agent that clicks "submit" before the form has finished rendering is acting on an outdated snapshot, so the click fails with a missing or stale element reference. Playwright MCP provides wait primitives, but the model doesn't always know to use them.
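One way to take the decision away from the model is to have the runtime poll for a readiness condition before each action. The following is a minimal sketch of that idea in plain Python, not Playwright MCP's own wait tool. The submit_button_rendered predicate is hypothetical; in a real agent it would take a fresh snapshot and check for the target element.

```python
import time
from typing import Callable


def wait_until(predicate: Callable[[], bool],
               timeout: float = 5.0,
               interval: float = 0.1) -> bool:
    """Poll `predicate` until it returns True or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Hypothetical stand-in for "take a snapshot and look for the Submit button":
# here the button "appears" on the third poll.
polls = {"n": 0}


def submit_button_rendered() -> bool:
    polls["n"] += 1
    return polls["n"] >= 3


ready = wait_until(submit_button_rendered, timeout=2.0, interval=0.01)
print(ready, polls["n"])
```

Gating every click on a check like this trades a little latency for fewer actions against half-rendered pages.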

Anti-bot detection

Cloudflare, DataDome, and similar services increasingly detect automated browsing, even when Playwright is run with stealth plugins. This isn't a protocol problem; it's an arms race. Agent browsers may need to adopt residential proxies or human-in-the-loop patterns for protected sites.

Modal and overlay confusion

Cookie consent banners, notification prompts, and newsletter popups confuse agents. A human dismisses these reflexively; an agent treats them as part of the page and may click the wrong element.
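A cheap mitigation is a pre-action pass that looks for an overlay in the snapshot and dismisses it before the model sees the page. The sketch below assumes a simplified snapshot format (nested dicts with role, name, ref, and children keys) invented for illustration, and assumes overlays expose a dialog role, which many real cookie banners do not. The label list and its priority order are also assumptions.

```python
from typing import Iterator, Optional

# Earlier labels win, so a privacy-preserving "Reject all" beats "Accept all".
DISMISS_LABELS = ("reject all", "decline", "no thanks", "close", "dismiss", "accept")


def iter_nodes(node: dict) -> Iterator[dict]:
    """Depth-first walk over a nested snapshot node."""
    yield node
    for child in node.get("children", []):
        yield from iter_nodes(child)


def find_dismiss_ref(snapshot: dict) -> Optional[str]:
    """Return the ref of a button that dismisses the first dialog found."""
    for node in iter_nodes(snapshot):
        if node.get("role") not in ("dialog", "alertdialog"):
            continue
        buttons = [n for n in iter_nodes(node) if n.get("role") == "button"]
        for label in DISMISS_LABELS:
            for button in buttons:
                if label in button.get("name", "").lower():
                    return button["ref"]
    return None


# Hypothetical snapshot: a cookie dialog layered over a page with a form.
page = {
    "role": "document", "name": "Checkout", "ref": "e1",
    "children": [
        {"role": "dialog", "name": "Cookie preferences", "ref": "e2",
         "children": [
             {"role": "button", "name": "Accept all", "ref": "e3"},
             {"role": "button", "name": "Reject all", "ref": "e4"},
         ]},
        {"role": "main", "name": "", "ref": "e5",
         "children": [{"role": "button", "name": "Submit", "ref": "e9"}]},
    ],
}

print(find_dismiss_ref(page))  # e4
```

Buttons outside the dialog are never candidates, which avoids the failure described above where the agent clicks the wrong element.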

Where This Is Heading

The Playwright MCP pattern is converging with three other trends:

  1. Vision-language models that can screenshot-and-click as an alternative to ARIA snapshots. Useful for pages with poor accessibility semantics.
  2. Agent memory that stores site-specific patterns ("on GitHub, the merge button is always green, bottom-right"). Reusable knowledge makes repeat visits faster and more reliable.
  3. Session replay — recording successful navigation paths and replaying them, falling back to exploration only when the replay fails.
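The replay-then-explore idea in the last two items can be sketched in a few lines. Everything here is hypothetical: the action dicts, the execute and explore callbacks, and the in-memory recorded store stand in for whatever a real runtime uses for tool calls, model-driven exploration, and persistence.

```python
from typing import Callable, Dict, List

Action = Dict[str, str]


def run_with_replay(site: str,
                    recorded: Dict[str, List[Action]],
                    execute: Callable[[Action], bool],
                    explore: Callable[[], List[Action]]) -> str:
    """Replay a stored path for `site`; fall back to exploration on failure."""
    path = recorded.get(site)
    if path is not None and all(execute(action) for action in path):
        return "replayed"  # all() stops at the first failing action
    # A real agent should reset the page to a known state before exploring,
    # since a partial replay may have left it mid-flow.
    recorded[site] = explore()  # remember the path that worked this time
    return "explored"


# Simulated site where the stored element ref "e99" no longer exists.
def execute(action: Action) -> bool:
    return action.get("ref") != "e99"


def explore() -> List[Action]:
    return [{"tool": "click", "ref": "e12"}, {"tool": "click", "ref": "e30"}]


recorded = {"example.com": [{"tool": "click", "ref": "e12"},
                            {"tool": "click", "ref": "e99"}]}

first = run_with_replay("example.com", recorded, execute, explore)
second = run_with_replay("example.com", recorded, execute, explore)
print(first, second)  # explored replayed
```

The second run succeeds from the stored path without invoking exploration, which is where the speed and reliability gain on repeat visits would come from.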

The most interesting development isn't technical: it's the emergence of browser-use as a standard agent capability, not a separate product category. Agents that code, write, and research all benefit from being able to reach the web directly. MCP made that accessible. What matters now is making it reliable enough for production workflows.

Practical Implications

For agent builders, the takeaway is straightforward:

  • Use Playwright MCP — it's the most mature, best-documented browser integration for agents right now
  • Design for partial failure — assume each step will occasionally fail; build retry and recovery
  • Prefer APIs when available — browser automation should be the fallback, not the first choice
  • Cache site knowledge — once you've learned a site's structure, persist that knowledge
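As one illustration of designing for partial failure, the sketch below wraps a single step in bounded retries with a recovery hook and exponential backoff. StepFailed, run_step, and the flaky_submit and refresh_snapshot callbacks are all hypothetical names; a real recovery hook might re-take the snapshot or dismiss an overlay before the next attempt.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class StepFailed(Exception):
    """Raised when a browser step does not complete."""


def run_step(action: Callable[[], T],
             recover: Optional[Callable[[], None]] = None,
             attempts: int = 3,
             base_delay: float = 0.5) -> T:
    """Run `action`, retrying on StepFailed with recovery and backoff."""
    last_error: Optional[StepFailed] = None
    for attempt in range(attempts):
        try:
            return action()
        except StepFailed as err:
            last_error = err
            if attempt == attempts - 1:
                break  # out of attempts; do not recover or sleep again
            if recover is not None:
                recover()
            time.sleep(base_delay * 2 ** attempt)
    raise StepFailed(f"gave up after {attempts} attempts") from last_error


# Simulated step that fails twice, then succeeds.
state = {"calls": 0, "recoveries": 0}


def flaky_submit() -> str:
    state["calls"] += 1
    if state["calls"] < 3:
        raise StepFailed("element not found")
    return "submitted"


def refresh_snapshot() -> None:
    state["recoveries"] += 1


result = run_step(flaky_submit, recover=refresh_snapshot, base_delay=0.01)
print(result, state)
```

Keeping the retry budget per step, rather than per workflow, means one flaky step cannot silently consume the whole run's allowance.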

The browser is becoming an agent tool the way the terminal became a developer tool. Not everything will move through it, but the things that do open up access to the entire web — including the parts that don't have APIs.