LabNotes

A/B Testing with Agent-Built Variants

Traditional A/B testing follows a familiar rhythm: a designer creates two mockups, a developer implements them, they get deployed behind a feature flag, and two weeks later you have a statistically significant result. When an AI agent builds the variants, the first three stages of that cycle (design, development, and deployment) fit inside a single afternoon. Measurement still takes as long as your traffic requires.

We tested this approach on StoryBook Studio, a product for parents to create personalized storybooks for their children. An agent built and deployed four distinct landing page variants in one session. Here's what we learned about the process, the results, and the patterns worth repeating.

The Traditional Bottleneck

Conventional A/B testing has a throughput problem, and the bottleneck isn't the testing infrastructure. Tools such as Vercel, LaunchDarkly, and Google Optimize (shut down by Google in 2023) solved split-testing mechanics years ago. The bottleneck is variant creation.

Each variant requires:

  • Copywriting: Distinct value propositions, headlines, CTAs
  • Design: Layout adjustments, visual hierarchy changes
  • Implementation: Frontend code, responsive behavior, asset optimization
  • QA: Cross-browser testing, mobile verification, link validation

Even with a dedicated team, producing 4 distinct variants for a single landing page typically takes 1–2 weeks. Most teams settle for 2 variants because the cost of producing more doesn't justify the marginal insight gained.

This is where agent-built variants change the economics.

The StoryBook Studio Experiment

For StoryBook Studio's marketing page, we defined four copywriting strategies and tasked an agent with building complete HTML implementations for each:

Variant        Strategy                Core Approach
Control        Feature-led             Direct product description: what it does, how it works, pricing.
Emotion        Feeling-first           Lead with the emotional moment: parent-child bonding through stories.
Social Proof   Testimonial-driven      User quotes, review scores, and adoption metrics front and center.
PAS            Problem-Agitate-Solve   Identify screen-time anxiety, amplify the concern, present the solution.

Every variant received the same base requirements: a hero section, feature highlights, pricing information, and a footer. Each variant used a different copywriting framework to structure the same underlying information.

What an Agent Actually Does Differently

The agent doesn't just swap headlines. It restructures the entire page hierarchy based on the chosen strategy:

Control (Feature-led)

Opens with the product name and a clear value statement. Feature cards appear above the fold. Pricing is visible without scrolling. This is "here's what we built, here's what it costs, here's how to buy it." Low-friction, high-clarity.

Emotion Variant

The hero section opens with a parent-child story moment, and the product isn't mentioned until the second screen. The emotional hook comes first: "Your child's favorite story? The one where they're the hero." Product features appear only after the reader feels something. This reverses the Control's information hierarchy entirely.

Social Proof Variant

Testimonial quotes replace the hero copy. A review score appears in the navigation. The "5,000+ families" stat gets its own visual treatment. Features are still present, but they're framed as "why parents choose us" rather than standalone descriptions.

PAS Variant

Opens with the problem: kids spending hours on screens with passive content. Agitates by making the impact concrete: hours per week of screen time, little active engagement. Then resolves with StoryBook Studio as the alternative. The page structure follows a narrative arc rather than a product catalog layout.

Each variant is a complete, independent implementation—not a template with variables swapped. Different markup structure, different visual emphasis, different content flow. That's what makes the comparison meaningful.

Deployment via Vercel

Deploying 4 variants simultaneously requires minimal infrastructure. The pattern:

# Each variant is a standalone HTML file deployed as a Vercel route
storybookstudio.promptengines.com/          → Control (default)
storybookstudio.promptengines.com/emotion   → Emotion variant
storybookstudio.promptengines.com/proof     → Social Proof variant
storybookstudio.promptengines.com/pas       → PAS variant

Vercel's edge network serves every variant from the same infrastructure, so latency is comparable across variants and doesn't confound the comparison. No custom servers, no feature flag services, no client-side redirect logic. The variants are static HTML: no build step, no framework hydration.

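Traffic splitting happens at the routing layer. For initial tests, we use a simple pattern: direct links to each variant for qualitative review, then weighted routing for quantitative measurement once the variants are validated. Only the weighted phase produces comparable numbers, because visitors who arrive through a direct link are not randomly assigned to the variant they see.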
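A minimal sketch of the weighted phase, assuming each visitor carries a stable ID (for example, a first-party cookie set on the first visit) and that some routing layer, such as edge middleware, calls this function and then serves the matching path. The function name, the salt, and the weights are hypothetical, not the StoryBook Studio configuration. Hashing the ID instead of calling a random generator keeps assignment sticky, so a returning visitor always sees the same variant.

```python
import hashlib

# Paths from the deployment above; the dict keys are hypothetical variant names.
VARIANT_PATHS = {"control": "/", "emotion": "/emotion", "proof": "/proof", "pas": "/pas"}


def assign_variant(visitor_id: str, weights: dict[str, float], salt: str = "round-1") -> str:
    """Deterministically map a visitor to a variant in proportion to `weights`.

    The same (salt, visitor_id) pair always yields the same variant, so a
    returning visitor is not re-randomized. Change the salt for a new round.
    """
    total = sum(weights.values())
    if total <= 0 or any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative and sum to a positive number")

    # First 8 bytes of SHA-256 as an integer, scaled to a uniform value in [0, 1).
    digest = hashlib.sha256(f"{salt}:{visitor_id}".encode("utf-8")).digest()
    point = int.from_bytes(digest[:8], "big") / 2**64

    cumulative = 0.0
    chosen = None
    for name in sorted(weights):  # sorted so bucket edges don't depend on dict order
        if weights[name] == 0:
            continue  # a zero-weight variant gets no bucket at all
        chosen = name
        cumulative += weights[name] / total
        if point < cumulative:
            return name
    return chosen  # reached only if float rounding leaves `point` just past the last edge


if __name__ == "__main__":
    equal = {"control": 1, "emotion": 1, "proof": 1, "pas": 1}
    variant = assign_variant("visitor-123", equal)
    print(variant, VARIANT_PATHS[variant])
```

Changing the salt between rounds reshuffles visitors, so the round-two split isn't correlated with who saw which page in round one.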

Measuring What Matters

Not all metrics are equally useful for landing page variants. Here's the hierarchy we use:

  1. Primary: Click-through to signup/action. Did the visitor take the next step? This is the only metric that matters for conversion optimization.
  2. Secondary: Time on page + scroll depth. Are they reading, or bouncing? High time-on-page with low conversion suggests good copy but a weak CTA.
  3. Tertiary: Bounce rate by source. A variant that works for organic search traffic may fail for paid social. Segment by traffic source before declaring a winner.
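The primary metric, segmented by traffic source as the tertiary item recommends, reduces to a small aggregation. This sketch assumes one record per unique visitor in a hypothetical `(variant, source, clicked_cta)` tuple; the record layout and source labels are illustrative, not a real analytics export format.

```python
from collections import defaultdict


def ctr_by_variant_and_source(visits):
    """Aggregate click-through per (variant, source).

    `visits` is an iterable of (variant, source, clicked_cta) tuples, one per
    unique visitor. Returns {(variant, source): (visitors, clicks, rate)} so
    that small segments stay visible next to their rates.
    """
    cells = defaultdict(lambda: [0, 0])  # (variant, source) -> [visitors, clicks]
    for variant, source, clicked in visits:
        cell = cells[(variant, source)]
        cell[0] += 1
        cell[1] += 1 if clicked else 0
    return {
        key: (visitors, clicks, clicks / visitors)
        for key, (visitors, clicks) in cells.items()
    }


if __name__ == "__main__":
    # Hypothetical records for illustration only.
    sample = [
        ("control", "organic", True),
        ("control", "organic", False),
        ("emotion", "paid_social", False),
        ("emotion", "organic", True),
    ]
    for (variant, source), (n, clicks, rate) in sorted(ctr_by_variant_and_source(sample).items()):
        print(f"{variant:8} {source:12} visitors={n} clicks={clicks} ctr={rate:.1%}")
```

Reporting the visitor count next to each rate matters: a segment with a few dozen visitors can show a dramatic rate that carries almost no information.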

What we don't track: heatmaps for a 4-variant test are noise, eye-tracking studies are premature, and focus groups at this stage introduce observer bias. Run the experiment first; do qualitative analysis on the winner.

Practical Patterns for Agent-Built Testing

From the StoryBook Studio experiment, several patterns emerged that apply beyond this specific case:

1. Define Strategies, Not Designs

Tell the agent the persuasion framework, not the visual layout. "Build a PAS variant" produces better results than "move the hero image to the left and change the headline color." The agent handles design decisions based on the strategic intent.
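One way to keep the instruction at the level of strategy is to generate every brief from the same base requirements and vary only the framework. The template wording, the constant names, and the `build_brief` helper below are hypothetical, not the prompt actually used for StoryBook Studio; the base requirements and strategy descriptions come from the experiment setup above.

```python
# Shared requirements every variant must cover (from the experiment setup).
BASE_REQUIREMENTS = ["hero section", "feature highlights", "pricing information", "footer"]

# One persuasion framework per variant; keys are hypothetical and mirror the routes.
STRATEGIES = {
    "control": "Feature-led: direct product description; what it does, how it works, pricing.",
    "emotion": "Feeling-first: lead with the emotional moment of parent-child bonding through stories.",
    "proof": "Testimonial-driven: user quotes, review scores, adoption metrics front and center.",
    "pas": "Problem-Agitate-Solve: identify screen-time anxiety, amplify the concern, present the solution.",
}


def build_brief(product: str, strategy_key: str) -> str:
    """Compose a strategy-level brief; layout decisions are left to the agent."""
    strategy = STRATEGIES[strategy_key]
    sections = ", ".join(BASE_REQUIREMENTS)
    return (
        f"Build a complete, self-contained HTML landing page for {product}.\n"
        f"Required content: {sections}.\n"
        f"Copywriting strategy: {strategy}\n"
        "Choose the layout, visual hierarchy, and section order that best serve this strategy."
    )


briefs = {key: build_brief("StoryBook Studio", key) for key in STRATEGIES}

if __name__ == "__main__":
    print(briefs["pas"])
```

Because the brief never names a layout, any structural difference between variants comes from the agent's reading of the strategy, which is the variable under test.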

2. Keep Variants Structurally Independent

Each variant should be a complete, self-contained page. Shared components lead to correlated failures—if the shared CTA button is wrong, all variants fail together. Independence means cleaner signal.

3. Deploy Before You Perfect

Agent-built variants are drafts, not finals. Deploy them, measure for at least 48–72 hours (longer if traffic hasn't yet reached the sample size you planned for), then refine the winner. The speed advantage of agent-built variants only matters if you actually ship them. Perfect is the enemy of tested.
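Whether 48–72 hours is enough depends on traffic and on the lift you hope to detect. A minimal sketch, assuming a two-sided two-proportion z-test and a Bonferroni correction for comparing three challengers against one control, uses the standard normal-approximation sample-size formula (`statistics.NormalDist` needs Python 3.8+). The function name and the rates in the example are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(p_control, p_variant, alpha=0.05, power=0.8, comparisons=1):
    """Visitors needed in each arm to detect a shift from p_control to p_variant.

    Normal-approximation formula for a two-sided two-proportion z-test.
    `comparisons` applies a Bonferroni correction (alpha / comparisons),
    e.g. 3 when three challengers are each compared with one control.
    """
    if not (0 < p_control < 1 and 0 < p_variant < 1) or p_control == p_variant:
        raise ValueError("rates must lie in (0, 1) and differ")
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - (alpha / comparisons) / 2)
    z_beta = std_normal.inv_cdf(power)
    p_bar = (p_control + p_variant) / 2
    spread = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p_control * (1 - p_control) + p_variant * (1 - p_variant))
    )
    return math.ceil(spread ** 2 / (p_control - p_variant) ** 2)


if __name__ == "__main__":
    per_arm = sample_size_per_variant(0.04, 0.05, comparisons=3)
    print(per_arm, "visitors per variant;", 4 * per_arm, "in total")
```

With a hypothetical 4% baseline click-through and a hoped-for 5%, this asks for roughly 9,000 visitors per variant, about 36,000 across four. At a hypothetical 2,000 visitors per day, that is closer to 18 days than 3, so the time window should follow from the sample size rather than the other way round.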

4. Test Copy Strategy Before Visual Design

The 4 StoryBook Studio variants test messaging approaches. Visual differences are secondary to the copywriting framework. Once you identify the winning strategy, you can run a second round of tests on visual execution within that strategy.

5. One Variable Per Round

The first round tests copywriting strategy (Control vs Emotion vs Social Proof vs PAS). The winner of that round becomes the base for round two, which might test CTA placement, hero image style, or pricing presentation. Stacking variables produces ambiguous results.

What This Changes

Agent-built variants don't eliminate the need for A/B testing discipline. You still need statistical significance, clean traffic segmentation, and honest measurement. What changes is the cost of experimentation.
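For the significance check itself, a common choice for click-through rates is a pooled two-proportion z-test, run once per challenger against the control with a Bonferroni-adjusted threshold. This is a minimal sketch with hypothetical function names and counts. It assumes the sample size was fixed in advance and the test is evaluated once at the end; repeatedly peeking and stopping at the first significant result inflates the false-positive rate.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(clicks_a, visitors_a, clicks_b, visitors_b):
    """Pooled two-sided z-test for a difference in click-through rates.

    Returns (z, p_value); positive z means B converts better than A.
    """
    if visitors_a <= 0 or visitors_b <= 0:
        raise ValueError("visitor counts must be positive")
    rate_a = clicks_a / visitors_a
    rate_b = clicks_b / visitors_b
    pooled = (clicks_a + clicks_b) / (visitors_a + visitors_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    if se == 0:  # both arms all-click or all-no-click: no evidence of a difference
        return 0.0, 1.0
    z = (rate_b - rate_a) / se
    return z, 2 * (1 - NormalDist().cdf(abs(z)))


def compare_to_control(results, control="control", alpha=0.05):
    """results: {variant: (clicks, visitors)}. Bonferroni-adjusts alpha."""
    clicks_c, visitors_c = results[control]
    challengers = [name for name in results if name != control]
    if not challengers:
        raise ValueError("need at least one challenger")
    threshold = alpha / len(challengers)
    report = {}
    for name in challengers:
        clicks, visitors = results[name]
        z, p = two_proportion_z_test(clicks_c, visitors_c, clicks, visitors)
        report[name] = (z, p, p < threshold)
    return report


if __name__ == "__main__":
    # Hypothetical counts, not StoryBook Studio data.
    counts = {"control": (360, 9000), "emotion": (450, 9000),
              "proof": (380, 9000), "pas": (400, 9000)}
    for name, (z, p, significant) in compare_to_control(counts).items():
        print(f"{name:8} z={z:+.2f} p={p:.4f} significant={significant}")
```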

When producing 4 variants takes an afternoon instead of 2 weeks, the calculus shifts. You test more hypotheses. You run more rounds. You stop debating whether the emotion-first approach might work and just build it and see.

The bottleneck moves from variant creation to experiment design—which is where it should have been all along.

Next Steps

The StoryBook Studio variants are deployed and collecting data. Once we have statistically significant results on the copywriting strategy round, we'll publish the findings. The winning variant enters round two: visual execution testing within the proven copy framework.

The agent is standing by to build the next set.


Experiment Details
Product: StoryBook Studio
Variants: 4 (Control, Emotion, Social Proof, PAS)
Deployment: Vercel, static HTML routes
Build time: Single session (agent-built)
Measurement window: 48–72 hours per round

References:
StoryBook Studio: https://storybookstudio.promptengines.com
Vercel Deployment: https://vercel.com
PAS Framework: Problem-Agitate-Solve (copywriting methodology)