Report

The agentic AI experimentation report

900 companies. 47,000 interactions. One finding: AI's impact is stuck at the individual level.

AI is stealing the first touch.

Buyers compare options in AI tools. By the time someone lands on your site, they're already informed. Ready to buy.

That's why every experience matters: fewer searches end with a click.

But improving those experiences takes work. Every experiment needs research, hypothesis validation, variation development, execution, monitoring, and analysis. Each step waiting on someone.

58.74%

of all agent usage is experimentation.

So, you compromise. Test only the biggest ideas. Let smaller opportunities pile up. Wait for conclusive results.

That AI can help isn't news. But if you've tried AI, you've probably run into the challenge we're seeing across brands:

How do you make AI work reliably, at scale, and with accountability and governance?

To answer that question, we analyzed data from 47,000 Optimizely Opal interactions across 900 companies.


About our research

Our insights are drawn from three primary data sources:

  1. Internal Optimizely Opal data: Analyzed usage data across 47,000 Optimizely Opal interactions from customers who adopted Optimizely Opal after its public launch, covering feature adoption patterns, user behavior, and measurable performance outcomes.
  2. Customer stories from early adopters: Firsthand insights from nearly 900 Optimizely Opal adopters, highlighting their implementation experiences, challenges overcome, and the tangible impacts achieved.
  3. Industry research: Third-party research and surveys from leading firms like McKinsey and Gartner, providing external context around broader AI adoption trends and transformations in marketing.

900 companies. 47,000 Optimizely Opal interactions. Does AI work for experimentation?

58.74% of all agent usage is experimentation.

Teams using agents across the full lifecycle aren't stuck with the trade-off anymore. They're getting both. More experiments and higher win rates.

+78.66% experiments created
+2.38% experiments concluded
+24.05% personalization campaigns created
+11.97% personalization campaigns concluded
+9.26% win rate
+1.38% conclusive rate

So, what are these teams doing differently?

They're realizing that the real impact of AI comes from using agents across the full experimentation lifecycle to improve completion, not just creation.

10.95%

of experiments start with agent-generated ideas

  • 6.8% get summarized by agents.
  • 19.54% are follow-ups driven by agent recommendations.

Creation spikes first. As teams bring AI agents into execution and analysis, conclusion rates and win rates follow.

This is how they work now

  • Top 10%: 60+ interactions
  • Top 1%: 198+ interactions

Industries: Retail (17.4%), Software (11.8%), Financial Services (9.6%). Also, Healthcare, Education, and Insurance.

Who's doing this: They're not all advanced teams with sophisticated stacks. 37% are mid-maturity. 12% are early stage. They started where you are now.

Despite these results, most brands still can't scale AI outcomes. Here's why.


Everyone has AI. Few have the system to scale it.

Almost everyone is using AI. Almost no one can scale its implementation.

80% of companies remain stuck in pilots or have seen no significant gains. Only 8% consider themselves advanced with AI.

Most teams use AI for individual tasks. Drafting faster. That helps individuals, but it doesn't change how the program runs.

More usage = better completion, not just more starts

Our data shows the uplift slope for concluded experiments is steeper than for created experiments. More usage doesn't just mean more tests started. It means more tests reaching actionable outcomes.

Why? Agents handle the operational steps that cause tests to stall mid-cycle:

  • Turning ideas into structured plans, so tests don't wait for someone to write them up
  • Building variations, so tests don't sit in dev queues
  • Summarizing results and recommending next steps, so insights don't die in decks

Robinson Club built this kind of system. Michael Richter trained agents to generate production-ready landing pages, audit content for consistency, and benchmark competitors, turning tasks that took days into seconds.

€4M+

in proven revenue from experimentation

The result: €4M+ revenue from experimentation, scaled across five TUI brands and ten languages. Not by adding headcount. By embedding agents across the lifecycle.

Talking to teams using agentic AI, we found that those who struggle share three problems:

  1. No defined process: Agents without workflows produce inconsistent results. Different people, different outputs. Usage depends on whoever remembers to open the tool.

  2. No orchestration: Tools that don't connect create more handoffs, not fewer. You save time in one step, lose it coordinating the next.

  3. No governance: Quality varies by who's prompting. No checkpoints. Leadership worries about what's actually shipping.

The teams in our data fixed all three. They stopped treating AI as a standalone tool and started embedding it across the workflow.

Here's what that looks like.


AI is stuck at the individual productivity level

An idea comes in through a request form. It goes through feasibility. Gets prioritized against the backlog. Someone writes the experiment brief. Development picks it up. QA checks it. It launches. Results come in. Someone analyzes. Decides what's next.

Where it stalls:

  • Ideas wait for the one person who writes briefs
  • Development takes 2 sprints per variation, plus 2 days lead time just to get prioritized
  • Each variation needs 0.5 hours of manual accuracy checking
  • Analysis depends on whoever has time that week
  • Follow-up ideas sit in Slack threads

If that looks familiar, you'll recognize these too:

  1. Approval loops: The same deliverable bouncing back and forth. Review comments that contradict last week's feedback.

  2. Ownership gaps: Ideas floating without owners. Results shared in Slack, follow-ups never created. Winning tests that don't get scaled because no one's job is to scale them.

  3. Inconsistency: Same brief, different outputs depending on who picks it up.

  4. Lack of trust: AI models can hallucinate, misinterpret context, and change behavior over time. Agent decisions are hard to trace, so every run feels like a black box.

These inefficiencies are the reasons AI stays stuck at individual productivity instead of changing the program.

To solve this, Optimizely Opal AI agents now orchestrate use cases across your workflows and scale your impact without scaling costs.

Stage | What was happening | Workflow agent
Ideation | 5-10 ideas per month, limited by one person's capacity | Ideation Agent generates 2-5 testable ideas per request
Brief | Days to write a structured plan | Planning Agent structures in seconds
Development | 2 sprints + 2 days lead time per variation | Variation Agent builds without the queue
QA | Manual checking, 0.5 hours per variation | Review Agent flags issues before launch
Analysis | Waiting for someone to write it up | Summary Agent recommends next steps
Exploration | SQL queries, analyst dependency | Data Query Agent answers in plain English
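
To illustrate the hand-offs this table implies, here is a minimal sketch in Python of the lifecycle as an ordered chain of agent steps. The stage and agent names mirror the table; the wiring itself is an illustration, not how Optimizely Opal is implemented.

    # Minimal sketch: the experimentation lifecycle as an ordered chain of agent steps.
    # Each step records a hand-off; in practice each agent would transform the artifact
    # (idea -> brief -> variation code -> review notes -> summary).
    LIFECYCLE = [
        ("Ideation", "Ideation Agent"),
        ("Brief", "Planning Agent"),
        ("Development", "Variation Agent"),
        ("QA", "Review Agent"),
        ("Analysis", "Summary Agent"),
        ("Exploration", "Data Query Agent"),
    ]

    def run_lifecycle(idea: str) -> list[str]:
        """Pass one idea through every stage instead of stopping after ideation."""
        trail = [f"Idea: {idea}"]
        for stage, agent in LIFECYCLE:
            trail.append(f"{stage}: handled by {agent}")
        return trail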

And here’s how you can activate Optimizely Opal agents across your experience optimization workflow:

[Image: Optimizely Opal AI agents across the experimentation lifecycle]


What makes workflow agents different

You've used AI to draft a hypothesis. Summarize a result. Brainstorm ideas.

Imagine a test concludes. The agent sees it. Summarizes the result. Identifies the pattern. Generates follow-up ideas grounded in your historical data. Drafts plans in your format. Queues them for review.

50% more output

By adopting workflow agents (Optimizely Opal AI report)

You didn't ask. It just ran.

A chatbot responds when you prompt it. A workflow agent executes. A chatbot starts fresh every time. A workflow agent remembers.

Agents rely on two types of memory:

  • Session context: The current test. Recent results. What you've shared.
  • Organizational knowledge: Your frameworks. Past learnings. What "good" looks like for your team.

Optimizely Opal now has experimentation context built in. It knows your existing experiments, metrics, feature flags, and program history.

That means the ideas it generates aren't generic; they build on what you've already proven. The test plans it creates follow your format. The insights it surfaces connect to patterns across your program, not just one-off results.
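
As a rough mental model of those two memory layers, here is a minimal sketch in Python. The field names are assumptions made for the illustration, not Opal's actual schema.

    from dataclasses import dataclass, field

    # Illustrative data model for the two memory layers described above.
    @dataclass
    class SessionContext:
        current_test: str = ""                                  # the test being worked on right now
        recent_results: list[str] = field(default_factory=list)
        shared_notes: list[str] = field(default_factory=list)   # what you've shared this session

    @dataclass
    class OrganizationalKnowledge:
        frameworks: list[str] = field(default_factory=list)       # hypothesis and plan templates
        past_learnings: list[str] = field(default_factory=list)   # concluded experiments and outcomes
        quality_bar: str = ""                                     # what "good" looks like for the team

    @dataclass
    class AgentContext:
        session: SessionContext
        organization: OrganizationalKnowledge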

Learn the principles for designing workflow agents

Without context, you get generic outputs that need heavy editing. With it, you get results grounded in your business. Consistent across your team. Improving as more work flows through.

Experience optimization is built on connected steps.

And this is why, at Optimizely, our methodology is grounded in Human-Centered Design. We started by mapping real-world experimentation processes, identifying genuine friction points, and designing solutions that complement, not disrupt, existing workflows.

Since its launch in May 2025, nearly 900 companies have adopted Optimizely Opal to embed AI throughout their marketing workflows. Top adopters include Diligent, Robinson Club, Elite Hotels of Sweden, and Road Scholar, representing over $2B in annual revenue.

As a one-person team, every hour matters. Optimizely Opal doesn’t just save me time—she delivers valuable insights within minutes. Using our frameworks, she provides ideas and recommendations that align perfectly with our experimentation goals.

Michael Richter, Manager Conversion Optimization & UX | E-Commerce, TUI Hotel brands

Why this works

Experiment context now lives in Optimizely Opal. Not just past results. Active tests.

What's running. What hasn't concluded. You can ask:

"What tests haven't concluded yet?"

"Should we pause this one?"

"Give me test ideas for this page."

Optimizely Opal shows what you want to see. Further, the system has cost, value, and trust designed in from the start.

  1. Cost: AI behaves like infrastructure, not software. Usage scales, costs scale. Optimizely Opal is built to deliver value worth the compute, not rack up tokens on tasks that don't move the needle.
  2. Value: Speed, automation, and fewer errors don't always create value. Solving the right problem does. Optimizely Opal agents are mapped to real workflow bottlenecks in your experience optimization, not generic use cases. If it doesn't save time or improve outcomes, it doesn't ship.
  3. Trust: Models hallucinate. Agents make decisions hard to trace. Trust isn't a setting you toggle. It's guardrails, verification, and human checkpoints built into the system. You choose your level of autonomy. You control what ships. AI handles the work that gets you there.

So, we've designed our workflow agents to keep things in your control.

Model | How it works | Use when
Agent-assisted | Agent supports, you control | Exploring new test areas
Human-in-the-loop | Agent completes step, waits for approval | Strategic decisions, brand-critical tests
Human-on-the-loop | Agent runs, you monitor | Established test patterns
Full automation | Agent handles end-to-end | Low-risk, high-volume (e.g., basic personalization rules)
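
To show how a team might make these models operational, here is a minimal sketch in Python of a governance map. The step names and the requires_approval helper are hypothetical examples for this sketch, not part of the Optimizely Opal API.

    # Hypothetical governance map: which autonomy model each workflow step runs under.
    # The model names mirror the table above; the mapping and helper are illustrative only.
    AUTONOMY_POLICY = {
        "ideation": "agent-assisted",                 # exploring new test areas
        "planning": "human-in-the-loop",              # strategic, brand-critical decisions
        "monitoring": "human-on-the-loop",            # established test patterns
        "personalization_rules": "full-automation",   # low-risk, high-volume
    }

    def requires_approval(step: str) -> bool:
        """Return True when a human checkpoint must sign off before anything ships."""
        model = AUTONOMY_POLICY.get(step, "human-in-the-loop")  # default to the safest review path
        return model in ("agent-assisted", "human-in-the-loop")

    assert requires_approval("planning")                    # brand-critical work waits for a human
    assert not requires_approval("personalization_rules")   # low-risk work runs end-to-end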

AI workflow bottlenecks: Start where it hurts most

Not every bottleneck is worth solving first. Track two levers:

  • Productivity: Tests created per month. Time from idea to launch. Analyst hours per test.
  • Impact: Win rate. Conclusive rate. Revenue per test.

In business terms: Hours saved become FTE capacity freed. More tests become faster learning. A better win rate becomes incremental revenue.

Score each potential use case:

  • Priority Score = Productivity + Growth - Effort
  • Productivity (1-10): How much time and effort does this save?
  • Growth (1-10): Does this improve output quality, volume, or speed?
  • Effort (1-10): How hard is this to implement?
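
To make the scoring concrete, here is a minimal sketch in Python. The use cases and scores below are hypothetical examples, not figures from our data.

    # Rank hypothetical use cases with Priority = Productivity + Growth - Effort.
    use_cases = [
        {"name": "Generate ideation backlog", "productivity": 7, "growth": 6, "effort": 3},
        {"name": "Draft variation code", "productivity": 8, "growth": 7, "effort": 6},
        {"name": "Summarize results", "productivity": 6, "growth": 5, "effort": 2},
    ]

    for uc in use_cases:
        uc["priority"] = uc["productivity"] + uc["growth"] - uc["effort"]

    # Start where impact is high and effort is low: highest score first.
    for uc in sorted(use_cases, key=lambda uc: uc["priority"], reverse=True):
        print(f"{uc['name']}: priority score {uc['priority']}")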

Workflow agents in action

Most AI sits outside your workflow. Copy. Paste. Prompt. Copy. Paste.

Optimizely agents work inside the tools you already use. They know your past tests. Your performance data. Your frameworks. When they produce something, it's already where it needs to be.

Optimizely Opal across the experimentation lifecycle

1. Experiment ideation agent

18%

more tests created. 33% faster run times.

Run more tests without adding headcount.

This agent draws on patterns from 127,000+ experiments. Paste a URL, share your goals, get ideas grounded in what's worked.

[Video: Experiment ideation agent demo]
2. Personalization agent

Surface high-value personalization opportunities by segment.

This agent analyzes behavior patterns. Shows you which audiences respond to what. Identifies where targeting will actually change outcomes.

3. Experiment planning agent

19%

faster to start experiments. 25% faster to statistical significance.

Hypothesis to launch-ready plan in seconds with the planning agent.

Audiences. Primary and secondary metrics. Guardrails. Run time to statistical significance. What used to take an hour of alignment across analyst, PM, and engineer.

[GIF: Experiment planning agent demo]

4. Variation development agent

Stop waiting for dev. Your idea is in the backlog. Revenue sits untested. Click the element. Describe what you want. Watch it write the code.

[Video: Variation development agent demo]

No queue. No dependency. No waiting for the next sprint with the variation development agent.

5. Experiment review agent

Catch setup problems before you waste traffic.

About to launch. Is targeting right? Are metrics configured correctly? The agent reviews your setup in real time. Flags issues. Suggests fixes.

[Video: Experiment review agent demo]

The experiment review agent is the extra pair of eyes your team needs.

6. Results summary agent

Results are in. The experiment summary agent reviews your metrics. Generates a summary. Recommends next steps: ship, extend, or try something new.

[GIF: Results summary agent demo]

No more insights sitting in dashboards waiting for someone to write them up.

7. AI exploration generator

Ask a question in plain English. Get the dashboard.

[Video: AI exploration generator demo]

No SQL. No waiting for the analyst. Just the answer with AI in analytics.


Concluding remarks

AI is already changing how customers find you. The teams that win will be the ones who optimize experiences for every visitor who does land.

Teams embedding AI across the full lifecycle are running 78% more experiments and seeing 9% better win rates. They’re not just running more tests. They are learning faster than everyone else.

You can keep using AI for one-off tasks. Or you can connect with us to:

  1. Map your workflow and find the bottlenecks
  2. Match agents to the steps that stall
  3. Start where impact is high and effort is low

Get in touch!


Appendix

Key definitions