Posted April 23

Contextual bandits: The next step in personalization

by Jett Sy
8 min read time

When it comes to personalization, we're in competitive times. We're in hard-to-please times. We're in attention-span-of-a-goldfish times.

Everyone knows personalization matters; that's not news. But how do you deliver truly relevant experiences that drive conversions without wasting resources? That's where contextual bandits come in.

Once you select the relevant user attributes, contextual bandits automatically learn which experiences work best for different audiences, providing valuable insights while maximizing conversions throughout their runtime.

What is a contextual bandit?

The term "multi-armed bandit" comes from the classic slot machine analogy (the "one-armed bandit").

Imagine a casino with multiple slot machines. Which one do you play to maximize your winnings? That's the basic challenge.

Contextual bandits take this to the next level by factoring in who's pulling the lever. They leverage user data to make better algorithmic decisions and deliver 1:1 personalization. The machine learning model balances the impact on your primary metric with the data it has about each visitor (the context).

A contextual multi-armed bandit (CMAB) serves the best-performing variation for every visitor based on their unique profile at that specific moment. The best variation differs across visitor profiles because the goal is to drive maximum impact for every visitor in each session.

Instead of tediously mind-mapping every variation to different user archetypes (a.k.a. manually configuring static rule-based targeting), you can rely on the contextual bandit to make those decisions more accurately for you.

What happens after your contextual bandit runs?

For optimization managers, that means knowing which audiences to target next. For analysts, it means the data behind every decision is visible, auditable, and tied directly to conversion outcomes.

It also means a deeper understanding of which visitor attributes drive conversions and the specific values within those attributes that are actually moving the needle.

Mobile or desktop?

Facebook traffic or organic?

You now see the answer at the variation level, not just the attribute level.

So a gaming company can identify what attribute values are impacting each variation. A financial services company can see which segment of visitors converted on which products. A retail customer can move from knowing a segment is impactful to knowing what to do next.

Optimizely's own marketing team can know not just that Traffic Source matters, but which sources (Facebook, organic, direct, etc.) are driving results, informing whom to target next. An apparel company can know not just that device type has an impact, but which device type.

Contextual bandits results page

Image source: Optimizely

The updated results page is organized into three sections.

  1. The summary bar at the top shows at-a-glance metrics: overall revenue, total visitors, improvement percentage, and primary metric.
  2. The user attributes section shows a ranked list of attributes that influenced the model's traffic distribution, with an attribute type filter to toggle between contributing and other attribute categories.
  3. Each variation row is now expandable, showing the contributing attribute and attribute value columns when opened.

Example

A financial services company runs a CMAB campaign across three credit card product variations. The campaign reaches maturity.

Previously, the results page might have shown that age_group and annual_income_group were contributing attributes for the Bronze Credit Card variation. However, it didn't say which age group or which income bracket.

Now, expanding the Bronze Credit Card variation row might show:

  • age_group → 21–30
  • annual_income_group → $30,000–$45,000
  • own_credit_card → Yes

This tells the customer precisely who responded to that experience, giving them the foundation to build a targeted personalization campaign for those exact segments.
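To make the idea concrete, here is a rough Python sketch of how attribute values like these could be surfaced from raw conversion data. It is not how Optimizely's results page is computed: the `top_values` function, the visitor records, and the "highest conversion rate per value" heuristic are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical visitor records for one variation: attributes plus outcome.
BRONZE_CARD_VISITORS = [
    {"age_group": "21-30", "own_credit_card": "Yes", "converted": True},
    {"age_group": "21-30", "own_credit_card": "Yes", "converted": True},
    {"age_group": "21-30", "own_credit_card": "No", "converted": False},
    {"age_group": "31-40", "own_credit_card": "Yes", "converted": True},
    {"age_group": "31-40", "own_credit_card": "No", "converted": False},
    {"age_group": "41-50", "own_credit_card": "No", "converted": False},
]


def top_values(visitors, attributes):
    """For each attribute, return the value with the highest conversion rate."""
    result = {}
    for attribute in attributes:
        # attribute value -> [conversions, visitors]
        stats = defaultdict(lambda: [0, 0])
        for visitor in visitors:
            entry = stats[visitor[attribute]]
            entry[0] += int(visitor["converted"])
            entry[1] += 1
        if stats:  # skip attributes when there are no visitors at all
            result[attribute] = max(
                stats, key=lambda value: stats[value][0] / stats[value][1]
            )
    return result
```

Calling `top_values(BRONZE_CARD_VISITORS, ["age_group", "own_credit_card"])` on this toy data picks out one value per attribute, which is the same shape of answer the expanded variation row gives. A real system would also need to account for sample size before trusting a rate built on a handful of visitors.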

From multi-armed bandits to contextual bandits...

What makes contextual bandits different from multi-armed bandits? Context.

Traditional MABs look for a single best-performing variation for all users, while contextual bandits identify winning variations based on user profiles such as device type, location, behaviors, purchase history, and more.

Let's compare:

  1. A/B testing: Fixed traffic allocation where visitors are randomly assigned to different variations, with each person seeing only one experience while waiting for statistical significance.
  2. Multi-armed bandits: Optimize for a single best-performing variation. They shift traffic dynamically but seek one "winner."
  3. Contextual bandits: Personalize for individual users based on context. Different users get different experiences based on what's most likely to convert for their profile.
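The gap between items 2 and 3 can be sketched in a few lines of Python. This is a toy illustration, not Optimizely's model: the event log, the single `device` attribute, and the function names are invented for the example.

```python
from collections import defaultdict

# Hypothetical event log: (device, variation, converted)
EVENTS = [
    ("mobile", "A", 1), ("mobile", "A", 1), ("mobile", "A", 0),
    ("mobile", "B", 0), ("mobile", "B", 0), ("mobile", "B", 1),
    ("desktop", "A", 0), ("desktop", "A", 0), ("desktop", "A", 0),
    ("desktop", "B", 1), ("desktop", "B", 1), ("desktop", "B", 1),
]


def conversion_rates(events, key):
    """Return {group: conversion rate}, grouping events with key(device, variation)."""
    totals = defaultdict(lambda: [0, 0])  # group -> [conversions, visitors]
    for device, variation, converted in events:
        group = key(device, variation)
        totals[group][0] += converted
        totals[group][1] += 1
    return {group: c / n for group, (c, n) in totals.items()}


def mab_winner(events):
    """Multi-armed bandit view: one best variation for everyone."""
    rates = conversion_rates(events, key=lambda device, variation: variation)
    return max(rates, key=rates.get)


def contextual_winners(events):
    """Contextual view: the best variation for each device type."""
    rates = conversion_rates(
        events, key=lambda device, variation: (device, variation)
    )
    best = {}
    for (device, variation), rate in rates.items():
        if device not in best or rate > rates[(device, best[device])]:
            best[device] = variation
    return best
```

On this made-up log, the single-winner view settles on variation B for all traffic, while the per-device view keeps A for mobile visitors, who convert better on it. That difference is the conversion a one-winner approach leaves on the table.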

Every missed optimal experience is a lost conversion opportunity. With A/B testing, winners are generalized from a specific segment. MABs improve this but still seek the best variation for everyone.

Contextual bandits serve each visitor the best variation for them at that moment. When profiles change, the relevant variation changes too. If a visitor converts on a product, they'll see a related product on their next visit, not the same one, increasing the chances of converting again.

How contextual bandits work

Contextual bandits balance the impact on your primary metric and user attributes to dynamically distribute the most relevant variation to each visitor at that specific moment.

Here's a simplified explanation:

  1. Learning period: The model starts with 100% exploration, randomly assigning variations to visitors to gather diverse data for predictions.

  2. Balancing exploration and exploitation: Once enough visitor behavior data is collected, the model begins exploiting (serving personalized variations). It dynamically adjusts exploration and exploitation rates as it receives more events.

  3. Continuous adaptation: The model maintains some exploration (maximum 95% exploitation) to ensure continuous learning and avoid missing opportunities.
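The three phases above can be sketched as a simple explore/exploit loop in Python. This is a minimal sketch under stated assumptions, not Optimizely's algorithm: the 95% exploitation cap comes from the text, but the warm-up threshold (`WARMUP_EVENTS`), the linear ramp (`RAMP_EVENTS`), and the per-context running averages are hypothetical stand-ins for the real model.

```python
import random
from collections import defaultdict

MAX_EXPLOITATION = 0.95  # from the text: exploitation never exceeds 95%
WARMUP_EVENTS = 100      # assumption: events gathered before exploiting
RAMP_EVENTS = 400        # assumption: events over which exploitation ramps up


def exploitation_rate(n_events):
    """Share of traffic served the predicted-best variation after n_events."""
    if n_events < WARMUP_EVENTS:
        return 0.0  # learning period: 100% exploration
    ramp = (n_events - WARMUP_EVENTS) / RAMP_EVENTS
    return min(MAX_EXPLOITATION, ramp * MAX_EXPLOITATION)


class ToyContextualBandit:
    """Tracks a running conversion rate per (context, variation) pair."""

    def __init__(self, variations, rng=None):
        self.variations = list(variations)
        self.rng = rng or random.Random()
        self.n_events = 0
        # (context, variation) -> [conversions, impressions]
        self.stats = defaultdict(lambda: [0, 0])

    def estimate(self, context, variation):
        conversions, impressions = self.stats[(context, variation)]
        return conversions / impressions if impressions else 0.0

    def choose(self, context):
        # Exploit with the current exploitation rate, otherwise explore.
        if self.rng.random() < exploitation_rate(self.n_events):
            return max(self.variations, key=lambda v: self.estimate(context, v))
        return self.rng.choice(self.variations)

    def update(self, context, variation, converted):
        # Incremental update: no retraining from scratch.
        entry = self.stats[(context, variation)]
        entry[0] += int(converted)
        entry[1] += 1
        self.n_events += 1
```

In use, each visit calls `choose(context)` to pick a variation and `update(context, variation, converted)` once the outcome is known. Because `exploitation_rate` never exceeds `MAX_EXPLOITATION`, at least 5% of traffic keeps exploring, which is what lets the model notice when visitor behavior shifts.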

Selecting the right primary metric is critical because its movement drives how the model distributes traffic. Track it as close as possible to where the contextual bandit is running, ideally on the same page.

User attributes are equally crucial. The more complete your set of attributes (products purchased, viewed, categories browsed, etc.), the better your model will perform. Optimizely's model supports unlimited attributes out of the box from standard (client-side), custom (API), and external (third-party) sources.

Contextual bandits use cases

Here are examples of wider industry applications:

  • Retail: Homepage product carousels personalized by shopping frequency and purchase history.
  • Media: Homepage content suggestions (sports, series, movies) based on viewing habits and devices.
  • Software: Dashboard feature highlights tailored to user role and usage patterns.

Any real examples, you ask?

Our beta participants are already implementing and seeing results:

  • A financial services customer is using homepage contextual bandits to deliver relevant banking products based on customer history.
  • A pizza restaurant chain is using checkout page contextual bandits to suggest add-on items based on cart contents.
  • A telecommunications company is using profile page contextual bandits to present upsell offers based on current subscriptions.

The digital team at Optimizely is also using contextual bandits. They're using CMABs on our homepage to match visitors with products based on their company, role, industry, and location.

Here are some initial results:

  • 13.62% higher engagement with targeted content
  • 3.37% improvement in marketing planning
  • 20.79% improvement in validation with testing

The team says it's working well across the board.

Benefits of implementing contextual bandits

CMABs deliver substantial business value by:

  • Providing truly personalized experiences for every user: Instead of one-size-fits-all approaches, CMABs deliver the right content to the right person at the right time.
  • Increasing conversion rates on primary metric: By showing users what they're most likely to respond to, CMABs drive higher engagement and conversion.
  • Adapting dynamically to changes in visitor behavior: The system serves the best variation in every session, even as user preferences evolve.
  • Eliminating opportunity costs from traditional testing: Unlike A/B tests that require weeks or months to reach statistical significance, contextual bandits start optimizing immediately, reducing exposure to underperforming variations in real time.
  • Requiring minimal maintenance: CMABs are ideal for pages where content doesn't change too frequently. As time goes on, the ML model gets sharper with the data it collects, making this a set-it-and-forget-it optimization that can be left running continuously.
  • Providing deeper customer intelligence: A CMAB doesn't just optimize while it runs. It teaches you about your audience. The results it generates are a direct input to your next personalization campaign. The algorithm's gains don't have to stay locked inside the CMAB, as they can inform static campaigns that extend those results to the segments you now know converted.

Optimizely's contextual bandit implementation: What makes it different

Here's how we're doing things differently:

  1. Advanced tree-based models: We've developed models for both binary classification and regression tasks, making our system flexible and adaptable to different types of data and experiment setups.
  2. Feature importance insights: Our system measures attribute impact and displays feature importance, providing insights on which attributes drive conversions.
  3. Dual-model and incremental learning: We handle all prediction types with specialized models that continue learning from new data without starting from scratch.
  4. Dynamic feature processing: Our preprocessing automatically converts features and handles data issues. Using XGBoost, we build multiple simple trees that learn from mistakes instead of one complex tree, preventing overfitting through regularization and other techniques.
  5. Integration with the broader ecosystem: Our CMAB implementation works seamlessly with Optimizely's experimentation and personalization suite, making it easy to elevate your strategy without additional tools or complexity.

The future of personalization is contextual

A CMAB isn't the end of the workflow. It's the input to the next one. Run it to discover which segments respond to which experiences. Then take what you learned and build static personalization campaigns targeting those exact audiences.

Ready to explore how contextual bandits can help you drive higher engagement, conversion rates, and customer satisfaction?

Check out this 2-minute Navattic tour to see what contextual bandits look like in the platform.

  • Last modified: 6/2/2026 10:03:51 AM