


🚩 How do we use Experiment Feature Flags at Optimizely? They allow us to perform A/B/n tests against our deployed features
🔒 Access Level? Only the Optinauts who built the experiment can change the flag, but changes are visible to the entire team
😬 Risk Level? Low
👩‍💻 Tests? Unit + Integration Tests
Lifetime? Until the experiment concludes

Experiment feature flags allow us to perform A/B/n tests against our deployed features, so we can make data-driven decisions about the features we build and prioritize. During an experiment (also referred to as a feature test), each user who is assigned to the experiment sends Optimizely an impression event, which is recorded in the test results. We can run simple tests, such as feature on vs. feature off, or much more complicated multivariate tests across many different combinations.
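To make the assignment and impression flow concrete, here is a minimal sketch in Python. It is not the Optimizely SDK and does not describe its internals: the function names, the in-memory `impressions` list, the basis-point weights, and the use of SHA-256 for bucketing are all assumptions chosen for illustration.

```python
import hashlib

# Hypothetical in-memory stand-in for impression events sent to a results backend.
impressions: list[dict] = []


def assign_variation(user_id: str, experiment_key: str,
                     variations: list[tuple[str, int]]) -> str:
    """Deterministically pick a variation.

    `variations` is a list of (name, weight) pairs; weights are in basis
    points and must sum to 10000 (an assumption of this sketch).
    """
    if sum(weight for _, weight in variations) != 10000:
        raise ValueError("variation weights must sum to 10000 basis points")
    # Hash the experiment and user together so the same user always lands in
    # the same bucket for a given experiment.
    digest = hashlib.sha256(f"{experiment_key}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10000  # integer in 0..9999
    upper = 0
    for name, weight in variations:
        upper += weight
        if bucket < upper:
            return name
    raise AssertionError("unreachable: weights cover every bucket")


def activate(user_id: str, experiment_key: str,
             variations: list[tuple[str, int]]) -> str:
    """Assign the user to a variation and record an impression event."""
    variation = assign_variation(user_id, experiment_key, variations)
    impressions.append({
        "user_id": user_id,
        "experiment": experiment_key,
        "variation": variation,
    })
    return variation
```

A feature on vs. feature off test would pass `[("off", 5000), ("on", 5000)]`; an A/B/n test simply lists more variations, for example `[("control", 3334), ("design_a", 3333), ("design_b", 3333)]`.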

How do we use Experiment Feature Flags at Optimizely?

At Optimizely, multiple teams are constantly running experiments with feature flags. Here are a few examples from across engineering, product and design:

  • The product team wants to test a new feature. They can split traffic 50/50 and see whether the new feature improves a chosen metric over the current version.
  • The design team wants to determine which new UI will drive the most conversions for a new sign-up page they want to launch. They can experiment with 2-3 designs to determine, in a data-driven way, which performs best for users.
  • The infrastructure team wants to see whether a new optimization they’ve built gives a measurable improvement in latency in a real-world environment. They can run their experiment in production and validate whether there is an improvement, and whether that improvement is statistically significant.
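As a rough illustration of what "statistically significant" means in the examples above, here is a textbook fixed-horizon two-proportion z-test for comparing conversion rates. This is a swapped-in classical technique for illustration only; it is not necessarily how Optimizely's results page computes significance, and the function name and parameters are hypothetical.

```python
import math


def two_proportion_z_test(conv_a: int, n_a: int,
                          conv_b: int, n_b: int) -> tuple[float, float]:
    """Return (z, two-sided p-value) for the difference in conversion rates.

    conv_a / n_a is the control's conversions and visitors,
    conv_b / n_b is the variation's.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both versions convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("standard error is zero; rates are all 0% or all 100%")
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


z, p = two_proportion_z_test(conv_a=100, n_a=1000, conv_b=150, n_b=1000)
significant = p < 0.05  # 0.05 is a conventional threshold, assumed here
```

Note that a fixed-horizon test like this assumes the sample size was decided in advance; checking it repeatedly while the experiment is running inflates the false positive rate.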

Who is allowed to make changes to the Experiment Feature Flag?

The team that creates the feature flag should be the only team that ends the experiment or makes any changes to their experiment feature flag. However, when a team at Optimizely launches an experiment, it is announced to the entire company; you can learn more about that process here. This gives teams across the company visibility that an experiment may be modifying the UI or user flows while they are using and maintaining Optimizely. Since the feature test is deployed through a feature flag, if it does cause a customer issue, it can easily be disabled by the flag owner or the on-call engineer.

What is the risk level for using an Experiment Feature Flag?


Experiment feature flags are low-risk flags. In general, we are testing whether a variation produces an improvement or change over the original. Once we reach statistical significance, or it is clear there is no significant change, we end the experiment.

How does Optimizely test its deployment of Experiment Feature Flags?

Optimizely has robust unit tests and integration tests (Cypress, Selenium, etc.), as described here.

For end-to-end tests, the automation team may not necessarily build automation against the feature being experimented on. Because the feature behind an experiment feature flag may not become permanent, it may not warrant the cost of building it into the regression automation framework. However, before launching the experiment, the teams identify the most critical variations that need to be verified and execute those tests manually.

We also use a few different techniques to contain the “blast radius” of an experiment introducing bugs:

  • Monitoring error logging and reporting more closely for the duration of the experiment
  • Rolling out the experiment to a small subset of users first and then slowly rolling it out to more
  • Targeting the experiment to internal users first and dogfooding it before turning it on more widely for the public
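The last two techniques can be sketched as a single eligibility check. This is a minimal Python sketch under stated assumptions, not Optimizely's implementation: the function name, its parameters, the `example.com` internal domain, and the SHA-256 bucketing are all hypothetical.

```python
import hashlib


def in_experiment(user_id: str, email: str, experiment_key: str,
                  traffic_percent: int, internal_only: bool,
                  internal_domain: str = "example.com") -> bool:
    """Decide whether a user is eligible for the experiment at all.

    internal_only   -- dogfooding phase: only users on the internal domain
    traffic_percent -- share of eligible users (0..100) let into the experiment
    """
    if internal_only and not email.lower().endswith("@" + internal_domain):
        return False
    # A stable bucket in 0..99 per (experiment, user). Because the bucket never
    # changes, raising traffic_percent only adds users; nobody already in the
    # experiment is dropped during the ramp.
    digest = hashlib.sha256(
        f"{experiment_key}:traffic:{user_id}".encode("utf-8")
    ).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < traffic_percent
```

A ramp might start with `internal_only=True` at 100%, then switch to `internal_only=False` at 5%, and increase `traffic_percent` step by step while error reporting stays quiet.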

When does Optimizely remove its Experiment Feature Flags?

A completed feature test experiment with a clear loser (statistically significant)

Once the experiment has concluded, if the feature we are testing is a statistically significant winner, we convert it to a feature rollout. The feature rollout will have its own exit criteria. If the feature is not a winner (see How long to run an experiment?), we assign the engineering owner the responsibility of removing the flag and concluding the project.

This is part of a series that dives into the different feature flag types we have here at Optimizely, as well as the engineering teams that implement them.

If you’re looking to get started with or scale feature flags, check out our free solution: Optimizely Rollouts

Are you using an experiment feature flag at your company? I’d love to hear about your experience! Twitter, LinkedIn