What are improvement and statistical significance?

Statistical significance is a measure of how confident you can be in your results at a given point in time. It can help you decide whether to keep a test running or to declare a variation the winner and launch it on your site. This video explains how Stats Engine calculates statistical significance and how you can change the significance level to suit your business's tolerance for risk.


Optimizely won't declare winners or losers until each variation has at least 100 visitors and 25 conversions. In practice, you'll usually see results once Optimizely has determined that they are statistically significant. What does that mean? Read on.
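A minimal sketch of that minimum-sample rule, assuming per-variation counts are available as plain numbers. The `VariationCounts` container and the `meets_minimum_sample` function are hypothetical illustrations, not part of Optimizely's API; only the thresholds (100 visitors and 25 conversions per variation) come from the paragraph above.

```python
from dataclasses import dataclass
from typing import Sequence

# Thresholds stated in the article: every variation needs both.
MIN_VISITORS = 100
MIN_CONVERSIONS = 25


@dataclass
class VariationCounts:
    """Hypothetical container for one variation's observed counts."""
    name: str
    visitors: int
    conversions: int


def meets_minimum_sample(variations: Sequence[VariationCounts]) -> bool:
    """Return True only if every variation (baseline included) meets both minimums."""
    return all(
        v.visitors >= MIN_VISITORS and v.conversions >= MIN_CONVERSIONS
        for v in variations
    )


baseline = VariationCounts("baseline", visitors=1200, conversions=96)
variation = VariationCounts("variation_1", visitors=1180, conversions=20)
print(meets_minimum_sample([baseline, variation]))  # False: variation_1 has only 20 conversions
```

Passing this check is only a precondition; it does not mean a variation has won.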

Statistical significance represents the likelihood that the difference in conversion rates between a given variation and the baseline is not due to chance. The significance level you choose reflects your risk tolerance and the confidence you require. For example, if your results are significant at a 90% significance level, you can say that you are 90% confident that the results you see reflect an actual underlying change in behavior, not just random chance.

Why is this necessary? Because in statistics, you observe a sample of the population and use it to make inferences about the total population’s underlying behavior. In Optimizely, the observed sample is used to calculate the Improvement metric.
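To make this concrete, here is a minimal sketch that computes each conversion rate from the observed sample and the Improvement between them. It assumes Improvement is the relative difference of the variation's conversion rate over the baseline's (a relative lift, not a percentage-point difference); the function names and the counts are made up for illustration and are not real results.

```python
def conversion_rate(conversions: int, visitors: int) -> float:
    """Observed conversion rate for one variation's sample."""
    if visitors <= 0:
        raise ValueError("visitors must be positive")
    return conversions / visitors


def improvement(baseline_rate: float, variation_rate: float) -> float:
    """Relative improvement of the variation over the baseline (0.25 means +25%).

    Assumption: Improvement is a relative lift over the baseline rate.
    """
    if baseline_rate == 0:
        raise ValueError("baseline conversion rate must be non-zero")
    return (variation_rate - baseline_rate) / baseline_rate


# Illustrative sample: 1,000 visitors in each arm.
baseline = conversion_rate(100, 1000)   # 0.10
variation = conversion_rate(125, 1000)  # 0.125
print(f"Improvement: {improvement(baseline, variation):.1%}")  # Improvement: 25.0%
```

Because both rates come from a sample, the Improvement shown is only an estimate of the true difference, which is why significance matters.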

There’s always a chance that what you observed doesn’t reflect the actual underlying behavior. For example, if you set an 80% significance level and you see a winning variation, there’s a 20% chance that what you’re seeing is not actually a winning variation. At a 90% significance level, the chance of error decreases to 10%. The higher the significance level you set, the more visitors your experiment will require. The highest significance you should see in Optimizely is >99%, because it is technically impossible for results to be 100% significant.
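One way to see what the significance level controls is to simulate A/A tests, where both arms share the same true conversion rate, so any "winner" is a false positive. The sketch below uses a classical fixed-sample, pooled two-proportion z-test as a stand-in; it is not how Stats Engine computes significance, and the true rate, visitor counts, number of runs, and seed are arbitrary assumptions. At an 80% level, roughly 20% of these A/A tests should be flagged; at 90%, roughly 10%.

```python
import math
import random


def two_sided_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Classical pooled two-proportion z-test; returns the two-sided p-value."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no variation in either arm, so no evidence of a difference
    z = (conv_b / n_b - conv_a / n_a) / se
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)), the two-sided tail probability.
    return math.erfc(abs(z) / math.sqrt(2))


def false_positive_rate(significance_level: float, true_rate: float = 0.10,
                        visitors_per_arm: int = 500, runs: int = 1000,
                        seed: int = 42) -> float:
    """Fraction of simulated A/A tests declared significant at the given level."""
    rng = random.Random(seed)
    alpha = 1 - significance_level
    hits = 0
    for _ in range(runs):
        # Both arms draw from the same true conversion rate.
        conv_a = sum(rng.random() < true_rate for _ in range(visitors_per_arm))
        conv_b = sum(rng.random() < true_rate for _ in range(visitors_per_arm))
        if two_sided_p_value(conv_a, visitors_per_arm, conv_b, visitors_per_arm) < alpha:
            hits += 1
    return hits / runs


for level in (0.80, 0.90):
    print(f"{level:.0%} level: {false_positive_rate(level):.1%} of A/A tests flagged")
```

The exact percentages vary with the seed and sample sizes, but they should hover around one minus the significance level.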

As you might have guessed, Improvement and Statistical Significance are related. If your variation causes a dramatic shift in visitor behavior and leads to a higher Improvement percentage, the experiment will probably reach significance more quickly. On the other hand, if Optimizely is trying to detect a smaller, more subtle change in conversion rates, it will probably need evidence from more visitors before it can declare significant winners or losers.
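A rough way to quantify this relationship is the classical sample-size formula for comparing two conversion rates with a fixed-horizon test. This is a standard textbook approximation, not Stats Engine's calculation; the 80% power, the 10% baseline rate, and the function name `visitors_per_variation` are assumptions chosen for illustration. It shows both effects described above: smaller improvements and higher significance levels each require more visitors.

```python
import math
from statistics import NormalDist


def visitors_per_variation(baseline_rate: float, relative_improvement: float,
                           significance_level: float = 0.90,
                           power: float = 0.80) -> int:
    """Approximate visitors needed per variation for a two-sided two-proportion z-test.

    n = (z_{1 - alpha/2} + z_{power})^2 * (p1(1 - p1) + p2(1 - p2)) / (p2 - p1)^2
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_improvement)
    if relative_improvement == 0 or not 0 < p2 < 1:
        raise ValueError("improvement must be non-zero and keep the rate in (0, 1)")
    alpha = 1 - significance_level
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


for lift in (0.01, 0.05, 0.20):
    n = visitors_per_variation(0.10, lift)
    print(f"{lift:.0%} improvement: ~{n:,} visitors per variation")
```

With these assumptions, detecting a 1% improvement needs hundreds of times more visitors than detecting a 20% improvement.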

So rather than running a subtle test that targets a 1% improvement, consider bolder tests that are more likely to produce a larger improvement (and therefore require fewer visitors to reach significance).