You can order shoes online and have them at your doorstep in days, sometimes hours. You can hop on a plane in London and be in Venice in a matter of hours. In each case, the provider can estimate fairly accurately how long the service will take. Your experimentation program needs the same predictability: you can’t properly plan or build a roadmap until you know how long your experiments will take to reach statistical significance. Experiment duration drives your resourcing plans, not to mention your product and service rollout timelines.
To determine how long an experiment needs to run to achieve statistical significance, you’ve got to know your sample size: the number of people who will be exposed to your experiment. Previously, you had to work out your sample size manually, deciding on your baseline conversion rate and minimum detectable effect. Then you’d run those numbers through a Sample Size Calculator before starting your experiment, which helped you project how long the experiment would need to run.
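For illustration, the manual calculation typically follows the textbook fixed-horizon formula for a two-sided test comparing two conversion rates. This is a sketch only: the function name and the default 80% power are my assumptions, and a given calculator’s exact numbers will differ depending on its statistical approach.

```python
import math
from statistics import NormalDist

def sample_size_per_variation(baseline_rate, mde, alpha=0.05, power=0.8):
    """Textbook fixed-horizon sample size for a two-sided test comparing
    two conversion rates. `mde` is the relative lift you want to detect,
    e.g. 0.10 for a 10% lift over the baseline."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + mde)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96 for 95% significance
    z_beta = NormalDist().inv_cdf(power)           # 0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

# A 15% baseline with a 10% minimum detectable effect lands in the
# high-thousands-of-visitors-per-variation range.
print(sample_size_per_variation(0.15, 0.10))
```

Note how sensitive the result is to the minimum detectable effect: because the difference between the two rates is squared in the denominator, halving the MDE roughly quadruples the required sample size.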
Now, Optimizely offers a Stats Engine built on a process called sequential testing: it accumulates evidence as your test runs and flags the moment your experiment reaches significance, surfacing winners and losers as quickly and accurately as possible.
However, the Sample Size Calculator is still useful for projecting an experiment’s stopping point. By estimating the sample size each variation needs to hit statistical significance (also known as “stat sig”), you can compare that number against your traffic and estimate roughly how long the experiment will take.
For example, say you design an experiment with four variations, and your site averages 10,000 unique visitors per week. If your baseline conversion rate is 15%, you want to measure stat sig at 95%, and you want to detect a minimum lift of 10% (your Minimum Detectable Effect), you’d need roughly 8,000 visitors per variation to reach stat sig. At 8,000 visitors per variation times four variations, that’s 32,000 visitors total, and that’s if you run your experiment to all traffic, without any audience targeting. It would take about 3.2 weeks to get to that point.
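The arithmetic above can be sketched in a few lines, using the numbers from the example (the variable names are illustrative):

```python
# Numbers from the example above.
visitors_per_variation = 8_000
num_variations = 4
weekly_traffic = 10_000  # unique visitors per week, all traffic, no targeting

total_needed = visitors_per_variation * num_variations  # 32,000 visitors
weeks = total_needed / weekly_traffic
print(f"~{weeks:.1f} weeks to reach stat sig")  # → ~3.2 weeks
```

Audience targeting shrinks `weekly_traffic` to only the visitors who qualify, so a targeted experiment with the same sample size requirement takes proportionally longer.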
If you’re not sure of your baseline conversion rate, your analytics platform can usually help you calculate that number. You can also run a monitoring campaign in Optimizely to determine it: simply let an experiment run on your site for a predetermined period of time, without creating a variation. This type of monitoring campaign is referred to as an A/A experiment.
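If you do run a monitoring campaign, the baseline conversion rate is simply conversions divided by unique visitors over the monitoring window. The counts below are hypothetical:

```python
# Hypothetical totals from a monitoring run with no variations.
unique_visitors = 20_000
conversions = 3_000

baseline_conversion_rate = conversions / unique_visitors
print(f"Baseline conversion rate: {baseline_conversion_rate:.1%}")  # → 15.0%
```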
Sometimes business needs mean you can’t wait for statistical significance. For example, if you’re targeting a feature release next month, you might need results in two weeks instead of the roughly three weeks projected to reach stat sig. Consider the trade-off between a quick turnaround and the accuracy of your results. If the cost of incorrectly calling a winner is low, you may decide to call an experiment complete before it reaches statistical significance so you can get results more quickly. But if you’re making an important decision that requires high accuracy, it’s important to let your experiments achieve significance.