A type 2 error is a statistical term for the error made when no conclusive winner is declared between a control and a variation even though there actually should be one.
When you’re performing statistical hypothesis testing, there are two types of errors that can occur: type I errors and type II errors.
Type I errors are like “false positives” and happen when you conclude that the variation you’re experimenting with is a “winner” when it’s actually not. Scientifically, this means that you incorrectly reject a true null hypothesis and believe a relationship exists when it actually doesn’t. The probability of committing a type I error is known as the type I error rate or significance level (α); a result is declared significant when its p-value falls below this threshold. The significance level is conventionally, and somewhat arbitrarily, set to 0.05 (5%).
Type II errors are like “false negatives” and happen when you conclude that a variation made no statistically significant difference when it actually did. Statistically speaking, this means you fail to reject a false null hypothesis and think a relationship doesn’t exist when it actually does. In other words, you commit a type 2 error when you don’t believe something that is in fact true.
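Both error types can be made concrete with a small simulation. The sketch below, which assumes a two-sided pooled two-proportion z-test and uses illustrative conversion rates (5% and 7%) and sample sizes that are not from the article, runs many simulated A/B tests. When both variations truly convert at 5%, every “significant” result is a type I error; when the variation truly converts at 7%, every non-significant result is a type II error. The helper names `two_proportion_z_test` and `rejection_rate` are hypothetical.

```python
import random
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided pooled z-test for a difference in conversion rates; returns the p-value."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = (pooled * (1 - pooled) * (1 / n_a + 1 / n_b)) ** 0.5
    if se == 0:
        return 1.0  # no conversions (or all conversions) in both arms: no evidence of a difference
    z = (p_b - p_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def rejection_rate(p_control, p_variation, n, alpha=0.05, trials=1000, seed=1):
    """Fraction of simulated A/B tests (n visitors per arm) that declare a significant difference."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        # Simulate n visitors per arm, each converting with the arm's true rate.
        conv_a = sum(rng.random() < p_control for _ in range(n))
        conv_b = sum(rng.random() < p_variation for _ in range(n))
        if two_proportion_z_test(conv_a, n, conv_b, n) < alpha:
            rejections += 1
    return rejections / trials


# Null hypothesis true (both arms convert at 5%): every rejection is a type I error.
type_1_rate = rejection_rate(0.05, 0.05, n=1000)
# Null hypothesis false (variation converts at 7%): every non-rejection is a type II error.
type_2_rate = 1 - rejection_rate(0.05, 0.07, n=1000)
print(f"Estimated type I error rate:  {type_1_rate:.3f}")
print(f"Estimated type II error rate: {type_2_rate:.3f}")
```

The type I rate lands close to the chosen α of 0.05, while with only 1,000 visitors per arm the type II rate for this real two-point lift is roughly one in two.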
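Statistical power is the probability that a test will detect a real difference in conversion rate between two or more variations. It equals 1 minus the type II error rate (usually written β), so higher power means a lower chance of a type 2 error.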
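In symbols, and as a standard normal approximation for a two-sided test comparing a control conversion rate p_1 with a variation rate p_2 using n visitors in each variation (these symbols are introduced here for illustration; Φ is the standard normal CDF and z_{1-α/2} is about 1.96 when α = 0.05; the negligible opposite tail is ignored):

$$\text{power} = 1 - \beta = P(\text{reject } H_0 \mid H_0 \text{ is false})$$

$$1 - \beta \approx \Phi\left(\frac{\lvert p_2 - p_1 \rvert}{\sqrt{\left[p_1(1 - p_1) + p_2(1 - p_2)\right]/n}} - z_{1-\alpha/2}\right)$$

Power grows as n grows and as the difference |p_2 − p_1| grows, which is exactly the dependence on sample size and effect size described next.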
The most important determinant of a test’s power is its sample size. Power also depends on the magnitude of the difference in conversion rate you are trying to detect.
The smaller the difference you want to detect, the larger the sample size (and the longer the length of time) you require.
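A minimal sketch of the standard normal-approximation sample-size formula for comparing two conversion rates, assuming a two-sided test, equal traffic to each variation, α = 0.05 and 80% power (common defaults, not values the article prescribes). The function name `sample_size_per_variation` and the 5% baseline are hypothetical. Because the required sample scales with 1/(p_2 − p_1)², halving the difference you want to detect roughly quadruples the visitors you need.

```python
import math
from statistics import NormalDist


def sample_size_per_variation(p_control, p_variation, alpha=0.05, power=0.80):
    """Visitors needed in EACH variation for a two-sided two-proportion z-test (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    variance = p_control * (1 - p_control) + p_variation * (1 - p_variation)
    effect = p_variation - p_control
    return math.ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


baseline = 0.05  # hypothetical baseline conversion rate
for lift in (0.02, 0.01, 0.005):  # absolute differences in conversion rate
    n = sample_size_per_variation(baseline, baseline + lift)
    print(f"detect +{lift:.1%}: about {n:,} visitors per variation")
```

Dividing each sample size by your daily traffic per variation gives a rough minimum test duration, which is why small expected differences translate into long tests.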
Marketers can easily underpower their tests by using a sample size that is too small.
That means that they have a slim chance of detecting true positives, even when a substantial difference in conversion rate actually exists.
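To see how an underpowered test behaves, the sketch below evaluates the approximate power formula for a fixed, hypothetical true lift from 5% to 6% at several sample sizes. The rates and sample sizes are illustrative assumptions, and `approximate_power` is a hypothetical helper based on the normal approximation for a two-sided two-proportion z-test.

```python
from statistics import NormalDist


def approximate_power(p_control, p_variation, n_per_variation, alpha=0.05):
    """Approximate power of a two-sided two-proportion z-test (ignores the negligible opposite tail)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    variance = p_control * (1 - p_control) + p_variation * (1 - p_variation)
    se = (variance / n_per_variation) ** 0.5
    z_effect = abs(p_variation - p_control) / se
    return NormalDist().cdf(z_effect - z_alpha)


# Hypothetical scenario: the variation really lifts conversion from 5% to 6%.
for n in (500, 2_000, 8_000, 16_000):
    power = approximate_power(0.05, 0.06, n)
    print(f"{n:>6,} visitors per variation -> power about {power:.0%}")
```

With only a few hundred visitors per variation, this test would miss the real lift most of the time; power climbs toward 100% only as the sample grows into the thousands.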
In A/B testing, there is a balance to strike between getting results quickly and being confident they are accurate. One way to manage this trade-off is to run a test for a longer period of time, which increases its sample size and reduces the probability of a type 2 error.
One reason to watch out for type 2 errors is that they can undermine your conversion optimization efforts and cost you in the long run.
If you fail to detect a variation’s effect when it actually exists, you may be wasting your time and missing opportunities to improve your conversion rate.
Let’s consider a hypothetical situation. You are in charge of an ecommerce site and you are testing variations of a landing page. We’ll examine how a type 2 error could negatively impact your company’s revenue.
Your hypothesis is that changing the “Buy Now” CTA button from red to green will significantly increase conversions compared with your original landing page. You launch your A/B test and wait for the random sample of data to trickle in.
After 48 hours, you see that the green button’s conversion rate is identical to the red button’s (4.8%), so at a 95% confidence level the test shows no statistically significant difference.
Disappointed, you declare the green button a failure and keep the landing page as it is.
The following week, you read an article about how green buttons are boosting conversion rates. You decide to try out your hypothesis again. This time, you wait two weeks before checking your results.
Eureka! You discover that the green button has a 5% conversion rate compared with 4.8% for the red button, and the difference is statistically significant. It turns out that you committed a type 2 error the first time because your sample size was too small.
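As a rough check on the scenario, the sketch below plugs the example’s 4.8% (red) and 5% (green) conversion rates into the standard normal-approximation sample-size formula to estimate how many visitors each variation needs before a 0.2-percentage-point difference would be detected 80% of the time. The 80% power target and α = 0.05 are assumed conventions, and `visitors_needed` is a hypothetical helper, not part of any testing tool.

```python
import math
from statistics import NormalDist


def visitors_needed(p_control, p_variation, alpha=0.05, power=0.80):
    """Per-variation sample size for a two-sided two-proportion z-test (normal approximation)."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_variation * (1 - p_variation)
    return math.ceil(z ** 2 * variance / (p_variation - p_control) ** 2)


# Conversion rates from the example: red (original) 4.8%, green (variation) 5%.
n = visitors_needed(0.048, 0.050)
print(f"About {n:,} visitors per variation for 80% power at alpha = 0.05")
```

The estimate runs to roughly 180,000 visitors per variation, so unless a site has very high traffic, a 48-hour test is very likely to miss a difference this small.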
While it is impossible to completely avoid type 2 errors, you can reduce the chance that they occur by increasing your sample size. This means running an experiment for longer and gathering more data so that you can make the correct decision from your test results, rather than falsely concluding that an experiment has no impact when it actually does.
Another way to help prevent type 2 errors is to make big and bold changes to your webpages and apps during experiments. The larger the effect of a change, the smaller the sample size you will need and the smaller the chance that you will miss the change. A 25% increase in conversion rate is much easier to notice than a 0.001% increase.