Using Research to Drive Successful Experimentation
If you are conducting A/B tests on a regular basis, you will quickly learn that not every test is a hit. In fact, it can take 6-10 (or even more) experiments to reach a positive result for the first time. And that’s okay. However, you can greatly increase your chances of a positive lift if you dig into your qualitative and quantitative data first.
In this case study, I’ll show you how we at iZettle have achieved an average of 17% uplift in registration conversions across all our European markets by running our A/B tests based on the results of multiple strands of research.
Guess Work vs. Best Bet
There are two ways to set an A/B test hypothesis:
- You test something that you have read in a case study (e.g. the classic red button vs. green button test), that you may have heard at a conference, or that you simply think is interesting or logical to test.
- You test something that fixes an issue – based on your data – for your users/customers.
I encourage you to go with the second option. Fixing poor user experience is not only the most helpful thing you can do for your (potential) customers, but it’s also your best bet to increase your conversion rate. To identify these friction points on your website, you should run extensive research into user behavior.
The iZettle case study
This was the homepage for iZettle ever since our first product was launched, which performed really well for five beautiful years. However, in the second half of 2016 we had to break the status quo. The reason: this homepage was focusing only on one product (the card reader), and the company started to expand its services with other financial solutions. We therefore needed a new starting page that clearly communicated our new multi-product solutions.
The design team quickly put together a test version of the site based on the new communication direction.
We called this new version the “one headline version”. As you can see, it was more general and more inspirational than descriptive. We tested this variation of the site with Optimizely on a small share (5%) of our website visitors. We ran several types of research to understand in depth how our audience reacted to the old and new versions of the website.
Here are the most important research methods we used during this project:
- Usability testing
- Data analysis in Google Analytics
- Exit-intent surveys
- Data analysis in our own SQL database
- Five-second testing
First, we ran usability tests, which basically means inviting users to sit down in front of a computer and asking them to go through your website. You can do this based on a scenario, and you can also give them specific tasks. As simple as it sounds, it is very insightful. If you want to learn more about usability testing, read this article about Usability Testing for Data Analysts.
I’m a Data Analyst and not a UX researcher, but I still like to be part of these tests (or at least see the recordings), because it helps me to:
- get an initial hunch for my data analyses (looking at the infinite data without any reference point can sometimes be very intimidating)
- understand a bit more about why users behave the way they behave (in data you can only see what they do, but nothing about why they are doing it)
- just in general meet with the users, empathize with them and learn how they interpret things from our web communication.
Besides the many small things, the top takeaway from our usability tests was that our potential customers need more specific information from the very first moment they land on izettle.com. In the initial “one headline” version, they did not immediately understand what products we offer and what problems those products fix. Even if they could get the information later by scrolling through the whole homepage, we all agreed that their first impression must be much better.
Data Analysis in Google Analytics
Once we had our hypothesis (“visitors need clear and immediate information about our products/services on the homepage”), it was worth investigating whether the quantitative data validated this or not. The easiest way for us to get this information was from Google Analytics.
Let me highlight some of the findings:
- Among the visitors who started to fill in the registration form, those who actually finished it had initially consumed roughly 1.8x more content than those who didn’t finish it.
- A significant share of the users who went to our registration form immediately after the homepage turned back from there and looked for more information.
- 94% of the registrants checked at least our product details page or the pricing page before registration.
- 70% of the registrants checked at least one of the support articles that described the products in depth before registration.
Sounds nice and simple. However, the Google Analytics research phase actually took around 2 weeks. We went through many, many different reports, metrics, subsegments and cohorts. We also spent a significant amount of time thinking about what these results meant, as we wanted to avoid confirmation bias or any of the other statistical biases that can creep into a data research project.
The findings pointed in the same direction as the usability tests before, and we found nothing that would prove the opposite. It’s worth keeping in mind, however, that correlation does not imply causation. As Frank Kane, a Udemy instructor, said:
“You can’t say anything about causality without an experiment, but correlation can tell you what experiments you might wanna run.”
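To make the first finding above more concrete, here is a minimal sketch of how such a comparison could be computed from exported session data. The `sessions` list and the `consumption_ratio` helper are hypothetical, and the numbers are invented for illustration only; they are not our Google Analytics data.

```python
from statistics import mean

# Hypothetical export: (pages viewed before opening the form, finished registration?)
# These rows are made up for illustration; they are not real analytics data.
sessions = [
    (6, True), (4, True), (5, True), (3, True),
    (2, False), (3, False), (2, False), (3, False),
]

def consumption_ratio(rows):
    """Average pre-form pageviews of finishers divided by that of non-finishers."""
    finished = [pages for pages, done in rows if done]
    abandoned = [pages for pages, done in rows if not done]
    return mean(finished) / mean(abandoned)

ratio = consumption_ratio(sessions)
print(f"Content consumption ratio, finishers vs. non-finishers: {ratio:.1f}")
```

A ratio like this only describes a correlation in historical data, which is exactly why it is a starting point for an experiment rather than a conclusion.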
Another quick and easy quantitative research type is website heatmapping. We already had some idea from Google Analytics of what our most visited pages were, so it made sense to run heatmaps to find out how users were interacting with them.
An interesting finding for us was that on our new “one headline” start page ~50% of our visitors were not scrolling at all – even though there was plenty of important information below the fold that could have helped them understand our service before they bounced.
Why? We didn’t know for sure, but our guess was that they didn’t realize they could scroll to see the additional information.
Data Analysis in our own SQL database
Another layer of our analysis was to check the long-term goals. At iZettle, it’s important that our users are not just registered, but actually active and returning customers. We wanted to understand whether information consumption on our website drives long-term retention or not. Luckily, we found that information consumption (before registration) correlates with our long-term goals (e.g. activation, retention) too.
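One simple way to look for this kind of relationship is to split customers by how much content they consumed before registering and compare retention between the groups. The sketch below assumes a hypothetical export of `(pages viewed, still active)` rows; the 90-day window, the threshold of 4 pages and the data itself are illustrative assumptions, not figures from our database.

```python
# Hypothetical rows: (pages viewed before registering, still active after 90 days?)
# Both the data and the 90-day window are assumptions made for this sketch.
customers = [
    (1, False), (2, False), (2, True), (3, False),
    (5, True), (6, True), (7, False), (8, True),
]

def retention_rate(rows):
    """Share of customers in `rows` that are still active."""
    return sum(active for _, active in rows) / len(rows)

def retention_by_consumption(rows, threshold):
    """Split customers at `threshold` pageviews; return (low, high) retention rates."""
    low = [row for row in rows if row[0] < threshold]
    high = [row for row in rows if row[0] >= threshold]
    return retention_rate(low), retention_rate(high)

low, high = retention_by_consumption(customers, threshold=4)
print(f"Retention: light readers {low:.0%}, heavy readers {high:.0%}")
```

The helper assumes both groups are non-empty. As with the Google Analytics findings, a gap between the two groups shows correlation only.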
We sat together with the creative team and discussed the major findings:
- Our visitors need more concrete information about our products immediately when they land on our homepage.
- After this first impression, we should make it clear where they can find more information (again: they may not necessarily realize that they can scroll).
- On the other hand we can’t overload our audience with information and too many minor details.
So we had to find the perfect balance, providing the right amount of information at every step to our users. Not too much, not too little.
Again: our first priority was to fix the homepage. Some of the versions of the above the fold section that were created are shown below:
Even though the designers created these independently of each other, two common solutions showed up in every variation: visually encouraging visitors to scroll and bringing more concrete info above the fold.
Five-second testing
All in all, there were 12 different new designs on the table. The problem was that we didn’t want to implement (i.e. design + code) each version, because that would have slowed down our A/B testing and taken up a lot of engineering time.
So we ran five-second tests on the variations and filtered out those that were clearly underperforming.
Five-second testing is a qualitative testing method. We showed 20-50 test users (people who are in our target group, but hadn’t heard about iZettle yet) a homepage screenshot for five seconds only. The image was then removed and the testers were asked different questions about what they had seen, or thought they had seen.
A quick example of 5-second testing:
We had three questions for our testers:
- What service do you think this company sells?
- Did the brand appear trustworthy?
- Which element did you focus on most?
There were some clear winners and losers. Based on the feedback, we eventually chose to build and A/B test Variation 4.
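To illustrate how answers like these can be turned into a shortlist, here is a rough sketch that tallies hand-coded answers to the first question per design. The design names, the answers and the 60% cutoff are all hypothetical; they are not our test results or the rule we actually applied.

```python
from collections import Counter

# Hypothetical, hand-coded answers to "What service do you think this company sells?"
# Design names and answers are invented for this sketch.
answers = {
    "design_a": ["correct", "wrong", "wrong", "unsure", "wrong"],
    "design_b": ["correct", "correct", "correct", "unsure", "correct"],
}

def comprehension_rate(coded_answers):
    """Share of testers whose answer was coded as 'correct'."""
    counts = Counter(coded_answers)
    return counts["correct"] / len(coded_answers)

rates = {name: comprehension_rate(coded) for name, coded in answers.items()}

# Assumed cutoff: keep designs that at least 60% of testers understood.
shortlist = [name for name, rate in rates.items() if rate >= 0.6]
print(rates, shortlist)
```

With samples this small, a tally like this is only good for filtering out clear losers, not for picking a statistically proven winner.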
On to A/B testing
The moment of truth has come.
Both data research and qualitative research have their own limitations. Researching your historical data will always lead to correlation vs. causation questions – and qualitative tests might be biased in several other ways (not to mention the sample size issues).
That’s why I like to say that if you do your research well enough before your A/B test, you will have a best bet. It is clearly better than a guess, but it’s not enough. To prove your hypothesis, you must A/B test it.
So we did.
In Optimizely, we set up a split test with 3 variations (each for 33% of our audience):
We evaluated 3 metrics:
- Conversion Rate and Number of Registrations
- Engagement
- Conversion Rate and Number of Card Reader Sales
Our main KPI was Registration, as it was an important and immediate number to look at. Engagement is kind of a vanity metric, while Card Reader Sales is lagging somewhat in our case.
And after a few weeks the results were:
- Version A and Version B performed the same
- Version C performed ~17% better in terms of Registrations (significance level was 99%+)
- Version C performed ~13% better in terms of Engagement (significance level 99%+)
Note: Later we followed up with the lagging Card Reader Sales numbers and that supported our hypothesis as well.
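For readers who want to sanity-check an uplift and its significance themselves, here is a minimal sketch of a classic two-proportion z-test. It is not how Optimizely computed the figures above (Optimizely applies its own statistical methodology), and the visitor and conversion counts are hypothetical numbers chosen for illustration, not iZettle’s data.

```python
from math import erf, sqrt

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (relative uplift of B over A, z score, two-sided p-value)."""
    rate_a, rate_b = conv_a / n_a, conv_b / n_b
    # Pooled conversion rate under the null hypothesis of no difference.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Standard normal CDF expressed through the error function.
    normal_cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    p_value = 2 * (1 - normal_cdf)
    uplift = rate_b / rate_a - 1
    return uplift, z, p_value

# Hypothetical counts: 20,000 visitors per version, 1,000 vs. 1,170 registrations.
uplift, z, p = two_proportion_z_test(1000, 20000, 1170, 20000)
print(f"uplift={uplift:.1%}, z={z:.2f}, p={p:.5f}")
```

In this framing, a significance level of 99%+ corresponds to a p-value below 0.01.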
An important note: we know that it’s risky to change more than one variable during an A/B test. However, this situation was special due to the one-product to multi-product transition, which meant we needed to change many elements at one time. Our next steps have been to test and optimize the small details one by one (e.g. hero pictures, wording, etc.).
Is 17% uplift a great result? Although it doesn’t sound as outstanding as some case studies with 200% or even 300% uplifts, lifting a business metric like Registrations by that percentage is a significant improvement in every sense.
But if you have one key takeaway from this article, it should be this: always run multiple types of research before conducting your A/B tests to increase your chances for success!