Testing for Developers: Correctly Using Behavioral Science Design Methodology
When you’re developing products or designing user experiences, wouldn’t it be great to know what works and what doesn’t before you commit resources? Using behavioral science and design methodology in large-scale experimentation lets you test for cause and effect so you can translate digital touchpoints into informed decisions.
Unfortunately, A/B testing is widely misunderstood and misused. While it can be a powerful tool for hypothesis testing, learning, and optimization, using it without a theoretical underpinning and rigorous experimental framework can lead to inaccurate conclusions and wasted marketing dollars.
In this blog post, I will show you how to maximize learning so you can forge ahead with confidence.
Before building your testing framework, generate a theory.
Start with a hypothesis based on human decision making [see behavioral science article for examples of common biases] so you’re not just doing version testing. Starting with a theory lets you discover what works and why. Otherwise, you’ll be doing the “Mad Men style” of tossing out two ideas into the market and seeing which one wins. This old way of testing does not generate accurate conclusions.
Say you create two versions of an app to A/B test. Version one has different copy, different images, and a different interface from version two. If version one performs better, how do you know whether it was the images, the copy, or the interface that drove the performance? Was it all of the variables together? Or just one compelling image? You’d have to do three more tests to find out. On top of that, if you discover it is the copy, but that copy wasn’t grounded in a theory of human decision making, you’d only have a hunch about why that particular paragraph worked. That is another test you have to run.
Doing things this way also requires an enormous sample size. Measuring a change of 15% to 20% with confidence requires hundreds, if not thousands, of people to interact with the variable you want to test. Running just the four tests proposed here could require driving hundreds of thousands of people to the site (which may cost money in ad spend), and you may just find out you used a pretty picture.
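To make the sample-size point concrete, here is a rough sketch of the standard normal-approximation formula for comparing two conversion rates. The function name, the 5% baseline rate, the 5% significance level, and the 80% power are all assumptions chosen for illustration, not figures from our work; plug in your own numbers.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_variant(baseline_rate, relative_lift, alpha=0.05, power=0.80):
    """Approximate visitors needed per variant for a two-sided test of two proportions.

    baseline_rate: assumed conversion rate of the control (e.g. 0.05 for 5%)
    relative_lift: smallest relative change worth detecting (e.g. 0.20 for 20%)
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the significance level
    z_beta = NormalDist().inv_cdf(power)           # critical value for the desired power
    p_bar = (p1 + p2) / 2                          # pooled rate under "no difference"
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return ceil(numerator / (p2 - p1) ** 2)


if __name__ == "__main__":
    # Hypothetical scenario: 5% baseline, hoping to detect 15% and 20% relative lifts.
    for lift in (0.15, 0.20):
        n = sample_size_per_variant(0.05, lift)
        print(f"{lift:.0%} lift: about {n} visitors per variant")
```

Under these assumptions, detecting a 20% relative lift takes roughly eight thousand visitors per variant, and a 15% lift takes more. Multiply that by two variants per test and by every test you chain together, and the cost of untargeted version testing adds up quickly.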
Instead, it is much more effective to formulate a few hypotheses early on and do some quantitative and qualitative research until you are confident in them. Then, when a few ideas, drop-off points, or important variables are still in question, you can run some strategic A/B tests. This way you maximize your learning, budget, and results.
An example of how to test your theories can be found in the work done with one of our recent clients, a powerful trade association for more than 1,000 companies in the construction and agriculture industries, that wanted to translate their trade show’s real-world success into a robust online platform for buyers and sellers.
We spoke with 10 contractors (buyers) and 3 manufacturers (sellers) to understand which of the client’s original ~15 unique value props (UVPs) they prized most. We also wanted to understand how people researched equipment and their level of tech-savviness.
Three value props stood out. Now we needed to know if people would actually take action on them:
- Social Proof: The ability to leave reviews about equipment.
- Relativity: The ability to compare multiple products side by side in a standardized way.
- Talk to Techs and Engineers: Contractors want to be able to talk to the people who know the machines and can honestly tell them what ownership is like. They wanted to talk to techs, not sales or marketing people.
To test this, we ran 5 different ads:
- One for each value prop (see above)
- One that combined all 3
- One that was a control
These ads performed remarkably well. All of our ads outperformed the client’s average click rate. Ranked by performance:
- Relativity: Outperformed the Control by 146%
- All 3 and Talk to Techs: Both outperformed the Control by 66%
- Social Proof: Outperformed the Control by 35%
- Control: Outperformed the client’s average click rate by 75%
With these unambiguous results, the client now had a clear way forward to radically improving the platform for their users.
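Note that the ranking uses two baselines: each ad is compared with the Control, while the Control is compared with the client’s average click rate. As a small sketch, assuming every figure is a relative lift in click rate, the two can be chained to put all ads on the same footing. The helper name below is hypothetical.

```python
def compound_lift(lift_vs_control, control_lift_vs_average):
    """Chain two relative lifts: ad vs. Control, then Control vs. the client's average."""
    return (1 + lift_vs_control) * (1 + control_lift_vs_average) - 1


# Figures reported above, written as fractions (146% -> 1.46).
CONTROL_VS_AVERAGE = 0.75
LIFTS_VS_CONTROL = {
    "Relativity": 1.46,
    "All 3": 0.66,
    "Talk to Techs": 0.66,
    "Social Proof": 0.35,
}

if __name__ == "__main__":
    for ad, lift in LIFTS_VS_CONTROL.items():
        total = compound_lift(lift, CONTROL_VS_AVERAGE)
        print(f"{ad}: {1 + total:.2f}x the client's average click rate")
```

Under that assumption, the Relativity ad works out to about 4.3 times the client’s average click rate (2.46 × 1.75), which is a more intuitive way to present the result to stakeholders than two stacked percentages.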
When done correctly, controlled experimentation lets you capture accurate, actionable data so you can reduce churn, optimize your R&D, and generate better results. Today, every company can take advantage of digital touchpoints to do large-scale testing. Whether you’re developing products or designing user experiences, rigorous experimentation removes the guesswork.