October 23, 2018

How Realtor.com delivered a faster, better customer experience with Experimentation

Creating a faster and more engaging site experience often requires tradeoffs between various revenue streams of the business. Can the increase in conversion from more engaged users more than offset any loss in ad revenue? It’s a bold bet that Realtor.com’s analytics and experimentation team set out to test at the beginning of 2018.

A faster experience is a better experience

As a digital-first business, Realtor.com depends on a user experience that quickly delivers what its customers want: real estate listings that help them find the right home.

The site provides both real estate listings and ads that surface properties and realtors. When concerns about site speed surfaced last year, the team responsible for analytics and optimization began looking into reducing the number of digital ads as a way to improve performance. Digital ads can slow down websites on both desktop and mobile devices, and slow sites can hurt the consumer experience and conversion rates.

Without the design constraint of an ever-growing number of ad placements, the team would also gain the flexibility to keep optimizing, removing barriers to how much they could increase user engagement and conversion rates.

The realtor.com site before the series of tests to reduce ads.

The realtor.com site optimized for customer experience and speed.

The hypothesis: Reducing ads would pay for itself in conversion lift

With the server-side testing available in Optimizely Full Stack, they set out to test their hypothesis: More strategic ad placement would speed up the site and open up opportunities for site design improvements. The resulting benefits of accelerated user growth and conversion rate gains would then more than offset lost ad revenue.

The analytics team setting up this test, Jayakrishnan Vijayaraghavan, Trupti Kankaria, Neha Dhomne, Vijay Taneja and Saurabh Kumar, developed a data-driven hypothesis for the effort. First, they needed to prove the company was not going to lose money from fewer ads. Then, they could set up the team for long-term revenue growth through continuously optimizing the customer experience.

Experiment design challenges

To de-risk their bold idea, the team had to address two major challenges to experiment design:

  1. Measuring user growth impact with a scientifically rigorous method
  2. Defining and controlling revenue impact from the advertising business during the long testing period

Traditional A/B testing methodology works well for measuring conversion rate, engagement, and other operational metrics.

But how do you measure user growth, SEO impact, and word-of-mouth increases in absolute leads with standard A/B testing methods? With standard randomization, users are split equally between control and test, so the method cannot show whether one variation attracts more users than another. In addition, users in the same geography can easily see different experiences, so there is no way to ensure that a referred friend or family member sees the same experience as the referrer.

User-based randomization could lead to a poor experience for users in a similar geography who see a different version of the site from their friends and family.

The second reason a standard A/B test was not an option for Realtor.com was that its ads are sold by geography. With standard A/B testing, a change to the ad footprint would affect all advertisers. By limiting the test to certain geographies, the team would affect only the advertisers in those geographies, reducing the number of advertiser contracts that would be affected or need to be renegotiated during the test period.

Designing a method for geographic-based bucketing with parallel randomization

The analytics team designed a solution to measure the impact of this enormous change on user growth, SEO impact, and absolute leads.

First, they looked at using geo-based clustering to identify groups of states that were roughly equal in population and behaved similarly. But they quickly realized that with traditional clustering techniques, they would end up with groups where the individuals within them behaved similarly, but collective group behavior across clusters would not necessarily be similar.

So they turned to parallelization instead. They designed a process to compute 2 billion randomized combinations of U.S. states in parallel, with the goal of developing 3 different buckets of states that behaved similarly in terms of conversion rate over a 52-week period. Understanding that they would never get combinations with the exact same conversion rates, they focused on finding combinations of states whose conversion rates moved together over time, so that the gap between buckets kept a constant variance.

Meanwhile, they were operating under a few constraints for these sample groups:

  • Each bucket needed to contain 30-35% of the user base to limit the impact on ad revenue
  • They had to control for seasonality effects. The series of tests the team was planning to run would occur over at least 6 months, so these groups needed to behave similarly over a long period of time.

With the parallelized process, they were able to get initial results within an hour.
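The search described above can be sketched in a few lines of Python. This is a minimal, serial illustration under stated assumptions, not the team's actual process: the function names, the input layout (weekly user and conversion counts per state), and the scoring rule (the worst pairwise variance of the week-by-week gap between bucket conversion rates) are all hypothetical choices made for this example. Only the 3 buckets and the 30-35% share limit come from the text.

```python
import random
import statistics


def bucket_rates(states, users, conversions):
    """Pooled weekly conversion rate for one bucket of states."""
    n_weeks = len(users[states[0]])
    return [
        sum(conversions[s][w] for s in states) / sum(users[s][w] for s in states)
        for w in range(n_weeks)
    ]


def gap_variance(rates_a, rates_b):
    """Variance of the weekly gap between two conversion-rate series.

    A low value means the series move together even if their levels differ.
    """
    return statistics.pvariance([a - b for a, b in zip(rates_a, rates_b)])


def shares_ok(buckets, users, lo=0.30, hi=0.35):
    """Check that every bucket holds 30-35% of all users (limit from the text)."""
    total = sum(sum(weeks) for weeks in users.values())
    return all(
        lo <= sum(sum(users[s]) for s in bucket) / total <= hi
        for bucket in buckets
    )


def score(buckets, users, conversions):
    """Worst pairwise gap variance across buckets (lower is better).

    This scoring rule is an assumption made for the sketch.
    """
    rates = [bucket_rates(b, users, conversions) for b in buckets]
    return max(
        gap_variance(rates[i], rates[j])
        for i in range(len(rates))
        for j in range(i + 1, len(rates))
    )


def random_search(users, conversions, n_trials, n_buckets=3, seed=0):
    """Try random state-to-bucket assignments and keep the best valid one.

    Returns (score, buckets), or None if no trial met the share limit.
    """
    rng = random.Random(seed)
    states = sorted(users)
    best = None
    for _ in range(n_trials):
        buckets = [[] for _ in range(n_buckets)]
        for s in states:
            buckets[rng.randrange(n_buckets)].append(s)
        if not all(buckets) or not shares_ok(buckets, users):
            continue
        candidate = (score(buckets, users, conversions), buckets)
        if best is None or candidate[0] < best[0]:
            best = candidate
    return best
```

Because every trial is independent, a search like this could be split across many workers and the best candidates merged afterward, which is presumably what makes billions of combinations tractable.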

Once Vijay and Trupti had zeroed in on 100 potential combinations that looked best, they ran them through the results from prior A/B tests, asking the question: “Is the lift we see in the geo-based bucketing of past test users the same as what we saw in the user-based randomization model?”
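One plausible reading of this back test is sketched below. The article does not describe the actual computation, so everything here is an assumption: the record layout (`state`, `arm`, `converted`), the function names, and the tolerance. The idea is to recompute the lift of a past user-randomized test as if only users in the candidate test states had seen the variant and only users in the candidate control states had seen the control, then compare that with the lift originally measured.

```python
def conversion_rate(records):
    """Share of records that converted."""
    return sum(r["converted"] for r in records) / len(records)


def relative_lift(variant_rate, control_rate):
    """Relative lift of the variant over the control."""
    return variant_rate / control_rate - 1.0


def user_based_lift(records):
    """Lift as measured by the original user-randomized test."""
    variant = [r for r in records if r["arm"] == "variant"]
    control = [r for r in records if r["arm"] == "control"]
    return relative_lift(conversion_rate(variant), conversion_rate(control))


def geo_based_lift(records, test_states, control_states):
    """Lift when variant users come only from test states and
    control users come only from control states."""
    variant = [
        r for r in records
        if r["arm"] == "variant" and r["state"] in test_states
    ]
    control = [
        r for r in records
        if r["arm"] == "control" and r["state"] in control_states
    ]
    return relative_lift(conversion_rate(variant), conversion_rate(control))


def lifts_agree(records, test_states, control_states, tolerance=0.01):
    """True if both lifts match within a (hypothetical) tolerance."""
    gap = user_based_lift(records) - geo_based_lift(
        records, test_states, control_states
    )
    return abs(gap) <= tolerance
```

A candidate grouping that passes a check like this across several past tests would be a reasonable choice, which matches the selection criterion described next.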

The team picked the state-level grouping that showed the same lift in this back testing over multiple past tests. Once the combination of the states was identified for each group, Gourav Tiwari, software engineering lead for the project, configured Optimizely Full Stack so that users and their activities were tagged appropriately for each group. He also set up the capability to run multiple user-based test variants within the geo-based test group. This helped the team increase the velocity of testing and speed of iterating on ideas.
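The two-level assignment described above, a geo-based group with user-based variants nested inside it, can be illustrated with generic Python. This sketch does not use or represent the Optimizely Full Stack API. The state-to-group mapping, the group names, the experiment key, and the hashing scheme are all hypothetical, and the states listed are placeholders rather than the team's actual grouping.

```python
import hashlib

# Hypothetical mapping; the real state groupings are not given in the article.
STATE_GROUP = {
    "CA": "test",
    "WA": "test",
    "TX": "control",
    "FL": "control",
}


def geo_group(state):
    """Look up the geo-based group for a state."""
    return STATE_GROUP.get(state, "unassigned")


def nested_variant(user_id, experiment_key, variants):
    """Deterministically pick a user-level variant by hashing the user ID.

    The same user always gets the same variant for a given experiment key.
    """
    digest = hashlib.sha256(f"{experiment_key}:{user_id}".encode()).hexdigest()
    return variants[int(digest, 16) % len(variants)]


def assign(user_id, state, experiment_key, variants):
    """Tag a user with a geo group and, inside the test group only,
    a nested user-based variant."""
    group = geo_group(state)
    variant = None
    if group == "test":
        variant = nested_variant(user_id, experiment_key, variants)
    return {"geo_group": group, "variant": variant}
```

Keying the hash on both the experiment and the user keeps nested tests independent of one another, so several user-based tests can run inside the same geo-based group at once.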

Using a relative change analysis method to determine impact

As they began running this series of tests, the team needed a stronger analysis method. Given that the groups didn’t match on conversion rates but moved together seasonally, they analyzed their results with a “difference in differences”, or relative change, approach. This method counts only the lift on top of the natural difference between the test and control groups. Using the same method, the team quantified user growth by measuring the difference between the percentage contributions of users from the test and control groups of states.
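A difference-in-differences calculation can be sketched as follows. The article does not give the team's exact formula, so two common forms are shown, and the function and parameter names are assumptions for illustration. The additive form subtracts the control group's change from the test group's change. The relative form measures how the relative gap between the groups shifted from the pre-period to the test period.

```python
def did_absolute(pre_test, pre_control, post_test, post_control):
    """Standard additive difference in differences.

    Change in the test group minus change in the control group.
    """
    return (post_test - pre_test) - (post_control - pre_control)


def did_relative(pre_test, pre_control, post_test, post_control):
    """Relative-change variant.

    Shift in the test group's relative gap to the control group,
    which removes the pre-existing difference between the groups.
    """
    pre_gap = pre_test / pre_control - 1.0
    post_gap = post_test / post_control - 1.0
    return post_gap - pre_gap
```

For example, if the test states converted at 4.0% before the change and 4.6% during it, while the control states moved from 5.0% to 5.1%, `did_absolute(0.040, 0.050, 0.046, 0.051)` attributes roughly 0.5 percentage points to the change. The same functions could be applied to each group's share of total users instead of its conversion rate, which is one way to read the user growth measurement described above.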

The results: Growth and conversion increase

After 6 months of iterating, the team saw a big improvement in user growth and conversion. They convinced management that the effort was paying off and that there was still room to grow, and they were able to roll out the new experience more broadly.

VP of Analytics, Vijay Taneja, says that with A/B testing using Optimizely Full Stack, the company is able to validate large initiatives like this, as well as whether they are “making investments in the right place. We are also iterating fast, we’re more confident in our results, and we’re delivering a huge amount of value to the organization, both in terms of monetization and consumer engagement.”