Reduce risk, fail faster: Experimenting across the product development lifecycle

Learn how you can add experimentation to your product development process to help you build the right products.

Product teams are under constant pressure to optimize user experiences while juggling resource constraints and time-sensitive commitments. Adding extra steps to the process can seem counterproductive.

However, going fast without checking the direction can easily take you off-course. In this recording, Optimizely's Senior Product Strategy Director David Carlile provides practical insights on integrating experimentation across the entire delivery cycle, so that you can focus on getting to the right place, faster.

He talks about:

  • How experimentation aligns with all stages of product development
  • How to ensure you're building the right products
  • How to measure continuous optimization

This session was recorded at #mtpcon London 2023.

Transcript

Hello, everyone. My name is David Carlile, and I'm so excited to be talking to you today.

I'm really excited to be giving this session, walking through how to reduce risk and fail faster by building experimentation into your product development life cycle. I think this is a really amazing topic. I have a lot of experience doing this, and I'm excited to share some of the experience and learnings I've gathered over my career. So without further ado, let's jump in.

Introduction

Just a quick introduction to myself here. My name is David Carlile. I am Senior Director of Product Strategy at Optimizely, and I focus specifically on our experimentation platform. What that means is I'm responsible for partnering with product and engineering to define our roadmap and the features we build, for incorporating learnings from the voice of the customer and from our customer advisors into the product roadmap, for incorporating product feedback into that roadmap, and for staying on top of industry trends so we understand how to push our experimentation platform forward and innovate into the future.
Most of my experience has been either customer-facing through account management or hands-on as a practitioner; I built and ran experimentation programs for a number of years. So a lot of my experience is based on using the platform, as well as working with some truly world-class brands and helping them scale their experimentation programs. Some of the things I'm going to share with you today are really secrets I've been able to steal from amazing experimentation and product development teams, and I'm excited to pass them along.

Agenda overview

Just to kick us off with an agenda: I'm going to talk a little bit about how I see the product development life cycle, and how it's quite similar to an experimentation program and life cycle.

We're going to talk about how you would use experimentation throughout that product development life cycle to determine the right direction, the right features to build, and the right customer experiences to create.

And then, lastly, we're going to talk about measuring continuous optimization and measuring the value of these efforts.

So that's the agenda for today.

The product development process

Let's start off with product development and experimentation here. Let me walk you through this process.

The product development process has some core steps, the first of which is discover. The question we're asking in the discover phase is: which features should be added to our roadmap, and why? As we go through discovery and answer that question, we ultimately prioritize some of those features, and then we move to the next step of the process, which is design.

Design is really all about what these features should look like and how they should be built. So you've done some discovery, and you've determined which features you'd like on your roadmap.

Then you move into designing what you think those features should look like, and perhaps how they should be built, in partnership with the right teams that do this. From there you move into the build phase, where you undertake the task of building some minimum viable pieces of the feature, or some prototypes, and you try to determine the most efficient and cost-effective way to build it.

Once you've gone through this discovery, design, and build process, and you have a working prototype or an MVP out there, you seek validation. You try to determine: how can we validate that this feature works? Well, we put it in front of some focus groups, we put it out in a beta test, or we share it internally with some key stakeholders to get feedback, and then we try to incorporate that feedback.

But ultimately, the next step is to roll this feature out. So how can we roll this feature out in the most efficient way while reducing risk?

That's the goal of releasing a feature and rolling it out. And then, as always, you continue to iterate toward perfection. Even though we've done a lot of diligence in our discover, design, build, and validate phases, once you roll things out to production, or to customer-facing experiences, or whatever your scenario is, you usually get more feedback, more data, and more insights on how that's performing. And so you iterate on and optimize the existing feature.

This might not match your own product development process perfectly, but these are key steps in most product development processes I've seen over the years.

Experimentation in product development life cycle

And this is actually quite similar to how we tend to build and run an experimentation program.

Alright. So in experimentation we also go through some discovery, and using experimentation in the discover phase is a really great way to test and learn about what you should build. We have this notion of a painted door test, a test to validate demand, and that's how you would use testing in the discover step of your product development life cycle. I'm going to get into some actual examples in a minute, but let's continue working down the road here.

Once you've done some test-to-learn scenarios, where you've put out some A/B tests or painted door tests to validate demand and work through your discovery, you move to your design phase. Testing in the design phase is really more about testing to decide how something should look.

You're trying to make a decision; you're testing to decide. You can do this through user studies or rapid prototyping to validate that feature design. Experiments here are about getting quick feedback on the user experience so you can move on to the next step, which is build: actually building that product or feature.

As you move into this phase, this is where we encourage our customers to use feature flags to limit the blast radius while they're building. What that means is you wrap your code releases, or your features, in a flag that you can toggle on or off and target to specific audiences, cohorts, devices, or whatever your strategy is. Wrapping things in feature flags gives you configurability and the ability to control how many people might see that feature, which is what limits the blast radius.
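To make that concrete, here is a minimal sketch of a flag check, using a hypothetical in-memory flag store rather than Optimizely's actual SDK; the flag name and targeting rule are illustrative assumptions only:

```python
# Hypothetical flag store: in a real platform this configuration
# lives in a dashboard and is fetched by an SDK.
FLAGS = {
    "new_checkout_flow": {"enabled": True, "allowed_devices": {"ios"}},
}

def feature_enabled(flag_key: str, device: str) -> bool:
    flag = FLAGS.get(flag_key)
    if flag is None or not flag["enabled"]:
        return False  # flag off: nobody sees the feature
    # Targeting: only the configured audience sees the new code path,
    # which is what limits the blast radius while you build.
    return device in flag["allowed_devices"]

if feature_enabled("new_checkout_flow", device="ios"):
    print("render new checkout")  # dark for everyone else
else:
    print("render old checkout")
```

The key design point is that the new code path ships alongside the old one, and the flag configuration, not a deploy, decides who sees it.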

So once we've gone through this build process and diligently wrapped things in feature flags, perhaps we want to actually roll out a test here and test to measure. This is all about feature tests: trying to prove behavioral change and business impact through the feature you're trying to release.

This is a test to measure impact: a test to measure whether this change, this feature, actually improves customer behavior.

This is a great way to test and validate your work to this point.

So we've done some great testing to learn about demand in the discovery phase, and some great testing early in the design phase to get feedback on the user experience.

Once we've built this feature and validated it, actually A/B tested it and measured its effectiveness, we want to roll the entire experience out to production in the safest way possible: through a staged rollout, so we can monitor the quality and performance of the experience as it goes live. Again, this is another great way to use feature flags and the Optimizely Feature Experimentation platform, to control that rollout, that productionization of what you've built, and make sure it performs in line with the previous tests you've run. And then, naturally, you're going to continue to test and learn about how you can iterate on and improve the feature you've built.
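A staged rollout usually rests on deterministic bucketing, so a user's exposure stays stable as the percentage ramps up. Here is a rough sketch of that technique; it's an assumption about the general approach, not Optimizely's exact implementation:

```python
import hashlib

def in_rollout(user_id: str, flag_key: str, rollout_pct: float) -> bool:
    # Hash user + flag so each flag buckets users independently.
    digest = hashlib.sha256(f"{flag_key}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF  # uniform value in [0, 1]
    # Ramping 1% -> 10% -> 50% -> 100% only ever adds users; nobody
    # who already has the feature loses it mid-rollout.
    return bucket <= rollout_pct / 100.0

print(in_rollout("user-123", "new_checkout_flow", 10.0))
```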

So what I've done here is take the core steps of a product development life cycle and call out all the key areas where you can use experimentation.

In just a second, I'm going to walk through some actual customer examples. But before I do, I want to point out some common challenges you could see as you're going through this. First and foremost, a common challenge you might run into is that you ship first; you have the ship-first mentality.

Common challenges

You're maybe not asking enough questions or doing enough testing in the discover and design phases, you're moving too fast into the build and validate phases, and you're getting features out that don't have enough feedback, that haven't had enough questions asked about them before you invest the time to build them.

You have this ship-first mentality.

So what we want to do is use experimentation, use testing, to validate our assumptions before we invest the resources to build and ship products. That's the key point I want to instill here: if you're only testing after you build and ship products, you're leaving money and customer experience improvement on the table, and you're not getting the most out of your software. We really think you should move your testing, or at least some of your testing and validation, ahead of building and shipping features.

At a minimum, validate the demand and get some initial feedback before you invest the resources to build and ship. So our first common challenge is this ship-first mentality: you're not asking enough questions, and you're really not sure you're shipping the right thing.

Another common challenge is that this can often be perceived as a productivity threat. That's what happens when pushing changes becomes the goal, versus pushing only the right changes.

Oftentimes, when you're trying to build a case for a more test-and-learn mindset, for being more iterative in your approach to building product, it can be seen as a productivity threat. But the real threat is that you're not validating demand, and you're investing the resources to build something where there's no demand. So I encourage you to push back on the productivity argument and make sure you keep moving toward validating the demand for, and the experience of, features before investing all the resources to build them.

Another common challenge is getting organization-wide buy-in. This is difficult to do. Perhaps there's already an idea of something we want to build; it's already connected to our long-term strategy, and we know we're going to build it. For example, we know we're going to redesign our checkout experience, or our internal search experience, or the way customers log into our site, whatever your scenario is. Sometimes these initiatives are set at the HiPPO level, the highest-paid person's opinion, or at the executive level, and they're really tough.

It's really tough to change direction once things are built into the roadmap and the strategy for a given year. So getting organization-wide buy-in is sometimes difficult, and trying to bring a testing and validation mindset into this can conflict with those agenda items.

So this is a common challenge. And lastly, you've got to be careful not to let experimentation be perceived as validation only, as the last mile in the process, the last step before we ship something out.

I don't want it to be seen as a necessary burden or a necessary evil attached to everything you build. In fact, hopefully through the last slide and some of the customer examples I show you today, you'll see the value in pulling experimentation earlier into your product development life cycle so you can help prove you're building the right thing. With that, let's move on to talking about how we determine the right direction and build the right thing. Specifically, how do we build the right thing?

Testing and validation

We believe the way you build the right thing is to test your way into it, to experiment your way into it. So I want to start by anchoring the conversation with three product test types.

First, there's the test to learn. This is an experiment designed to learn about the demand and interest for a feature, to learn how users respond to it and why they respond the way they do. It's an experiment designed to learn something. There's also the test to decide. You're testing to decide on feature design, on the way we build something, or on the way we roll something out. It's similar to testing to learn, because you're always trying to learn something through a test, but it's more focused; it's not just about getting feedback and making quick improvements.

It's more about deciding how we spend our resources, how we roll this thing out, and how we productionize something. And then, once we've productionized something, we test to measure. We try to validate the impact in production.

We try to validate the benefit to the customer experience, the key business metrics and performance indicators. We test to measure the impact.

So these are the three core product test types, or organizational test types, and I want to share some actual customer examples walking through each of them.

Customer example: Medium

The first customer I'd like to use today is Medium, medium.com. Medium is a tech news, blog, and article platform where you can do research, subscribe to topics, read news and articles, and stay up to date on things. Medium has a product.

They have iOS and Android applications that they manage, and, like you, they're constantly trying to decide what the right direction is and what they should build. So let's say there's a product manager at Medium who says, hey, I'd really like to add the ability for users of our apps to listen to these articles as well as read them.

So they have this great idea. They want to put this listen button, this listen feature, in their product. And someone says, well, how many people even want to do that? How do we validate the demand for this feature?

A way to validate that demand is through something we call a painted door test. Medium can put this listen button in the product, but when you click on it, it's just a pop-up modal that says, hey, please check back soon.

We're building this feature. And this is not the kind of test you run for statistical significance; it's not the kind of test you run to try and hit stat sig or improve metrics.

This is the kind of test you run for a day or a couple of days, and you try to determine what percent of visits interact with it, what the level of interest is. If three percent of traffic, users, or visits interact with this feature, maybe you don't need to prioritize building it. But if thirty percent of users or visits interact with it, perhaps you should consider prioritizing it on your product roadmap. This is a great way to validate the demand for a feature before you invest a lot of resources.
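The readout is simple arithmetic. Here's a worked sketch with entirely hypothetical numbers, using the rough thresholds from the talk:

```python
# Painted-door readout: a demand gauge, not a stat-sig experiment.
visits_shown = 20_000  # visits that saw the painted-door "Listen" button
clicks = 6_100         # visits that clicked it

interaction_rate = clicks / visits_shown
print(f"interaction rate: {interaction_rate:.1%}")  # 30.5%

# Rough decision rule: ~3% suggests deprioritizing; ~30% suggests
# real demand worth a place on the roadmap.
if interaction_rate >= 0.30:
    print("strong demand signal: consider prioritizing the feature")
elif interaction_rate <= 0.03:
    print("weak demand signal: probably not worth building yet")
```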

It's much easier to put up a button and a pop-up modal than it is to actually build the capability to listen to articles on the site. And in this case, there was enough demand. There was interest in this feature, and that actually led Medium to make an acquisition in the space, of a company called Knowable.

They folded Knowable's technology into Medium, and now, if you use the Medium app or website, you can listen to these articles as well as read them. So this is a fantastic example of using experimentation to validate the demand for a feature. It's a great example of testing to learn: a painted door test, exploratory in nature, early in the process.

It's in the discovery phase, and it validates the demand before we invest the resources.

So, a great example of testing to learn. And once we've validated the demand for a feature, once we've tested to learn and learned that there is demand for something, we move to the design and build phases of the product development life cycle I showed you earlier.

Customer example: Venmo

So let's move on to an example of using experimentation to roll features out, moving into those design and build cycles. In this case, we've used some testing to validate our demand, and now we want to use feature flags and feature management to roll out that experience after we've built it.

This customer example is Venmo. Venmo is a payments and currency trading and holding application. In April of 2021, Venmo wanted to add support for buying, holding, and selling cryptocurrency within their mobile apps.

Feature rollout using experimentation

In this case, they had done some testing to validate the demand for that feature, and they then moved into actually building it. They used Optimizely's feature flagging capabilities to build the feature, put it behind feature flags, and push it live into their applications behind a flag that was turned off. So now the feature is out there in the app, but it's not visible to users yet.

Then you let the feature propagate to all users, because users have to update the version of their app and all of that. Sometimes it can take a week or a few weeks to get through the app store and for enough users' phones to grab the update. So now your feature is live out there, and when you decide, or in this case, when Venmo decided, it was time, they sent an email out.

They changed the homepage on their website, and simultaneously they toggled on the feature flag so the feature went live in their iOS and Android applications.

This is a great example of the power of feature flagging, because it gives you configurability and the ability to coordinate a unified feature release across digital marketing, product, and engineering teams. It's also a great example of rollouts and this notion of decoupling a code release from a code deploy.

Venmo deployed that feature into the code base weeks before they toggled the feature flag on. So the code release isn't coupled to the code deploy.

It's decoupled, and that configurability is really powerful. This is a great example of where you'd run a test to decide: to decide the right experience, the right audience, the right time to expose a feature.
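Here is a toy illustration of that decoupling. The in-memory flag store stands in for a remote dashboard toggle; the names are hypothetical, not Venmo's or Optimizely's actual setup:

```python
# Deployed dark, weeks before launch: the code ships with the
# flag off, so users who update their app see nothing new yet.
flag_store = {"crypto_trading": False}

def wallet_tabs() -> list[str]:
    tabs = ["pay", "activity"]
    if flag_store["crypto_trading"]:  # code is already on devices...
        tabs.append("crypto")         # ...but invisible until launch
    return tabs

print(wallet_tabs())  # ['pay', 'activity']

# Launch day: the email goes out, the homepage changes, and the flag
# flips remotely. No new deploy, no app-store review cycle.
flag_store["crypto_trading"] = True
print(wallet_tabs())  # ['pay', 'activity', 'crypto']
```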

So: you've done some testing to validate the demand for a feature, and as you move into your build and release phases, you're testing to decide. The last mile is then testing to measure.

I think this is often where customers get started. But we believe customers who are trying to build the right thing should be pulling experimentation much earlier into their product development life cycle, not leaving it as the last mile or a rubber stamp on something that's already been built. The customer I chose as an example here is Salesforce.

Customer example: Salesforce

Salesforce is a very large software company that makes a number of solutions around CRM and CDP, helping you manage your customer database and the sales life cycle; I'm sure you're familiar. In this example, Salesforce has a fairly large conference they throw every year called Dreamforce, and the goal, when you're testing to measure, is to improve the Dreamforce landing page to get as many sign-ups as possible before the conference. So this is a time-based promotion.

We have to drive as many sign-ups as we can before the conference starts. This is a great scenario for things like personalization, multi-armed bandits, and our auto-allocation capabilities, to try to get the most ROI or the most interaction out of an experience.

A traditional A/B test may just have variants A and B and hold the traffic split constant over time, whereas if you use personalization or bandit and auto-allocation algorithms, you can generate a lot more interaction and a lot more ROI from your experiences. And this is often where people think testing lives.
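For intuition, here is a minimal epsilon-greedy sketch of the idea behind auto-allocation: traffic shifts toward the better-converting variant instead of holding a fixed 50/50 split. Optimizely's production algorithm differs; this only illustrates the concept:

```python
import random

counts = {"A": 0, "B": 0}  # visitors served per variant
wins = {"A": 0, "B": 0}    # conversions per variant
EPSILON = 0.1              # fraction of traffic kept for exploration

def choose_variant() -> str:
    if random.random() < EPSILON or min(counts.values()) == 0:
        return random.choice(list(counts))  # explore
    return max(counts, key=lambda v: wins[v] / counts[v])  # exploit

def record(variant: str, converted: bool) -> None:
    counts[variant] += 1
    wins[variant] += int(converted)

# Simulate: B truly converts better, so it ends up with more traffic.
for _ in range(10_000):
    v = choose_variant()
    record(v, random.random() < (0.05 if v == "A" else 0.08))
print(counts)
```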

It's in the content, it's in the user experience, and it's really about squeezing out every last bit of ROI we can. But the reality is that this type of testing is the last mile: after you've validated demand, built behind feature flags, and released, then you use these great optimization capabilities to squeeze out the rest of the value and continue to iterate and optimize. So hopefully, what you've seen over the course of these three customer examples is a really great end-to-end, omnichannel, multi-team approach to testing.

End-to-end approach to testing

When you're testing to learn, you're in the rapid prototyping and validation phase, and that generally belongs to product and design.

Ultimately, after you go through that prototyping and get some feedback from your tests, you move into working with engineering teams to build these experiences and release them. That's where you run experiments to decide; that's where you're testing to decide. It's a great area to use feature flagging and to decouple your code releases from your code deploys, really giving you that configurability.

Then, once you've released those features, marketing works on the messaging and the content surrounding them, and this is where you start testing to measure. You validate your messaging, you measure your personalization, you measure the impact and the quality of your experience. So hopefully, what you've seen is, like I said, a really great end-to-end example of how you can use tests at every step of the way, across multiple teams and disciplines, to understand how to build the right thing and move in the right direction.

Continuous optimization and measurement

As you go through this and start implementing some of these practices, the last thing I want to talk about today is how you measure continuous optimization. How do you measure the impact you're having, and how do you know you're on the right path?

You want to continue to validate, iterate, and repeat these steps. You use testing to validate and deploy, you iterate on that process, and then you repeat again. It becomes this amazing feedback loop, this cycle of feedback that helps influence where you go.

But how do we measure this? Just to recap: we've talked about testing to learn in the product discovery phase, testing to decide in the engineering build phase, and testing to measure in the marketing and content delivery phase. What I want to do now is share some program performance metrics you should keep your eye on as you build testing into your product development life cycle. The first set of metrics I want to talk about is program metrics. These are things like test velocity.

Program and value metrics

Conclusive rate, win rate, loss rate, rollback rate, rollout rate: these are great examples of program-level metrics designed to measure the effectiveness of your testing and experimentation program, and all of these metrics are important to monitor for many reasons. Take the bottom one, test duration, for example: on average, how long are your experiments running? This has a major impact on how quickly you can iterate, spin up new experiments, or productionize experiences. So it's a really important metric to measure.
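As a rough sketch, program metrics like these can be computed from simple experiment records. The record shape and field names here are assumptions for illustration:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Experiment:
    started: date
    ended: date
    conclusive: bool  # reached a stat-sig result, positive or negative
    won: bool         # stat-sig positive
    rolled_out: bool

def program_metrics(experiments: list[Experiment]) -> dict[str, float]:
    n = len(experiments)
    return {
        "velocity": n,  # tests run in the reporting period
        "conclusive_rate": sum(e.conclusive for e in experiments) / n,
        "win_rate": sum(e.won for e in experiments) / n,
        "rollout_rate": sum(e.rolled_out for e in experiments) / n,
        "avg_duration_days": sum((e.ended - e.started).days
                                 for e in experiments) / n,
    }

print(program_metrics([
    Experiment(date(2023, 1, 2), date(2023, 1, 16), True, True, True),
    Experiment(date(2023, 2, 1), date(2023, 3, 1), False, False, False),
]))
```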

These program metrics are important, but I don't want them to be confused with value metrics, on the right side of this slide. Value metrics are things like: did we have a positive impact?

And how much did we lift that metric? Did we have a negative impact, and how much did we decrease that metric? And these are not polar opposites. If you have a positive, statistically significant experiment, it's easy to calculate that ROI.

You just annualize the traffic that would have seen that improved conversion rate, and you have a pretty good estimate of the potential positive impact. But even if you had a negative, statistically significant experiment, well, that would probably cause you not to roll out that experience. So you actually saved your organization money, because through a test you proved that an experience was negative and you didn't roll it out. Most of our most mature customers factor in the amount of KPI value they've saved the organization through statistically significant negative experiments.
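Here is a worked example of both sides of that value calculation, with entirely hypothetical traffic and conversion numbers:

```python
annual_visits = 5_000_000
avg_order_value = 80.0

# Stat-sig winner: conversion lifted from 3.0% to 3.2% and shipped.
achieved_gain = annual_visits * (0.032 - 0.030) * avg_order_value

# Stat-sig loser: it would have cut conversion from 3.0% to 2.8%,
# so you did NOT roll it out. The avoided drop counts as value too.
avoided_loss = annual_visits * (0.030 - 0.028) * avg_order_value

print(f"achieved gains: ${achieved_gain:,.0f}")  # $800,000
print(f"avoided losses: ${avoided_loss:,.0f}")   # $800,000
print(f"program value:  ${achieved_gain + avoided_loss:,.0f}")
```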

The impact of negative experiments

A really important factor when you're calculating your annual estimated impact, your ROI, your return on investment with Optimizely, is not to forget that stat-sig negative experiments are just as impactful and powerful in decision making as positive experiments. So let's wrap up here and finish with some key takeaways. First, your program metrics should evolve gradually over time.

Evolving program metrics over time

I know everyone wants to start with lots of test velocity, high conclusive rates, and high rollout rates, but ultimately you have to start somewhere and build toward that velocity. You have to keep improving your program so that you can increase those rates.

So they have to start somewhere, and they have to evolve over time according to the needs of your business.

Program metrics can be harder to collect by nature. It can take extra work to track the number of experiments that didn't get rolled out, or to track test duration. These may take additional work to capture, but they are really important, especially as you improve the maturity of your program. Oftentimes, when customers are getting started, they let their testing program run a little fast and loose so they can gain some adoption and velocity, and then, as the program gets more established, they begin to weave in best practices and governance so they can make sure they're operating well.

And I think that's where it's really important to have some of those program metrics. They can serve as a directional compass for the cultural change happening in your organization.

The value of experimentation

They can help establish baselines and give you a way to understand the progress you're making on your journey toward a more test-and-learn, more iterative mindset for building products. And lastly, experimentation value includes both achieved gains and avoided losses.

So experimentation should lead to improvements, and it should also help you catch potential mistakes. All of this is in service of making the right decision about what to build, what's best for your customers, your users, your use case.

So I'll leave you with this slide: you want to build the right thing, not just build the thing right. And we believe embedding experimentation along your product development process will help you do exactly that.