Best Practices for Feature Flag Testing and QA
Ensuring your software works before your users try it out is paramount to building trustworthy solutions. A common way for engineering teams to ensure their software is running as expected is to have automated unit, integration, and end-to-end tests along with manual quality assurance.
Feature flags are extremely useful tools to ship faster, more confidently. But since feature flags add branching experiences into your codebase, you’ll wonder, how does my automated testing strategy interact with feature flags?
Testing every combination simply isn’t sustainable. As an example, let’s say you have an application with 10 features and 10 corresponding automated tests. If you add just 8 on/off feature flags, you theoretically now have 2^8 = 256 possible flag states, more than 25 times the number of tests you started with.
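To see the growth concretely, here is a quick sketch of the arithmetic (the function name is just for illustration):

```python
# Illustrative: the number of flag-state combinations grows exponentially.
def flag_combinations(num_boolean_flags: int) -> int:
    """Each on/off flag doubles the number of possible application states."""
    return 2 ** num_boolean_flags

print(flag_combinations(8))   # 256 states from just 8 flags
print(flag_combinations(10))  # 1024 states from 10 flags
```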
As testing every possible combination is nearly impossible, what can you do instead? You’ll want to get the most value by focusing on specific automated tests and techniques. I’ll go through the different levels of automated testing from the “testing pyramid,” as described in my free ebook Ship Confidently with Progressive Delivery and Experimentation, so you can see how each level supports feature flags in maintaining high quality. Let’s get started!
01. Unit tests—test frequently for solid building blocks
Best practice: Ensure the building blocks of your application are well tested with lots of unit tests. The smaller units are often unaware of experiment or feature state. For those units that are aware, use mocks and stubs to control this white-box testing environment.
Unit tests cover the smallest pieces of testable code. It’s best practice to keep these units so small that they are neither aware of nor affected by experiments or feature flags. As an example, if a feature flag forks into two separate code paths, each code path should have its own set of independent unit tests. You should frequently test these small units of code to ensure high code coverage, just as you would if you didn’t have any feature flags or experiments in your codebase. If the unit you are testing is affected by a feature flag or experiment, use the mocking and stubbing techniques described in the integration tests section below.
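As a sketch, suppose a flag forks checkout pricing into a legacy path and a new path (all names here are hypothetical). Each path gets its own flag-unaware unit tests:

```python
# Hypothetical example: a flag forks checkout pricing into two code paths.
# Each path is a small, flag-unaware unit with its own independent tests.

def legacy_discount(total: float) -> float:
    """Old path: flat 5% discount."""
    return round(total * 0.95, 2)

def tiered_discount(total: float) -> float:
    """New path (behind a flag): 10% off orders over 100."""
    return round(total * 0.90, 2) if total > 100 else total

# pytest-style unit tests -- note that neither test mentions the flag itself.
def test_legacy_discount():
    assert legacy_discount(200.0) == 190.0

def test_tiered_discount():
    assert tiered_discount(200.0) == 180.0
    assert tiered_discount(50.0) == 50.0
```

Because neither unit knows the flag exists, the tests stay simple and keep running unchanged after the flag is eventually removed.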
02. Integration tests—force states to test code
Best practice: Use mocks and stubs to control feature and experiment states. Focus on individual code paths to ensure proper integration and business logic.
For integration tests, you are combining units into higher-level business logic. This is where experiments and feature flags will likely affect the logical flow of the code, and you’ll have to force a particular variation or a state of a feature flag in order to test the code.
In some integration tests, you’ll still have complete access to the code’s executing environment where you can mock out the function calls to external systems or internal SDKs that power your experiments to force particular code paths to execute during your integration tests. For example, if your feature flag is powered by a function isFeatureEnabled from a feature management SDK, you can mock the isFeatureEnabled SDK call to always return true in an integration test. This removes any unpredictability, allowing your tests to run deterministically.
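Here is one way that might look in Python with `unittest.mock`, assuming a hypothetical `FeatureClient` whose `isFeatureEnabled` call would normally hit the network:

```python
from unittest.mock import patch

# Hypothetical SDK client: in production, isFeatureEnabled would call a
# real feature-management service. All names here are illustrative.

class FeatureClient:
    def isFeatureEnabled(self, flag_key: str, user_id: str) -> bool:
        raise NotImplementedError("network call in production")

client = FeatureClient()

def checkout_message(user_id: str) -> str:
    if client.isFeatureEnabled("new_checkout", user_id):
        return "new checkout"
    return "old checkout"

# In the integration tests, mock the SDK call so each code path
# executes deterministically, with no network involved.
def test_new_checkout_path():
    with patch.object(FeatureClient, "isFeatureEnabled", return_value=True):
        assert checkout_message("user-123") == "new checkout"

def test_old_checkout_path():
    with patch.object(FeatureClient, "isFeatureEnabled", return_value=False):
        assert checkout_message("user-123") == "old checkout"
```

Each test now exercises exactly one variation, regardless of how the flag is actually configured in any environment.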
In other integration tests, you may not have access to individual function calls, but you can still stub out API calls to external systems. For example, you can stub data powering the feature flag or experimentation platform to return an experiment in a specific state to force a given code path.
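A sketch of that stubbing approach, with a hypothetical `FlagStore` that fetches flag states from a remote datafile (the URL and payload shape are illustrative, not a real API):

```python
import json
from unittest.mock import patch
from urllib import request

FLAGS_URL = "https://flags.example.com/datafile.json"  # illustrative URL

class FlagStore:
    """Fetches flag states from a remote datafile (hypothetical shape)."""
    def fetch_flag_states(self) -> dict:
        with request.urlopen(FLAGS_URL) as resp:
            return json.loads(resp.read())

store = FlagStore()

def greeting(user_id: str) -> str:
    flags = store.fetch_flag_states()
    return "Hi there!" if flags.get("friendly_greeting") else "Hello."

# Stub the network boundary: return canned data instead of calling out,
# forcing the flag into a known state for the test.
def test_friendly_greeting_enabled():
    stub = {"friendly_greeting": True}
    with patch.object(FlagStore, "fetch_flag_states", return_value=stub):
        assert greeting("user-123") == "Hi there!"

def test_friendly_greeting_disabled():
    stub = {"friendly_greeting": False}
    with patch.object(FlagStore, "fetch_flag_states", return_value=stub):
        assert greeting("user-123") == "Hello."
```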
Although you can mock out the indeterminism coming from experiments or feature flags at this stage of testing, it’s still best practice for your code and tests to have as little awareness of experiment or feature-flag state as possible, and to focus on whether the code paths of each variation execute as expected.
03. End-to-end tests—focus testing on critical variations
Best practice: Do not test every possible combination of experiment or feature with end-to-end tests. Instead, focus on important variations or tests that ensure your application still works if all features are on/off.
End-to-end tests are the most expensive tests to write and maintain because they’re often black-box tests that provide little control over their running environment and may depend on external systems. For this reason, avoid relying on end-to-end or fully black-box tests to verify every branch of every experiment or feature flag; this combinatorial explosion of end-to-end tests will slow down your product development. Instead, reserve end-to-end tests for the most business-critical paths of an experiment or feature flag, or use them to test the state of your application when most or all of your feature flags are in a given state. For example, you may want one end-to-end test for when all your feature flags are on, and another for when all your feature flags are off. The latter test simulates what would happen if the system powering your feature flags went down and your application had to degrade gracefully.
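The idea can be sketched like this, with `render_homepage` standing in for driving the real application end to end (all names are hypothetical):

```python
# Sketch: instead of covering 2**N flag combinations, run smoke tests
# for just the two extreme states. render_homepage is a stand-in for a
# real end-to-end driver (e.g. a browser-based test).

ALL_FLAGS = ["new_checkout", "friendly_greeting", "dark_mode"]

def render_homepage(flag_states: dict) -> str:
    parts = ["homepage"]
    if flag_states.get("dark_mode"):
        parts.append("dark")
    return " ".join(parts)

def smoke_test(flag_states: dict) -> bool:
    """The app should render something sane in any flag configuration."""
    return "homepage" in render_homepage(flag_states)

# Two targeted end-to-end states instead of 2**3 == 8 combinations.
all_on = {flag: True for flag in ALL_FLAGS}
all_off = {flag: False for flag in ALL_FLAGS}  # simulates a flag-service outage

assert smoke_test(all_on)
assert smoke_test(all_off)
```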
When you do require end-to-end tests, make sure you can still control the experiment or feature-flag state to remove indeterminism. For example, in a web application, you may want to have a special test user, a special test cookie, or a special test query parameter that can be used to force a particular variation of an experiment or feature flag. Note that when implementing these special overrides, be sure to make them internal-only so that your users don’t have the same control over their own feature or experiment states.
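One sketch of such an override, using a hypothetical `force_<flag>` query parameter that is honored only for internal test users:

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical override mechanism: a force_<flag> query parameter that is
# honored only for internal test traffic, never for real users.

INTERNAL_TEST_USERS = {"qa-bot"}

def resolve_flag(flag_key: str, user_id: str, url: str, default: bool) -> bool:
    query = parse_qs(urlparse(url).query)
    override = query.get("force_" + flag_key, [None])[0]
    # Only internal test users may force a variation.
    if override is not None and user_id in INTERNAL_TEST_USERS:
        return override == "on"
    return default

# The internal test user can force the flag; a normal user cannot.
assert resolve_flag("new_checkout", "qa-bot",
                    "https://app.example.com/?force_new_checkout=on", False) is True
assert resolve_flag("new_checkout", "alice",
                    "https://app.example.com/?force_new_checkout=on", False) is False
```

Gating the override on an internal allow-list is what keeps real users from controlling their own feature or experiment states.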
For an example of how Optimizely does this internally with Cypress.io and Selenium, read this blog post by the Optimizely QA team.
04. Manual verification (QA)—reserve for business-critical functions
Best practice: Save time and resources by reserving manual QA to test the most critical variations. Make sure you provide tools for QA to force feature and experiment states.
Similar to end-to-end tests, manual verification of different variations can be difficult and time-consuming, which is why organizations typically have only a few manual QA tests. Reserve manual verification for business-critical functions. And if you implemented special parameters to control the states of experiments or feature flags for end-to-end tests, these same parameters can be used by a QA team to force a variation and verify a particular experience.
Let me know what you think!
This is part of a series of best practices to help your company successfully implement progressive delivery and experimentation to ship faster with confidence.
If you like this content, check out my free e-book: Ship Confidently with Progressive Delivery and Experimentation which offers more best practices from just getting started to scaling these technologies organization-wide.
And if you are looking for a platform to get started, check out Optimizely’s free offering.