Posted May 13

Flags on flags: How Optimizely uses feature flags

by Russell Loube
9 min read time

Ever wondered if feature flag vendors use their own products?

At Optimizely, we don't just sell feature flags – we live and breathe them every day.

In fact, we use our Feature Experimentation platform to ship new capabilities to... our Feature Experimentation platform. Yeah, it's meta. We use feature flags to release feature flags. 🤯

This isn't a "nice to have." It's how we build faster, deploy safer, and innovate at scale.

Before diving into our specific use cases, let's address why this matters:

The hard truth about feature releases

Most teams are taking unnecessary risks with their feature releases by:

  • Pushing directly to production and hoping nothing breaks
  • Treating all features with the same deployment process regardless of risk
  • Creating bottlenecks where product decisions require engineering deployments
  • Building complex branching strategies that delay integration

We know because we've been there.

Without feature flags, every release becomes a high-stakes gamble with your UX, engineering time, and revenue on the line.

So, how exactly do we use our own feature flagging technology?

Here are three real-world examples from our engineering team:

1. Fine-tuning AI features without engineering bottlenecks

Meet Optimizely Opal - your infinite workforce that helps users create better experiments, analyze results, and generate new ideas.

Under the hood, Opal is powered by cutting-edge LLMs from our strategic partners like Google and Microsoft. But here's where it gets interesting.

We use feature flags not just to control whether AI features are ON or OFF, but to fine-tune exactly how they work.

For example, we use feature flags to control:

  • Which LLM model powers Opal (switching between models as needed)
  • How those models are configured (temperature, tokens, parameters)
  • What prompt templates we use
  • Which user segments get access to which AI capabilities

When instrumenting a feature flag like this, one approach is to hardcode the configuration of the LLM for each variation:

import { createInstance } from "@optimizely/optimizely-sdk";
import OpenAI from "openai";

const optimizely = createInstance({
  sdkKey: "<YOUR_SDK_KEY>",
});

const openAIClient = new OpenAI();

const user = optimizely.createUserContext("user123");
const decision = user.decide("llm_flag");

if (decision.variationKey === "control") {
  // Execute code for control variation
  const completion = await openAIClient.chat.completions.create({
    model: "gpt-4o",
    messages: [
      {
        role: "user",
        content:
          "You are a storyteller specializing in whimsical tales. Write a one-sentence bedtime story about a unicorn.", // emphasis on whimsy
      },
    ],
  });
} else if (decision.variationKey === "treatment") {
  // Execute code for treatment variation
  const completion = await openAIClient.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [
      {
        role: "user",
        content:
          "You are a children's author. Write a one-sentence bedtime story about a unicorn.", // emphasis on age-appropriateness
      },
    ],
  });
}

The problem? Every change requires developer intervention, creating bottlenecks when you need to move fast.

We do it differently.

Old way? Hardcoded configs = slow iteration.

Our way? Dynamic flag variables = total control, zero deploys.

This approach means our product team can change AI behavior without engineering dependencies. We can test different prompts, switch models, and adjust parameters from our Feature Experimentation dashboard with zero code changes.

import { createInstance } from "@optimizely/optimizely-sdk";
import OpenAI from "openai";

const optimizely = createInstance({
  sdkKey: "<YOUR_SDK_KEY>",
});

const openAIClient = new OpenAI();

const user = optimizely.createUserContext("user123");
const decision = user.decide("llm_flag");

const completion = await openAIClient.chat.completions.create({
  model: decision.variables.model, // returned by the Optimizely decision
  messages: [
    {
      role: "user",
      content: decision.variables.prompt, // returned by the Optimizely decision
    },
  ],
});

This approach led to:

  • Dramatically faster time from idea to production for AI features
  • More experiments run by product teams with the same engineering resources
  • Fewer late-night calls to engineering (bless you, kill switches 💙)

But there's an even more important benefit: Sleep.

Our engineers no longer get paged at night to fix urgent issues with new AI features, because we can disable problematic ones instantly through our feature flag dashboard.

2. De-risking launches with canary testing

Let's talk about launch anxiety.

That feeling when you're about to release a major feature, bracing for the support tickets to flood in? Yeah, we don't miss it either.

We use feature flags to run canary tests for gradual, data-informed rollouts that turn risky releases into safe, scalable launches.

When we released our Custom Flags Dashboard, a significant change to core functionality, we didn't just push it to everyone at once.

Instead, we:

  • Released to 5% of customers first (mostly smaller accounts)
  • Monitored error rates, performance metrics, and user feedback
  • Gradually increased exposure to 20%, then 50%, then 100%
  • Provided preemptive notices to customers before they received the change

This approach was both cautious and necessary. Since our platform serves as critical infrastructure for our customers' feature releases, any disruption could cascade to their users, too.
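If you're curious how percentage rollouts like this work mechanically, the usual approach is to hash each user ID deterministically into a bucket and compare it against the current traffic allocation. Here's a minimal illustrative sketch - note the hash here is a stand-in, not Optimizely's actual bucketing code (the SDKs use MurmurHash internally):

```javascript
// Illustrative sketch of deterministic percentage bucketing.
function bucket(userId, flagKey) {
  // Stable string hash mapped into [0, 10000) "traffic buckets"
  let h = 0;
  const s = `${userId}:${flagKey}`;
  for (let i = 0; i < s.length; i++) {
    h = (h * 31 + s.charCodeAt(i)) >>> 0;
  }
  return h % 10000;
}

// A user sees the canary if their bucket falls below the allocation.
// Because the hash is deterministic, raising the percentage from
// 5 -> 20 -> 50 -> 100 only ever adds users; nobody flips back out.
function inRollout(userId, flagKey, percentage) {
  return bucket(userId, flagKey) < percentage * 100;
}
```

Because bucketing is a pure function of the user ID, each customer gets a consistent experience across the whole ramp, and dialing back to 0% instantly removes everyone.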

Canary testing isn't just for front-end changes, either. When we migrated our entire data pipeline to Google Cloud Platform (GCP), we used feature flags extensively to manage the transition, gradually rolling out infrastructure updates to minimize deployment risk.

Using feature flags to control the migration gave us:

  1. Risk mitigation: We could instantly revert if performance degraded
  2. Business continuity: Our customers depend on continuous data processing
  3. Validation at scale: We could verify the new systems handled real-world load before full commitment

We completed this massive infrastructure migration with only a handful of customer-reported incidents, despite moving billions of daily events.

The lesson? The bigger the change, the more you need feature flags to de-risk it.

3. Kill switches: Your emergency eject button

Sometimes, stuff breaks (it's software, we get it).

But there's a massive difference between “We’re investigating the issue and will deploy a fix soon…” vs. “We turned it off. It’s handled.”

Kill switches are essential because they transform production incidents from all-hands emergencies into controlled, methodical responses.

At Optimizely, every feature has a kill switch - a feature flag that lets us disable it instantly. No code changes. No deploys. No drama.

Our datafile build service ensures flag updates reach end users in mere seconds - streaming-like responsiveness without the complicated infrastructure.

When we detect issues through our monitoring systems, we can immediately disable the problematic feature, investigate without pressure, and deploy a proper fix during normal working hours. No more late-night emergency deploys.
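The pattern behind a kill switch is straightforward: the risky code path checks its flag first and falls back to known-good behavior when the flag is off. A minimal sketch - the flag key, feature, and fallback here are hypothetical, and `isEnabled` stands in for a real SDK call like `user.decide("ai_summaries").enabled`:

```javascript
// Sketch of the kill-switch pattern: the new feature is wrapped in a
// flag check with a safe fallback, so flipping the flag off in the
// dashboard instantly restores the old behavior. No deploy needed.
function getArticleSummary(article, isEnabled) {
  if (!isEnabled("ai_summaries")) {
    // Kill switch is off: fall back to the pre-existing excerpt
    return { text: article.excerpt, source: "excerpt" };
  }
  // Kill switch is on: run the new (riskier) AI path
  return { text: generateAiSummary(article), source: "ai" };
}

function generateAiSummary(article) {
  // Placeholder for the LLM call the kill switch protects
  return `AI summary of "${article.title}"`;
}
```

Injecting the flag check also keeps the fallback path exercised in tests, so you know it still works the day you actually need to pull the switch.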

This capability is particularly critical for our business for several reasons:

  1. Trust preservation: As a platform that powers our customers' experiences, minimizing downtime is non-negotiable
  2. Global impact: With customers across time zones, there's never a "good time" for an outage
  3. Complex dependencies: Features often interact in unexpected ways, and quick isolation helps diagnosis

However, flags aren't a silver bullet...

Feature flags are not a replacement for a solid rollback strategy.

We use our product to release updates to itself ("flags on flags"). In a worst-case scenario where a buggy feature renders the UI inaccessible, we'd be unable to disable the feature remotely.

That's why our engineering team maintains robust rollback processes alongside feature flags, giving us multiple layers of protection. This layered approach is essential when you're building critical infrastructure that others depend on.

Feature flags as a product strategy

The use cases above - AI fine-tuning, canary testing, and kill switches - show that feature flags aren't just an engineering tool. They're a product strategy enabler.

By separating deployment from release, we can:

  • Move faster: Push code to production continuously without exposing unfinished features
  • Experiment more: Test ideas with real users before committing to them
  • Reduce risk: Limit exposure and provide instant rollback options
  • Empower product teams: Give non-technical stakeholders control over feature rollouts

Feature flags have fundamentally changed how our team thinks about releases. Instead of high-pressure, all-or-nothing launches, we now treat a release as a dial we can turn up gradually while measuring impact.

"Feature flags let us move fast without breaking things. They're our safety net and our secret weapon."

Britt Hall

Sr. Director, Product Management

Where feature flags are heading next

Looking ahead, we see feature flags evolving in several important ways:

  1. Essential kill switches for autonomous AI: As companies shift from building features that augment user capabilities to features that automate them entirely (like agents that independently act to achieve goals), feature flags become critical kill switches. When your AI is operating with increasing autonomy, the ability to instantly disable problematic behavior becomes a critical requirement.
  2. AI safety through continuous monitoring and adjustment: AI safety is now an active area of research. Especially in B2C contexts, users come from a variety of backgrounds, making it important to monitor AI's potential to reinforce identity-based biases (age, gender, etc.). Feature flags enable teams to fine-tune AI models on the fly whenever concerning patterns emerge.
  3. Managing complexity in AI-assisted development: While AI excels at writing code for new projects, it struggles with understanding the intricacies of established, sprawling codebases. Today, AI is great for tasks like writing unit tests, but as systems grow more complex, tools like feature flags become essential for safely testing, deploying, and iterating, especially in an AI-driven world.
  4. Testing AI models without engineering overhead: As companies develop their own AI features, feature flags allow testing different AI models without additional code changes. Once you implement the flag, you can evaluate which models perform best in real-world conditions—all from your feature experimentation dashboard with zero additional engineering work.

Getting started with your feature flag strategy

If you're not using feature flags yet, or you're only using them for basic on/off switches, here's how to level up:

  • Start with the highest-risk features: Identify where you need kill switches most.
  • Add variables to your flags: Don't just control if a feature is on, but how it works.
  • Define clear ownership: Establish who can toggle which flags and when.
  • Build rollout patterns: Create templates for different types of releases (high-risk vs. low-risk).
  • Connect flags to observability: Ensure you can quickly correlate issues identified in APM platforms like Datadog with flag changes.
  • Establish flag governance: Create processes for creating, reviewing, and retiring flags to avoid technical debt.
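On the observability point: one lightweight way to make that correlation possible is to stamp every structured log line (or APM span) with the flag decisions that were active for the request. A hypothetical sketch - the decision shape mirrors what `user.decide()` returns, but the helper itself is ours:

```javascript
// Sketch: annotate structured log entries with the active flag decisions
// so incidents in an APM tool can be correlated with flag changes.
function withFlagContext(logFields, decisions) {
  const flags = {};
  for (const [flagKey, decision] of Object.entries(decisions)) {
    // Record the variation if there is one, otherwise just on/off
    flags[flagKey] = decision.variationKey ?? (decision.enabled ? "on" : "off");
  }
  return { ...logFields, flags };
}

// Example: two active decisions merged into one log entry
const entry = withFlagContext(
  { msg: "summary generated", userId: "user123" },
  {
    llm_flag: { enabled: true, variationKey: "treatment" },
    ai_kill_switch: { enabled: false },
  }
);
// entry.flags is { llm_flag: "treatment", ai_kill_switch: "off" }
```

With flag context on every log line, a spike in errors can be filtered by flag and variation in seconds, instead of cross-referencing deploy timestamps by hand.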

The most common objection we hear is about technical debt – "Won't all these flags clutter our codebase?"

While this is a valid concern, we've found there's tremendous value in strategic, long-lived flags. Consider implementing permanent flags in areas you'll continuously experiment on, like your top-of-funnel landing pages or checkout flow. With flag variables, you can control any aspect of these high-traffic areas without code changes.

This approach unlocks "infinite A/B testing," the ability to run iterative experiments without engineering dependencies. Start by testing what works best for everyone, then run targeted tests for high-value customer segments. The result is a continuously optimizing experience that evolves with your business needs, all managed through your feature flag dashboard.

At Optimizely, we've seen firsthand how feature flags transform development from a series of big, stressful launches to a smooth, continuous process of controlled rollouts and data-driven decisions.

Want to see how feature flags could improve your team's development process?

👉 Try Feature Experimentation for Free - Get free feature flags for life!

Or, if you're curious how other companies are flagging smarter, check out our Customer Stories page.

And if you're using flags in a cool way, let's talk. We'd love to hear from you.

  • Feature management, Experimentation
  • Last modified: 5/13/2025 5:20:24 AM