
Feature Flags for AI Agents: Gradual Rollout of New Agent Behaviors

Learn how to implement feature flag patterns for AI agents including percentage-based rollouts, user targeting, and kill switches. A practical guide to safely shipping new agent behaviors to production.

Why Feature Flags Matter for AI Agents

Deploying a new agent behavior to every user at once is a high-risk move. A subtle prompt regression, a newly enabled tool that hallucinates, or a model upgrade that changes response tone can all degrade user experience before you even notice. Feature flags solve this by letting you control exactly who sees which version of an agent behavior — and instantly revert if something goes wrong.

Unlike traditional software where a bug produces a deterministic failure, AI agent issues are probabilistic. A prompt change might work well for 95% of queries but catastrophically fail on edge cases. Gradual rollout gives you the observation window to catch these statistical regressions before they become widespread.

Core Feature Flag Architecture

A feature flag system for AI agents needs three components: a flag store, an evaluation engine, and an integration layer that the agent runtime consults at decision points.

from dataclasses import dataclass, field
from enum import Enum
from typing import Optional
import hashlib
import time


class FlagStatus(Enum):
    OFF = "off"
    PERCENTAGE = "percentage"
    TARGETED = "targeted"
    ON = "on"


@dataclass
class FeatureFlag:
    name: str
    status: FlagStatus
    percentage: float = 0.0
    targeted_users: list[str] = field(default_factory=list)
    targeted_plans: list[str] = field(default_factory=list)
    kill_switch: bool = False
    created_at: float = field(default_factory=time.time)
    description: str = ""

    def is_enabled(self, user_id: str, plan: str = "free") -> bool:
        if self.kill_switch:
            return False
        if self.status == FlagStatus.OFF:
            return False
        if self.status == FlagStatus.ON:
            return True
        if self.status == FlagStatus.TARGETED:
            return (
                user_id in self.targeted_users
                or plan in self.targeted_plans
            )
        if self.status == FlagStatus.PERCENTAGE:
            return self._hash_percentage(user_id) < self.percentage
        return False

    def _hash_percentage(self, user_id: str) -> float:
        hash_input = f"{self.name}:{user_id}"
        hash_val = hashlib.sha256(hash_input.encode()).hexdigest()[:8]
        return int(hash_val, 16) / 0xFFFFFFFF * 100

The _hash_percentage method is critical. It uses a deterministic hash so the same user always gets the same result for a given flag. This prevents the jarring experience of a feature appearing and disappearing between requests.
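
The stability property is easy to check in isolation. This standalone sketch inlines the same hash logic as _hash_percentage (the bucket helper name and the example flag and user IDs are illustrative):

```python
import hashlib


def bucket(flag_name: str, user_id: str) -> float:
    # Inlined copy of FeatureFlag._hash_percentage: maps a (flag, user)
    # pair deterministically to a number in [0, 100].
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode()).hexdigest()[:8]
    return int(digest, 16) / 0xFFFFFFFF * 100


# The same user lands in the same bucket on every request for a given flag.
assert bucket("new_reasoning_prompt", "user-42") == bucket("new_reasoning_prompt", "user-42")

# Different flags bucket the same user independently, so a 10% rollout of
# one flag does not pin the same 10% of users for every other flag.
print(bucket("new_reasoning_prompt", "user-42"))
print(bucket("enable_code_interpreter", "user-42"))
```

Because the flag name is part of the hash input, each flag draws its own independent sample of users at any given percentage.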

The Flag Store

In production you would use Redis or a dedicated feature flag service, but a JSON-backed store illustrates the pattern cleanly.

import json
from pathlib import Path
from threading import Lock


class FlagStore:
    def __init__(self, config_path: str = "flags.json"):
        self._path = Path(config_path)
        self._cache: dict[str, FeatureFlag] = {}
        self._lock = Lock()
        self._load()

    def _load(self):
        if self._path.exists():
            raw = json.loads(self._path.read_text())
            with self._lock:
                self._cache = {
                    name: FeatureFlag(
                        name=name,
                        status=FlagStatus(data["status"]),
                        percentage=data.get("percentage", 0.0),
                        targeted_users=data.get("targeted_users", []),
                        targeted_plans=data.get("targeted_plans", []),
                        kill_switch=data.get("kill_switch", False),
                        description=data.get("description", ""),
                    )
                    for name, data in raw.items()
                }

    def evaluate(self, flag_name: str, user_id: str, plan: str = "free") -> bool:
        with self._lock:
            flag = self._cache.get(flag_name)
        if flag is None:
            return False
        return flag.is_enabled(user_id, plan)

    def reload(self):
        self._load()
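
For reference, here is what a flags.json accepted by this loader could look like. The flag names match the examples used later in this article; the values are illustrative:

```json
{
  "new_reasoning_prompt": {
    "status": "percentage",
    "percentage": 5.0,
    "description": "Chain-of-thought system prompt experiment"
  },
  "enable_code_interpreter": {
    "status": "targeted",
    "targeted_users": ["user-123"],
    "targeted_plans": ["pro", "enterprise"],
    "description": "Code interpreter tool for paid plans"
  },
  "use_gpt4o_mini": {
    "status": "off",
    "kill_switch": false,
    "description": "Cost experiment: route traffic to gpt-4o-mini"
  }
}
```
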

Integrating Flags with the Agent Runtime

The flag store is consulted at key decision points inside the agent: which system prompt to use, which tools to enable, or which model to call.

flag_store = FlagStore("flags.json")


def build_agent_config(user_id: str, plan: str) -> dict:
    config = {
        "model": "gpt-4o",
        "system_prompt": "You are a helpful assistant.",
        "tools": ["search", "calculator"],
    }

    if flag_store.evaluate("new_reasoning_prompt", user_id, plan):
        config["system_prompt"] = (
            "You are a helpful assistant. Think step by step "
            "before answering. Show your reasoning."
        )

    if flag_store.evaluate("enable_code_interpreter", user_id, plan):
        config["tools"].append("code_interpreter")

    if flag_store.evaluate("use_gpt4o_mini", user_id, plan):
        config["model"] = "gpt-4o-mini"

    return config

Kill Switch Implementation

A kill switch is the most important safety mechanism. When activated, it immediately disables a feature for all users regardless of other targeting rules.

class KillSwitchManager:
    """Flips a flag's kill switch in the store's in-memory cache.

    Note: this mutates only the in-memory flag; a production store would
    also persist the change so it survives a process restart.
    """

    def __init__(self, store: FlagStore):
        self._store = store

    def activate(self, flag_name: str, reason: str):
        flag = self._store._cache.get(flag_name)
        if flag:
            flag.kill_switch = True
            self._log_event(flag_name, "KILL_SWITCH_ON", reason)

    def deactivate(self, flag_name: str, reason: str):
        flag = self._store._cache.get(flag_name)
        if flag:
            flag.kill_switch = False
            self._log_event(flag_name, "KILL_SWITCH_OFF", reason)

    def _log_event(self, flag: str, action: str, reason: str):
        # In production, route this to your alerting pipeline, not stdout.
        print(f"[ALERT] {action}: {flag} — {reason}")

Wire the kill switch to your monitoring alerts. If error rates spike after a rollout, a single API call can revert the behavior globally.
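
As a sketch of that wiring, the class below reports when an error rate crosses a threshold; ErrorRateMonitor, its threshold, and min_requests are illustrative names and defaults, not part of the flag store above:

```python
from dataclasses import dataclass


@dataclass
class ErrorRateMonitor:
    """Tracks request outcomes for one flag and signals when the error
    rate warrants tripping the kill switch."""
    threshold: float = 0.05   # trip above 5% errors
    min_requests: int = 100   # wait for a meaningful sample size
    errors: int = 0
    total: int = 0

    def record(self, ok: bool) -> bool:
        """Record one request outcome; return True when the kill switch
        should fire."""
        self.total += 1
        if not ok:
            self.errors += 1
        if self.total < self.min_requests:
            return False  # too few samples to judge
        return self.errors / self.total > self.threshold
```

When record returns True, a single call to KillSwitchManager.activate reverts the behavior globally.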

Percentage Rollout Strategy

A safe rollout typically follows this progression: 1% for internal testing, 5% for canary, 25% for early adopters, 50% to confirm at scale, then 100%. At each stage, monitor error rates, latency, and user satisfaction before proceeding.

ROLLOUT_STAGES = [1, 5, 25, 50, 100]


def advance_rollout(flag: FeatureFlag, current_stage_idx: int) -> FeatureFlag:
    next_idx = min(current_stage_idx + 1, len(ROLLOUT_STAGES) - 1)
    flag.percentage = ROLLOUT_STAGES[next_idx]
    flag.status = FlagStatus.PERCENTAGE if flag.percentage < 100 else FlagStatus.ON
    return flag
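
A convenient property of hash-based bucketing is that these stages are nested: since enablement is hash < percentage, anyone enabled at 5% is still enabled at 25%, so users never lose the feature mid-rollout. A standalone check (the bucket helper inlines _hash_percentage; the flag and user IDs are made up):

```python
import hashlib


def bucket(flag_name: str, user_id: str) -> float:
    # Inlined copy of FeatureFlag._hash_percentage.
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode()).hexdigest()[:8]
    return int(digest, 16) / 0xFFFFFFFF * 100


users = [f"user-{i}" for i in range(1000)]
enabled_at_5 = {u for u in users if bucket("demo_flag", u) < 5}
enabled_at_25 = {u for u in users if bucket("demo_flag", u) < 25}

# Raising the percentage only ever adds users; it never removes any.
assert enabled_at_5 <= enabled_at_25
```

This is why each stage's metrics remain comparable: the 25% cohort contains the 5% cohort rather than replacing it with a fresh sample.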

FAQ

How is percentage rollout different from random sampling?

Percentage rollout uses deterministic hashing so each user consistently sees the same variant. Random sampling would flip behavior between requests for the same user, creating a confusing experience. The hash ensures stability while still distributing users evenly across the rollout percentage.

When should I use a kill switch versus just setting the percentage to zero?

A kill switch is a separate override that bypasses all other logic. Setting the percentage to zero only disables a flag whose status is in percentage mode; a flag in targeted or fully-on mode ignores the percentage field entirely. A kill switch is faster and safer to activate in an emergency because it works regardless of the flag's current configuration state.

Can I combine percentage rollout with user targeting?

Yes, but keep the evaluation order clear. A common pattern is to check targeted users first, then fall back to percentage-based evaluation. This lets you guarantee specific accounts always see the new behavior while gradually expanding to the general population.
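
A minimal sketch of that evaluation order, using a stand-in class whose fields mirror FeatureFlag (CombinedFlag itself is illustrative, not part of the code above):

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class CombinedFlag:
    """Stand-in flag that layers user/plan targeting on top of a
    percentage rollout."""
    name: str
    percentage: float = 0.0
    targeted_users: list[str] = field(default_factory=list)
    targeted_plans: list[str] = field(default_factory=list)
    kill_switch: bool = False

    def is_enabled(self, user_id: str, plan: str = "free") -> bool:
        if self.kill_switch:
            return False  # emergency override always wins
        if user_id in self.targeted_users or plan in self.targeted_plans:
            return True   # guaranteed accounts checked first
        # Everyone else falls back to the percentage rollout.
        digest = hashlib.sha256(f"{self.name}:{user_id}".encode()).hexdigest()[:8]
        return int(digest, 16) / 0xFFFFFFFF * 100 < self.percentage


# Enterprise plans always get the feature; everyone else is in a 10% rollout.
flag = CombinedFlag(name="beta_tools", percentage=10.0,
                    targeted_plans=["enterprise"])
```

The ordering matters: checking targeting before the percentage fallback guarantees key accounts see the feature even at a 0% rollout, while the kill switch still overrides both.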


#FeatureFlags #AIAgents #GradualRollout #ProductionSafety #Python #AgenticAI #LearnAI #AIEngineering

CallSphere Team

Expert insights on AI voice agents and customer communication automation.
