AI Agent Version Management: Deploying and Rolling Back Agent Configurations

The Problem with Ad-Hoc Agent Updates

A product manager asks you to update the support agent's system prompt. You SSH into the server, edit the prompt in a config file, and restart the agent. Two hours later, the support team reports the agent is refusing to answer billing questions. You realize the prompt edit accidentally removed a paragraph about billing access. But you cannot revert because you did not save the previous version.

This scenario plays out constantly in organizations that manage agent configurations informally. AI agents are particularly sensitive to configuration changes because a small wording change in a system prompt can dramatically alter behavior across every conversation.

Versioned Configuration Store

Every configuration change creates a new immutable version. The active version is a pointer, not the data itself. Rolling back means pointing to a previous version, not editing the current one.

from dataclasses import dataclass, field
from datetime import datetime
from uuid import uuid4
import json
import hashlib


@dataclass
class AgentVersion:
    version_id: str = field(default_factory=lambda: str(uuid4()))
    agent_id: str = ""
    version_number: int = 0
    system_prompt: str = ""
    model: str = "gpt-4o"
    temperature: float = 0.7
    max_tokens: int = 4096
    tools: list[str] = field(default_factory=list)
    guardrails: dict = field(default_factory=dict)
    created_by: str = ""
    created_at: str = field(
        default_factory=lambda: datetime.utcnow().isoformat()
    )
    change_description: str = ""
    config_hash: str = ""

    def compute_hash(self) -> str:
        content = json.dumps({
            "system_prompt": self.system_prompt,
            "model": self.model,
            "temperature": self.temperature,
            "max_tokens": self.max_tokens,
            "tools": sorted(self.tools),
            "guardrails": self.guardrails,
        }, sort_keys=True)
        self.config_hash = hashlib.sha256(content.encode()).hexdigest()[:16]
        return self.config_hash


class VersionStore:
    def __init__(self, db_pool):
        self.db = db_pool

    async def create_version(self, config: AgentVersion) -> AgentVersion:
        latest = await self.db.fetchval(
            "SELECT MAX(version_number) FROM agent_versions "
            "WHERE agent_id = $1",
            config.agent_id,
        )
        config.version_number = (latest or 0) + 1
        config.compute_hash()

        await self.db.execute(
            """
            INSERT INTO agent_versions (
                version_id, agent_id, version_number, system_prompt,
                model, temperature, max_tokens, tools, guardrails,
                created_by, created_at, change_description, config_hash
            ) VALUES ($1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12,$13)
            """,
            config.version_id, config.agent_id, config.version_number,
            config.system_prompt, config.model, config.temperature,
            config.max_tokens, json.dumps(config.tools),
            json.dumps(config.guardrails), config.created_by,
            config.created_at, config.change_description, config.config_hash,
        )
        return config

    async def get_active_version(self, agent_id: str) -> AgentVersion | None:
        row = await self.db.fetchrow(
            "SELECT av.* FROM agent_versions av "
            "JOIN agent_deployments ad ON av.version_id = ad.version_id "
            "WHERE av.agent_id = $1 AND ad.is_active = true",
            agent_id,
        )
        return AgentVersion(**dict(row)) if row else None

    async def rollback(self, agent_id: str, target_version: int) -> dict:
        version = await self.db.fetchrow(
            "SELECT * FROM agent_versions "
            "WHERE agent_id = $1 AND version_number = $2",
            agent_id, target_version,
        )
        if not version:
            raise ValueError(f"Version {target_version} not found")

        await self.db.execute(
            "UPDATE agent_deployments SET is_active = false "
            "WHERE agent_id = $1",
            agent_id,
        )
        await self.db.execute(
            "INSERT INTO agent_deployments (agent_id, version_id, is_active) "
            "VALUES ($1, $2, true)",
            agent_id, version["version_id"],
        )
        return {
            "agent_id": agent_id,
            "rolled_back_to": target_version,
            "config_hash": version["config_hash"],
        }

Canary Deployments for Agents

A canary deployment routes a small percentage of traffic to the new version while the majority continues using the proven version. If the canary shows degraded quality or increased errors, roll back automatically before users notice.

import random


class CanaryRouter:
    def __init__(self, version_store: VersionStore, metrics_client):
        self.versions = version_store
        self.metrics = metrics_client

    async def resolve_version(
        self, agent_id: str, user_id: str
    ) -> AgentVersion:
        canary = await self.get_canary_deployment(agent_id)
        if not canary:
            return await self.versions.get_active_version(agent_id)

        if self.should_route_to_canary(user_id, canary["traffic_pct"]):
            self.metrics.increment(
                "canary.routed", tags={"agent": agent_id, "version": "canary"}
            )
            return canary["version"]

        return await self.versions.get_active_version(agent_id)

    def should_route_to_canary(self, user_id: str, pct: int) -> bool:
        # Deterministic routing based on user_id for consistency
        hash_val = int(hashlib.md5(user_id.encode()).hexdigest(), 16)
        return (hash_val % 100) < pct

    async def evaluate_canary(self, agent_id: str) -> str:
        canary_metrics = await self.metrics.query(
            f"agent_error_rate{{agent='{agent_id}',version='canary'}}"
        )
        stable_metrics = await self.metrics.query(
            f"agent_error_rate{{agent='{agent_id}',version='stable'}}"
        )
        if canary_metrics["error_rate"] > stable_metrics["error_rate"] * 1.5:
            return "rollback"
        if canary_metrics["error_rate"] <= stable_metrics["error_rate"] * 1.1:
            return "promote"
        return "continue"

Feature Flags for Agent Capabilities

Feature flags let you enable or disable specific agent tools or behaviors for subsets of users without deploying new configuration versions. This is useful for gradual rollouts of new agent capabilities.

See AI Voice Agents Handle Real Calls

Book a free demo or calculate how much you can save with AI voice automation.

Book a Demo ROI Calculator

Comparing Versions with Diff Views

The admin dashboard should show a diff between any two versions, highlighting changes to the system prompt, model settings, and tool configurations. This helps reviewers understand what changed and why before approving a deployment.

FAQ

How do you test agent configuration changes before deploying?

Run the new configuration against a suite of evaluation prompts in a staging environment. Compare the responses against golden answers using LLM-as-judge scoring. Only promote to canary if the evaluation score meets or exceeds the current production version. Automate this as part of a CI pipeline triggered by configuration commits.

Should system prompts be stored in version control (git) or a database?

Both. Use git as the source of truth for prompt development, code review, and history. Sync approved prompts to the database configuration store on merge. The database serves as the runtime configuration that agents read at startup. This gives you the best of both worlds: collaborative editing with pull requests and fast runtime reads.

How long should you keep old agent versions?

Keep all versions indefinitely. Storage cost is negligible since each version is just a few kilobytes of text and JSON. Old versions are valuable for forensic analysis — if a customer reports an issue from two months ago, you need to know exactly which prompt and model configuration was active at that time.

#EnterpriseAI #VersionControl #Deployment #CanaryReleases #FeatureFlags #Rollback #AgenticAI #LearnAI #AIEngineering

AI Agent Version Management: Deploying and Rolling Back Agent Configurations

The Problem with Ad-Hoc Agent Updates

Versioned Configuration Store

Canary Deployments for Agents

Feature Flags for Agent Capabilities

Comparing Versions with Diff Views

FAQ

How do you test agent configuration changes before deploying?

Should system prompts be stored in version control (git) or a database?

How long should you keep old agent versions?

Try CallSphere AI Voice Agents

Related Articles

WebArena and Real-World Web Agent Benchmarks: How We Measure Browser Agent Performance

Taking Screenshots and Recording Videos with Playwright for AI Analysis

Playwright Selectors Deep Dive: CSS, XPath, Text, and Role-Based Element Finding