AI Agent Version Management: Deploying and Rolling Back Agent Configurations
Implement version control for AI agent configurations including prompts, model parameters, and tool selections. Learn canary deployment strategies, feature flags for agents, and safe rollback procedures when deployments go wrong.
The Problem with Ad-Hoc Agent Updates
A product manager asks you to update the support agent's system prompt. You SSH into the server, edit the prompt in a config file, and restart the agent. Two hours later, the support team reports the agent is refusing to answer billing questions. You realize the prompt edit accidentally removed a paragraph about billing access. But you cannot revert because you did not save the previous version.
This scenario plays out constantly in organizations that manage agent configurations informally. AI agents are particularly sensitive to configuration changes because a small wording change in a system prompt can dramatically alter behavior across every conversation.
Versioned Configuration Store
Every configuration change creates a new immutable version. The active version is a pointer, not the data itself. Rolling back means pointing to a previous version, not editing the current one.
from dataclasses import dataclass, field
from datetime import datetime, timezone
from uuid import uuid4
import json
import hashlib


@dataclass
class AgentVersion:
    """One immutable snapshot of an agent's configuration."""

    version_id: str = field(default_factory=lambda: str(uuid4()))
    agent_id: str = ""
    version_number: int = 0
    system_prompt: str = ""
    model: str = "gpt-4o"
    temperature: float = 0.7
    max_tokens: int = 4096
    tools: list[str] = field(default_factory=list)
    guardrails: dict = field(default_factory=dict)
    created_by: str = ""
    # datetime.utcnow() is deprecated since Python 3.12; use an aware UTC timestamp.
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )
    change_description: str = ""
    config_hash: str = ""

    def compute_hash(self) -> str:
        # Hash only the fields that affect agent behavior, so two versions with
        # the same configuration get the same hash regardless of author or time.
        content = json.dumps({
            "system_prompt": self.system_prompt,
            "model": self.model,
            "temperature": self.temperature,
            "max_tokens": self.max_tokens,
            "tools": sorted(self.tools),
            "guardrails": self.guardrails,
        }, sort_keys=True)
        self.config_hash = hashlib.sha256(content.encode()).hexdigest()[:16]
        return self.config_hash
class VersionStore:
    # Assumes db_pool is an asyncpg-style pool ($1 placeholders, fetchrow/fetchval).
    def __init__(self, db_pool):
        self.db = db_pool

    async def create_version(self, config: AgentVersion) -> AgentVersion:
        # MAX + 1 is not atomic. A UNIQUE (agent_id, version_number) constraint
        # is needed so concurrent writers cannot claim the same number.
        latest = await self.db.fetchval(
            "SELECT MAX(version_number) FROM agent_versions "
            "WHERE agent_id = $1",
            config.agent_id,
        )
        config.version_number = (latest or 0) + 1
        config.compute_hash()
        await self.db.execute(
            """
            INSERT INTO agent_versions (
                version_id, agent_id, version_number, system_prompt,
                model, temperature, max_tokens, tools, guardrails,
                created_by, created_at, change_description, config_hash
            ) VALUES ($1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12,$13)
            """,
            config.version_id, config.agent_id, config.version_number,
            config.system_prompt, config.model, config.temperature,
            config.max_tokens, json.dumps(config.tools),
            json.dumps(config.guardrails), config.created_by,
            config.created_at, config.change_description, config.config_hash,
        )
        return config

    async def get_active_version(self, agent_id: str) -> AgentVersion | None:
        row = await self.db.fetchrow(
            "SELECT av.* FROM agent_versions av "
            "JOIN agent_deployments ad ON av.version_id = ad.version_id "
            "WHERE av.agent_id = $1 AND ad.is_active = true",
            agent_id,
        )
        if not row:
            return None
        data = dict(row)
        # tools and guardrails were serialized with json.dumps on insert, so
        # decode them here (assumes the driver returns them as JSON text).
        data["tools"] = json.loads(data["tools"])
        data["guardrails"] = json.loads(data["guardrails"])
        return AgentVersion(**data)

    async def rollback(self, agent_id: str, target_version: int) -> dict:
        version = await self.db.fetchrow(
            "SELECT * FROM agent_versions "
            "WHERE agent_id = $1 AND version_number = $2",
            agent_id, target_version,
        )
        if not version:
            raise ValueError(f"Version {target_version} not found")
        # Move the active pointer inside one transaction so readers never see
        # zero active deployments between the UPDATE and the INSERT.
        async with self.db.acquire() as conn:
            async with conn.transaction():
                await conn.execute(
                    "UPDATE agent_deployments SET is_active = false "
                    "WHERE agent_id = $1",
                    agent_id,
                )
                await conn.execute(
                    "INSERT INTO agent_deployments (agent_id, version_id, is_active) "
                    "VALUES ($1, $2, true)",
                    agent_id, version["version_id"],
                )
        return {
            "agent_id": agent_id,
            "rolled_back_to": target_version,
            "config_hash": version["config_hash"],
        }
Canary Deployments for Agents
A canary deployment routes a small percentage of traffic to the new version while the majority continues using the proven version. If the canary shows degraded quality or a higher error rate, roll it back automatically before the change reaches most users.
import hashlib


class CanaryRouter:
    def __init__(self, version_store: VersionStore, metrics_client):
        self.versions = version_store
        self.metrics = metrics_client

    async def resolve_version(
        self, agent_id: str, user_id: str
    ) -> AgentVersion:
        # get_canary_deployment is not shown here; from its use below it is
        # assumed to return a dict with "version" and "traffic_pct", or None.
        canary = await self.get_canary_deployment(agent_id)
        if not canary:
            return await self.versions.get_active_version(agent_id)
        if self.should_route_to_canary(user_id, canary["traffic_pct"]):
            self.metrics.increment(
                "canary.routed", tags={"agent": agent_id, "version": "canary"}
            )
            return canary["version"]
        return await self.versions.get_active_version(agent_id)

    def should_route_to_canary(self, user_id: str, pct: int) -> bool:
        # Deterministic routing based on user_id for consistency.
        # MD5 is used only for bucketing here, not for security.
        hash_val = int(hashlib.md5(user_id.encode()).hexdigest(), 16)
        return (hash_val % 100) < pct

    async def evaluate_canary(self, agent_id: str) -> str:
        canary_metrics = await self.metrics.query(
            f"agent_error_rate{{agent='{agent_id}',version='canary'}}"
        )
        stable_metrics = await self.metrics.query(
            f"agent_error_rate{{agent='{agent_id}',version='stable'}}"
        )
        # Roll back if the canary's error rate is more than 50% above stable.
        if canary_metrics["error_rate"] > stable_metrics["error_rate"] * 1.5:
            return "rollback"
        # Promote if it is within 10% of stable; otherwise keep observing.
        if canary_metrics["error_rate"] <= stable_metrics["error_rate"] * 1.1:
            return "promote"
        return "continue"
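The hash-based routing has two properties worth checking: a given user always lands in the same bucket, and raising the traffic percentage only adds users to the canary without moving anyone out. The standalone sketch below mirrors the bucketing logic of `should_route_to_canary`. The `bucket` and `in_canary` helpers and the user IDs are made up for illustration.

```python
import hashlib


def bucket(user_id: str) -> int:
    """Map a user ID to a stable bucket in [0, 100)."""
    return int(hashlib.md5(user_id.encode()).hexdigest(), 16) % 100


def in_canary(user_id: str, pct: int) -> bool:
    return bucket(user_id) < pct


# Hypothetical user IDs, for illustration only.
users = [f"user-{i}" for i in range(10_000)]
at_10 = {u for u in users if in_canary(u, 10)}
at_25 = {u for u in users if in_canary(u, 25)}

share = len(at_10) / len(users)          # close to 0.10 for a well-mixed hash
still_in_canary = at_10 <= at_25         # widening the rollout keeps earlier users
```

Because the bucket depends only on the user ID, a user who saw the canary at 10% keeps seeing it at 25%, which keeps their experience consistent while the rollout widens.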
Feature Flags for Agent Capabilities
Feature flags let you enable or disable specific agent tools or behaviors for subsets of users without deploying new configuration versions. This is useful for gradual rollouts of new agent capabilities.
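As one possible shape for this, the sketch below layers per-tool flags on top of a version's tool list. The `ToolFlag` structure, the tool names, and the `resolve_tools` helper are all assumptions made for illustration; a production system might back them with a flag service instead.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class ToolFlag:
    """Hypothetical flag gating one agent tool."""
    tool: str
    enabled: bool = False          # kill switch for the whole flag
    rollout_pct: int = 0           # 0-100, share of users who get the tool
    allow_users: set[str] = field(default_factory=set)  # e.g. internal testers


def flag_is_on(flag: ToolFlag, user_id: str) -> bool:
    if not flag.enabled:
        return False  # the kill switch wins over everything else
    if user_id in flag.allow_users:
        return True
    # Salt the hash with the tool name so each flag buckets users independently.
    digest = hashlib.sha256(f"{flag.tool}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100 < flag.rollout_pct


def resolve_tools(base_tools: list[str], flags: list[ToolFlag], user_id: str) -> list[str]:
    """Return the version's tools plus any flagged tools enabled for this user."""
    extra = [f.tool for f in flags if f.tool not in base_tools and flag_is_on(f, user_id)]
    return base_tools + extra


flags = [
    ToolFlag("refund_tool", enabled=True, rollout_pct=0, allow_users={"tester-1"}),
    ToolFlag("crm_lookup", enabled=True, rollout_pct=100),
    ToolFlag("beta_search", enabled=False, rollout_pct=100),
]
tester_tools = resolve_tools(["kb_search"], flags, "tester-1")
customer_tools = resolve_tools(["kb_search"], flags, "customer-42")
```

Here the tester gets `refund_tool` through the allow list, everyone gets `crm_lookup`, and `beta_search` stays off for all users because its flag is disabled. None of this requires a new configuration version.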
Comparing Versions with Diff Views
The admin dashboard should show a diff between any two versions, highlighting changes to the system prompt, model settings, and tool configurations. This helps reviewers understand what changed and why before approving a deployment.
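A diff like this can be produced with Python's standard `difflib` module. The following is a rough sketch that takes two versions as plain dicts; the `diff_versions` helper and the sample data are hypothetical, and the field names follow the `AgentVersion` dataclass.

```python
import difflib


def diff_versions(old: dict, new: dict) -> dict:
    """Return a unified diff of the prompt plus changed settings and tools."""
    prompt_diff = list(difflib.unified_diff(
        old["system_prompt"].splitlines(),
        new["system_prompt"].splitlines(),
        fromfile=f"v{old['version_number']}",
        tofile=f"v{new['version_number']}",
        lineterm="",
    ))
    settings = {
        key: (old[key], new[key])
        for key in ("model", "temperature", "max_tokens")
        if old[key] != new[key]
    }
    return {
        "prompt_diff": prompt_diff,
        "settings_changed": settings,
        "tools_added": sorted(set(new["tools"]) - set(old["tools"])),
        "tools_removed": sorted(set(old["tools"]) - set(new["tools"])),
    }


# Hypothetical sample versions.
v1 = {
    "version_number": 1,
    "system_prompt": "You are a support agent.\nYou may discuss billing.",
    "model": "gpt-4o", "temperature": 0.7, "max_tokens": 4096,
    "tools": ["kb_search", "billing_lookup"],
}
v2 = {
    "version_number": 2,
    "system_prompt": "You are a support agent.",
    "model": "gpt-4o", "temperature": 0.2, "max_tokens": 4096,
    "tools": ["kb_search"],
}

report = diff_versions(v1, v2)
print("\n".join(report["prompt_diff"]))
```

In this example the removed billing line shows up with a leading `-` in the prompt diff, which is the kind of accidental deletion a reviewer would want to catch before approving.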
FAQ
How do you test agent configuration changes before deploying?
Run the new configuration against a suite of evaluation prompts in a staging environment. Compare the responses against golden answers using LLM-as-judge scoring. Only promote to canary if the evaluation score meets or exceeds the current production version. Automate this as part of a CI pipeline triggered by configuration commits.
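The promotion gate described above can be sketched as follows. The `Agent` and `Judge` callables are hypothetical interfaces, and the toy judge here only checks for a keyword so the example runs offline; a real judge would call an LLM to score each response against the golden answer.

```python
from statistics import mean
from typing import Callable

# Hypothetical interfaces: an agent maps a prompt to a response; a judge maps
# (prompt, response, golden_answer) to a score in [0, 1].
Agent = Callable[[str], str]
Judge = Callable[[str, str, str], float]


def evaluate(agent: Agent, judge: Judge, suite: list[tuple[str, str]]) -> float:
    """Mean judge score over (prompt, golden_answer) pairs."""
    return mean(judge(prompt, agent(prompt), golden) for prompt, golden in suite)


def passes_gate(candidate: float, production: float) -> bool:
    """Promote to canary only if the candidate meets or exceeds production."""
    return candidate >= production


# Toy stand-ins so the sketch runs without any model calls.
def keyword_judge(prompt: str, response: str, golden: str) -> float:
    return 1.0 if golden in response else 0.0


def production_agent(prompt: str) -> str:
    return "See our refund policy or tracking page."


def candidate_agent(prompt: str) -> str:
    return "See our tracking page."


suite = [("Can I get a refund?", "refund"), ("Where is my order?", "tracking")]
prod_score = evaluate(production_agent, keyword_judge, suite)
cand_score = evaluate(candidate_agent, keyword_judge, suite)
promote = passes_gate(cand_score, prod_score)
```

A CI job could fail the build when `promote` is false, which blocks the configuration from reaching the canary stage.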
Should system prompts be stored in version control (git) or a database?
Both. Use git as the source of truth for prompt development, code review, and history. Sync approved prompts to the database configuration store on merge. The database serves as the runtime configuration that agents read at startup. This combines collaborative editing through pull requests with fast runtime reads.
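The sync step could look roughly like this. The sketch assumes a hypothetical repository layout of one prompt file per agent (`<agent_id>.txt`) and hashes only the prompt text, which is narrower than `compute_hash` above; `plan_sync` is an illustrative helper, not an existing API.

```python
import hashlib
import tempfile
from pathlib import Path


def prompt_hash(text: str) -> str:
    return hashlib.sha256(text.encode()).hexdigest()[:16]


def plan_sync(prompt_dir: Path, stored_hashes: dict[str, str]) -> dict[str, str]:
    """Return {agent_id: prompt_text} for prompts that differ from the store.

    Assumes one file per agent named <agent_id>.txt (hypothetical layout).
    """
    changed = {}
    for path in sorted(prompt_dir.glob("*.txt")):
        text = path.read_text(encoding="utf-8")
        if stored_hashes.get(path.stem) != prompt_hash(text):
            changed[path.stem] = text
    return changed


# Demo against a throwaway directory standing in for the merged checkout.
with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp)
    (repo / "support.txt").write_text("You are a support agent.", encoding="utf-8")
    (repo / "sales.txt").write_text("You are a sales agent.", encoding="utf-8")
    stored = {"sales": prompt_hash("You are a sales agent.")}  # already in sync
    to_sync = plan_sync(repo, stored)
```

Each entry in `to_sync` would then be written as a new version, so a merge that leaves a prompt unchanged does not create a redundant version.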
How long should you keep old agent versions?
Keep all versions indefinitely. Storage cost is negligible since each version is just a few kilobytes of text and JSON. Old versions are valuable for forensic analysis: if a customer reports an issue from two months ago, you need to know exactly which prompt and model configuration was active at that time.
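Answering that question requires knowing when each deployment became active. The sketch below assumes deployment records carry an activation timestamp, which the `agent_deployments` inserts shown earlier do not include; the `history` data and the `version_active_at` helper are hypothetical.

```python
from bisect import bisect_right
from datetime import datetime, timezone

# Hypothetical deployment history: (activated_at, version_number), sorted by time.
history = [
    (datetime(2024, 1, 5, tzinfo=timezone.utc), 1),
    (datetime(2024, 2, 10, tzinfo=timezone.utc), 2),
    (datetime(2024, 2, 11, tzinfo=timezone.utc), 1),  # rollback to version 1
    (datetime(2024, 3, 1, tzinfo=timezone.utc), 3),
]


def version_active_at(history, when):
    """Return the version number active at `when`, or None if nothing was deployed yet."""
    times = [activated_at for activated_at, _ in history]
    idx = bisect_right(times, when)  # number of deployments at or before `when`
    return history[idx - 1][1] if idx else None


incident = datetime(2024, 2, 10, 12, 0, tzinfo=timezone.utc)
active_during_incident = version_active_at(history, incident)
```

Because rollbacks appear in the history as ordinary entries, the lookup reports the version that was actually serving traffic at that moment, not just the highest version number created by then.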
CallSphere Team