Blue-Green Deployments for AI Agents: Zero-Downtime Model and Prompt Updates
Implement blue-green deployment strategies for AI agent services to achieve zero-downtime updates, safe model swaps, traffic splitting, and instant rollback for prompt and model changes.
Why Blue-Green Deployments for AI Agents
Deploying a new version of an AI agent is riskier than deploying a typical web service. A subtle prompt change can make the agent behave inappropriately. A model upgrade might produce longer or shorter responses that break client parsing. A tool integration update might introduce latency that causes timeouts. You need the ability to deploy, validate, and roll back in seconds, not minutes.
Blue-green deployment maintains two identical production environments. Only one (the "live" environment) receives user traffic at any time. You deploy updates to the idle environment, validate them, then switch traffic. If anything goes wrong, switching back is instantaneous.
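The switch-and-rollback flow described above can be pictured as a small state machine. The sketch below is a toy in-memory model, not a deployment tool; the `BlueGreenRouter` class and its method names are hypothetical and exist only to illustrate that a switch is a pointer flip while both slots keep running.

```python
from dataclasses import dataclass, field


@dataclass
class BlueGreenRouter:
    """Toy model of two slots where exactly one receives traffic."""

    live: str = "blue"
    versions: dict = field(
        default_factory=lambda: {"blue": "1.2.0", "green": None}
    )

    @property
    def idle(self) -> str:
        return "green" if self.live == "blue" else "blue"

    def deploy_to_idle(self, version: str) -> None:
        # New versions only ever land on the slot without user traffic.
        self.versions[self.idle] = version

    def switch(self) -> None:
        # Switching flips a pointer; neither slot is stopped or restarted.
        if self.versions[self.idle] is None:
            raise RuntimeError("idle slot has nothing deployed")
        self.live = self.idle


router = BlueGreenRouter()
router.deploy_to_idle("1.3.0")  # green now holds the candidate version
router.switch()                 # green is live, blue is the standby
router.switch()                 # rollback: blue is live again
print(router.live, router.versions)
```

Rollback is the same operation as the forward switch, which is why it is as fast as the switch itself.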
Kubernetes Blue-Green Architecture
Create two Deployments and a single Service that targets one of them:
# k8s/blue-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: agent-blue
  namespace: ai-agents
  labels:
    app: agent-service
    slot: blue
spec:
  replicas: 3
  selector:
    matchLabels:
      app: agent-service
      slot: blue
  template:
    metadata:
      labels:
        app: agent-service
        slot: blue
        version: "1.2.0"
    spec:
      containers:
        - name: agent
          image: registry.example.com/agent-service:1.2.0
          ports:
            - containerPort: 8000
          env:
            - name: OPENAI_API_KEY
              valueFrom:
                secretKeyRef:
                  name: agent-secrets
                  key: openai-api-key
            - name: AGENT_VERSION
              value: "1.2.0"
          readinessProbe:
            httpGet:
              path: /readyz
              port: 8000
            initialDelaySeconds: 10
            periodSeconds: 5
# k8s/green-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: agent-green
  namespace: ai-agents
  labels:
    app: agent-service
    slot: green
spec:
  replicas: 3
  selector:
    matchLabels:
      app: agent-service
      slot: green
  template:
    metadata:
      labels:
        app: agent-service
        slot: green
        version: "1.3.0"
    spec:
      containers:
        - name: agent
          image: registry.example.com/agent-service:1.3.0
          ports:
            - containerPort: 8000
          env:
            - name: OPENAI_API_KEY
              valueFrom:
                secretKeyRef:
                  name: agent-secrets
                  key: openai-api-key
            - name: AGENT_VERSION
              value: "1.3.0"
          readinessProbe:
            httpGet:
              path: /readyz
              port: 8000
            initialDelaySeconds: 10
            periodSeconds: 5
The Traffic-Switching Service
A single Service points to whichever slot is live:
# k8s/service.yaml
apiVersion: v1
kind: Service
metadata:
  name: agent-service
  namespace: ai-agents
spec:
  selector:
    app: agent-service
    slot: blue  # <-- Change this to "green" to switch traffic
  ports:
    - port: 80
      targetPort: 8000
Switch traffic by patching the selector:
# Switch from blue to green
kubectl patch service agent-service -n ai-agents \
  -p '{"spec": {"selector": {"slot": "green"}}}'

# Verify the switch
kubectl get endpoints agent-service -n ai-agents
Traffic switches in seconds because all green pods are already running and healthy.
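When the switch is driven from code rather than typed by hand, it helps to build the patch as data instead of a hand-quoted string. The following is a minimal sketch under that assumption; `build_switch_command` and `VALID_SLOTS` are hypothetical names, and the defaults mirror the Service and namespace used in this article.

```python
import json

VALID_SLOTS = ("blue", "green")


def build_switch_command(slot: str, service: str = "agent-service",
                         namespace: str = "ai-agents") -> list[str]:
    """Return the kubectl argv that points the Service at `slot`."""
    if slot not in VALID_SLOTS:
        raise ValueError(f"unknown slot: {slot!r}")
    # Only the `slot` key is patched; kubectl's default strategic merge
    # leaves the other selector keys (such as `app`) in place.
    patch = json.dumps({"spec": {"selector": {"slot": slot}}})
    return ["kubectl", "patch", "service", service,
            "-n", namespace, "-p", patch]


print(build_switch_command("green"))
```

Passing the returned list to `subprocess.run(cmd, check=True)` avoids shell quoting entirely, and rejecting unknown slot names guards against a typo leaving the Service with no matching pods.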
Deployment Script with Validation
Automate the deploy-validate-switch workflow:
#!/usr/bin/env python3
# scripts/deploy.py
import json
import subprocess
import sys
import time

import httpx


def run(cmd: str) -> str:
    """Run a shell command; print stderr and exit if it fails."""
    result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    if result.returncode != 0:
        print(f"FAILED: {cmd}\n{result.stderr}")
        sys.exit(1)
    return result.stdout.strip()


def get_live_slot() -> str:
    """Read the slot the Service currently selects."""
    output = run("kubectl get svc agent-service -n ai-agents -o jsonpath='{.spec.selector.slot}'")
    return output.strip("'")


def get_idle_slot(live: str) -> str:
    return "green" if live == "blue" else "blue"


def wait_for_ready(deployment: str, timeout: int = 120):
    print(f"Waiting for {deployment} to be ready...")
    run(f"kubectl rollout status deployment/{deployment} -n ai-agents --timeout={timeout}s")


def validate_slot(slot: str) -> bool:
    """Run smoke tests against the idle slot."""
    # Use an argument list (no shell) so terminate() stops kubectl itself.
    port_forward = subprocess.Popen(
        ["kubectl", "port-forward", f"deploy/agent-{slot}", "9090:8000", "-n", "ai-agents"]
    )
    time.sleep(3)  # give the port-forward time to establish
    try:
        resp = httpx.get("http://localhost:9090/readyz", timeout=10)
        return resp.status_code == 200
    except httpx.HTTPError:
        # Connection errors and timeouts count as a failed validation.
        return False
    finally:
        port_forward.terminate()
        port_forward.wait()


def main():
    if len(sys.argv) != 2:
        print("Usage: deploy.py <image>")
        sys.exit(1)
    image = sys.argv[1]  # e.g., registry.example.com/agent-service:1.3.0
    live = get_live_slot()
    idle = get_idle_slot(live)
    print(f"Live: {live}, Deploying to: {idle}")

    run(f"kubectl set image deployment/agent-{idle} agent={image} -n ai-agents")
    wait_for_ready(f"agent-{idle}")

    if not validate_slot(idle):
        print("Validation failed. Aborting.")
        sys.exit(1)

    # Build the patch with json.dumps; nesting double quotes inside a
    # double-quoted f-string is a syntax error.
    patch = json.dumps({"spec": {"selector": {"slot": idle}}})
    run(f"kubectl patch svc agent-service -n ai-agents -p '{patch}'")
    print(f"Traffic switched to {idle}")


if __name__ == "__main__":
    main()
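The script's `validate_slot` only confirms that `/readyz` answers. Since the risks named at the start of this article include responses that break client parsing and added latency, a smoke test could also inspect a sample agent response. The sketch below is one possible shape for that check; the `reply` field name and both thresholds are hypothetical and should be replaced with whatever your clients actually depend on.

```python
import json


def check_agent_response(status_code: int, body: str, elapsed_s: float,
                         max_chars: int = 2000,
                         max_latency_s: float = 5.0) -> list[str]:
    """Return human-readable failures; an empty list means the check passed."""
    failures = []
    if status_code != 200:
        failures.append(f"unexpected status {status_code}")
    if elapsed_s > max_latency_s:
        failures.append(f"latency {elapsed_s:.2f}s exceeds {max_latency_s}s")
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        failures.append("body is not valid JSON")
        return failures
    # "reply" is a hypothetical field name; use the one your clients parse.
    reply = payload.get("reply") if isinstance(payload, dict) else None
    if not isinstance(reply, str) or not reply.strip():
        failures.append("missing or empty 'reply' field")
    elif len(reply) > max_chars:
        failures.append(f"reply length {len(reply)} exceeds {max_chars}")
    return failures


print(check_agent_response(200, '{"reply": "Hello, how can I help?"}', 0.8))
print(check_agent_response(200, "not json", 7.5))
```

Keeping the check a pure function of status, body, and elapsed time makes it easy to unit test and to call from `validate_slot` after the readiness request.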
Rollback Procedure
Rollback is a single command that switches traffic back to the previous slot:
# If green is live and broken, switch back to blue
kubectl patch service agent-service -n ai-agents \
  -p '{"spec": {"selector": {"slot": "blue"}}}'
The old version is still running with full replicas. No image pulls, no pod startups, no waiting.
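Because the rollback itself is cheap, the harder question is when to trigger it. One simple policy, sketched below, is to roll back once several consecutive error-rate samples exceed a threshold. The function name, the 5% threshold, and the three-sample window are illustrative assumptions, not recommendations from a specific monitoring system.

```python
def should_roll_back(error_rates: list[float], threshold: float = 0.05,
                     consecutive: int = 3) -> bool:
    """True if the most recent `consecutive` samples all exceed `threshold`."""
    if len(error_rates) < consecutive:
        # Not enough data yet to distinguish a blip from a regression.
        return False
    return all(rate > threshold for rate in error_rates[-consecutive:])


# A single spike does not trigger a rollback; a sustained rise does.
print(should_roll_back([0.01, 0.20, 0.01, 0.02]))
print(should_roll_back([0.01, 0.08, 0.09, 0.12]))
```

Requiring consecutive bad samples trades a slightly slower reaction for fewer rollbacks caused by transient upstream errors.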
Canary Testing Before Full Switch
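Route a percentage of traffic to the new slot before committing. The example uses ingress-nginx canary annotations, which only take effect alongside a primary Ingress for the same host that routes to the live slot. It also assumes a separate Service named agent-green that selects only the green pods, because the single agent-service defined earlier points at one slot at a time: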
# Using nginx ingress annotations for traffic splitting
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: agent-canary
  namespace: ai-agents
  annotations:
    nginx.ingress.kubernetes.io/canary: "true"
    nginx.ingress.kubernetes.io/canary-weight: "10"
spec:
  rules:
    - host: agent.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: agent-green
                port:
                  number: 80
This sends 10% of traffic to green while blue handles the remaining 90%. Monitor error rates and latency, then increase the canary weight or roll back.
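The canary Ingress names `agent-green` as its backend, so a Service with that name has to exist and select only the green pods. The sketch below generates such a per-slot Service as JSON, which `kubectl apply` accepts alongside YAML. The `slot_service_manifest` helper and the `slot_service.py` file name are hypothetical; the labels and ports are copied from the manifests earlier in this article.

```python
import json


def slot_service_manifest(slot: str, namespace: str = "ai-agents") -> dict:
    """Build a Service that selects only the pods of one slot."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": f"agent-{slot}", "namespace": namespace},
        "spec": {
            "selector": {"app": "agent-service", "slot": slot},
            "ports": [{"port": 80, "targetPort": 8000}],
        },
    }


# One way to apply it, assuming the file is saved as slot_service.py:
#   python slot_service.py | kubectl apply -f -
print(json.dumps(slot_service_manifest("green"), indent=2))
```

Creating one such Service per slot keeps the canary backend independent of the switching Service, so changing the live slot does not silently change where canary traffic goes.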
FAQ
How long should I keep the old (idle) deployment running after a switch?
Keep it running for at least the duration of your monitoring window, typically 30 minutes to a few hours. If you detect degradation in the new version, you can roll back instantly. Once you are confident the new version is stable, either leave the idle deployment as a standby or scale it to zero replicas to save resources. Note that a slot scaled to zero must start its pods again before it can take traffic, so rollback to it is no longer instant.
How do blue-green deployments handle database migrations?
Database schema changes must be backward compatible. Both blue and green versions will run against the same database simultaneously during the transition. Use expand-and-contract migrations: first add new columns or tables (expand), deploy the new version, then remove old columns in a later release (contract). Never drop columns or change types in the same release that introduces the code change.
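The expand step can be demonstrated with an in-memory SQLite database. This is a minimal sketch: the `conversations` table, the `model_version` column, and both save functions are hypothetical stand-ins for the blue (old) and green (new) code paths.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE conversations (id INTEGER PRIMARY KEY, transcript TEXT)"
)


def save_v1(transcript: str) -> None:
    # Old (blue) code path: knows nothing about the new column.
    conn.execute(
        "INSERT INTO conversations (transcript) VALUES (?)", (transcript,)
    )


def save_v2(transcript: str, model_version: str) -> None:
    # New (green) code path: writes the extra column.
    conn.execute(
        "INSERT INTO conversations (transcript, model_version) VALUES (?, ?)",
        (transcript, model_version),
    )


# Expand: add a nullable column so the old inserts keep working.
conn.execute("ALTER TABLE conversations ADD COLUMN model_version TEXT")

save_v1("hello from blue")
save_v2("hello from green", "1.3.0")
rows = conn.execute(
    "SELECT transcript, model_version FROM conversations ORDER BY id"
).fetchall()
print(rows)
```

Both versions write successfully against the expanded schema; rows written by the old code simply carry a NULL in the new column. The contract step (dropping columns the old code needed) waits for a later release, after the old slot is retired.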
Can I use blue-green deployments to A/B test different AI agent prompts?
Yes. Deploy different prompt versions to blue and green, then use canary weights to split traffic. Compare metrics like task completion rate, user satisfaction, response latency, and cost per conversation across the two versions. This is one of the most powerful patterns for iterating on agent prompts in production with real user traffic.
Written by
CallSphere Team