Building a Deployment Agent: CI/CD Orchestration with AI-Powered Decision Making
Learn how to build an AI agent that orchestrates CI/CD pipelines, performs risk assessment on deployments, analyzes canary metrics, and triggers automatic rollbacks when quality degrades.
Why Deployments Need an AI Agent
A deployment is not just pushing code. It is a series of decisions: Is this change safe to release? Should it go to 1% of traffic first or straight to 100%? Which metrics determine success or failure? When should we roll back? Today these decisions are encoded in static YAML pipelines. An AI deployment agent makes them dynamically, based on the actual risk profile of each change.
Deployment Pipeline as an Agent Workflow
The agent treats each deployment as a series of decisions rather than a fixed pipeline.
from dataclasses import dataclass
from enum import Enum
from typing import Optional
class DeploymentPhase(Enum):
    RISK_ASSESSMENT = "risk_assessment"
    CANARY = "canary"
    PROGRESSIVE_ROLLOUT = "progressive_rollout"
    FULL_ROLLOUT = "full_rollout"
    VERIFICATION = "verification"
    COMPLETE = "complete"
    ROLLED_BACK = "rolled_back"

@dataclass
class DeploymentContext:
    deploy_id: str
    service: str
    namespace: str
    image_tag: str
    previous_tag: str
    changed_files: list[str]
    commit_message: str
    author: str
    phase: DeploymentPhase = DeploymentPhase.RISK_ASSESSMENT
    canary_percentage: int = 0
    risk_score: float = 0.0
    metrics_snapshot: Optional[dict] = None
Risk Assessment Before Deployment
The agent analyzes what changed and assigns a risk score that determines the rollout strategy.
import openai
import json
RISK_ASSESSMENT_PROMPT = """Analyze this deployment for risk level.
Service: {service}
Changed files: {changed_files}
Commit message: {commit_message}
Lines changed: {lines_changed}
Assess risk on a scale of 0.0 to 1.0 based on:
- Database migrations present (high risk)
- Config/environment changes (medium risk)
- API contract changes (high risk)
- Pure frontend/cosmetic changes (low risk)
- Test-only changes (minimal risk)
Return JSON with: risk_score, risk_factors (list of strings),
recommended_strategy (one of: direct, canary_5, canary_10, canary_25),
requires_manual_approval (boolean).
"""
async def assess_risk(ctx: DeploymentContext) -> dict:
    client = openai.AsyncOpenAI()
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": RISK_ASSESSMENT_PROMPT.format(
                service=ctx.service,
                changed_files="\n".join(ctx.changed_files),
                commit_message=ctx.commit_message,
                lines_changed=len(ctx.changed_files) * 50,  # estimate
            ),
        }],
        response_format={"type": "json_object"},
        temperature=0.0,
    )
    return json.loads(response.choices[0].message.content)
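The model's JSON should not be trusted blindly. A small validation step, shown below as a sketch (the field names match the prompt above; the fallback choices are illustrative), clamps the score and forces conservative defaults on malformed output:

```python
VALID_STRATEGIES = {"direct", "canary_5", "canary_10", "canary_25"}

def validate_risk_assessment(raw: dict) -> dict:
    """Sanitize the LLM's risk assessment before acting on it."""
    try:
        score = float(raw.get("risk_score", 1.0))
    except (TypeError, ValueError):
        score = 1.0  # unparseable score -> assume worst case
    score = min(max(score, 0.0), 1.0)  # clamp to [0.0, 1.0]

    strategy = raw.get("recommended_strategy")
    if strategy not in VALID_STRATEGIES:
        strategy = "canary_5"  # conservative default for unknown strategies

    return {
        "risk_score": score,
        "risk_factors": [str(f) for f in raw.get("risk_factors", [])],
        "recommended_strategy": strategy,
        # Missing or malformed approval flag defaults to requiring a human
        "requires_manual_approval": bool(raw.get("requires_manual_approval", True)),
    }
```

Calling `validate_risk_assessment(await assess_risk(ctx))` instead of using the raw response keeps a hallucinated strategy name from crashing the orchestration loop.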
Canary Deployment with Metric Analysis
Once the canary is live, the agent continuously compares canary metrics against the baseline.
import numpy as np
from scipy.stats import mannwhitneyu
class CanaryAnalyzer:
    def __init__(self, prom_url: str = "http://prometheus:9090"):
        self.prom_url = prom_url
        self.thresholds = {
            "error_rate_increase": 0.05,    # 5% increase triggers rollback
            "p99_latency_increase": 1.3,    # 30% latency increase
            "success_rate_minimum": 0.995,  # 99.5% success rate floor
        }

    async def compare_canary_to_baseline(
        self, service: str, namespace: str, duration_minutes: int = 15
    ) -> dict:
        baseline_errors = await self._query_error_rate(
            service, namespace, "stable", duration_minutes
        )
        canary_errors = await self._query_error_rate(
            service, namespace, "canary", duration_minutes
        )
        baseline_latency = await self._query_p99_latency(
            service, namespace, "stable", duration_minutes
        )
        canary_latency = await self._query_p99_latency(
            service, namespace, "canary", duration_minutes
        )
        # Statistical test: is the canary significantly worse than baseline?
        error_stat, error_p = mannwhitneyu(
            canary_errors, baseline_errors, alternative="greater"
        )
        latency_stat, latency_p = mannwhitneyu(
            canary_latency, baseline_latency, alternative="greater"
        )
        return {
            "error_rate_canary": float(np.mean(canary_errors)),
            "error_rate_baseline": float(np.mean(baseline_errors)),
            "error_p_value": float(error_p),
            # Samples are already per-minute p99 values, so average them
            # over the window rather than taking a percentile of percentiles
            "latency_canary_p99": float(np.mean(canary_latency)),
            "latency_baseline_p99": float(np.mean(baseline_latency)),
            "latency_p_value": float(latency_p),
            "should_rollback": error_p < 0.05 or latency_p < 0.05,
            "should_promote": error_p > 0.3 and latency_p > 0.3,
        }

    async def _query_error_rate(self, service, ns, track, minutes):
        # Placeholder: fetch per-minute error rates from Prometheus in production
        return np.random.uniform(0.001, 0.01, size=minutes)

    async def _query_p99_latency(self, service, ns, track, minutes):
        # Placeholder: fetch per-minute p99 latency from Prometheus in production
        return np.random.uniform(100, 200, size=minutes)
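The `thresholds` dict above is not yet wired into the comparison. One way to use it, sketched below with illustrative threshold values, is as a hard guardrail that can veto promotion even when the p-values look acceptable:

```python
def passes_hard_thresholds(
    canary_error_rate: float,
    baseline_error_rate: float,
    canary_p99_ms: float,
    baseline_p99_ms: float,
    thresholds: dict,
) -> bool:
    """Return False if the canary violates any absolute guardrail."""
    # Absolute error-rate increase guardrail
    if canary_error_rate - baseline_error_rate > thresholds["error_rate_increase"]:
        return False
    # Relative p99 latency guardrail (1.3 means "no more than 30% slower")
    if baseline_p99_ms > 0 and canary_p99_ms / baseline_p99_ms > thresholds["p99_latency_increase"]:
        return False
    # Success-rate floor, derived from the error rate
    if 1.0 - canary_error_rate < thresholds["success_rate_minimum"]:
        return False
    return True
```

Combining a statistical test with absolute guardrails covers both failure modes: the test catches subtle but consistent regressions, while the guardrails catch a canary that is catastrophically bad in a short, noisy window.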
Automated Rollback
When the canary analysis indicates degradation, the agent executes an immediate rollback.
import subprocess
import logging
logger = logging.getLogger("deployment-agent")
async def rollback_deployment(ctx: DeploymentContext, reason: str) -> bool:
    logger.warning(
        f"Rolling back {ctx.service} from {ctx.image_tag} to "
        f"{ctx.previous_tag}. Reason: {reason}"
    )
    # Note: subprocess.run blocks the event loop; a production agent
    # should use asyncio.create_subprocess_exec instead.
    result = subprocess.run(
        [
            "kubectl", "set", "image",
            f"deployment/{ctx.service}",
            f"{ctx.service}={ctx.service}:{ctx.previous_tag}",
            "-n", ctx.namespace,
        ],
        capture_output=True, text=True, timeout=60,
    )
    if result.returncode == 0:
        logger.info(f"Rollback successful for {ctx.service}")
        ctx.phase = DeploymentPhase.ROLLED_BACK
        return True
    logger.error(f"Rollback failed: {result.stderr}")
    return False
The Deployment Agent Orchestration Loop
import asyncio
async def deploy(ctx: DeploymentContext):
    # Phase 1: risk assessment
    risk = await assess_risk(ctx)
    ctx.risk_score = risk["risk_score"]
    strategy = risk["recommended_strategy"]
    if risk["requires_manual_approval"]:
        # request_human_approval is an integration point (Slack, PagerDuty, etc.)
        approved = await request_human_approval(ctx, risk)
        if not approved:
            return

    # Phase 2: canary deployment
    canary_pct = {"direct": 100, "canary_5": 5, "canary_10": 10, "canary_25": 25}
    ctx.canary_percentage = canary_pct[strategy]
    await apply_canary(ctx)  # shifts the chosen traffic share to the new version
    ctx.phase = DeploymentPhase.CANARY

    # Phase 3: monitor the canary for up to 15 minutes (3 checks, 5 minutes apart)
    analyzer = CanaryAnalyzer()
    for _ in range(3):
        await asyncio.sleep(300)
        result = await analyzer.compare_canary_to_baseline(
            ctx.service, ctx.namespace
        )
        if result["should_rollback"]:
            await rollback_deployment(ctx, f"Canary degradation: {result}")
            return
        if result["should_promote"]:
            break

    # Phase 4: full rollout
    ctx.phase = DeploymentPhase.FULL_ROLLOUT
    await promote_canary_to_full(ctx)
    ctx.phase = DeploymentPhase.COMPLETE
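If the risk-assessment API call fails, the agent should degrade gracefully rather than block every release. A deterministic fallback, sketched here with illustrative score cutoffs, maps a heuristic risk score onto the same strategy names the prompt uses:

```python
def fallback_strategy(risk_score: float) -> str:
    """Map a risk score to a rollout strategy without the LLM."""
    if risk_score < 0.2:
        return "direct"      # trivial changes ship straight to 100%
    if risk_score < 0.5:
        return "canary_25"
    if risk_score < 0.8:
        return "canary_10"
    return "canary_5"        # riskiest changes get the smallest blast radius
```

The heuristic score itself could come from simple signals already in DeploymentContext, such as whether any changed file path contains `migrations/` or touches an API schema.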
FAQ
How does the agent decide between a direct deploy and a canary?
The risk assessment model examines the changed files, their types, and the blast radius. Database migrations, API contract changes, and infrastructure config changes trigger canary deployments. Pure frontend or documentation changes can go direct. The risk score threshold is tunable per team.
What happens if the Prometheus metrics are unavailable during canary analysis?
The agent should treat missing metrics as a risk signal rather than ignoring them. If it cannot fetch baseline or canary metrics after three retries, it pauses the rollout and alerts the team. Never promote a canary when you cannot verify its health.
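The retry-then-pause behavior described above can be factored into a small helper. A sketch, where `fetch` stands in for any async metric query:

```python
import asyncio

async def query_with_retries(fetch, attempts: int = 3, base_delay: float = 1.0):
    """Call an async metric fetcher, retrying with exponential backoff.

    Returns None after exhausting all attempts so the caller can pause
    the rollout and alert, instead of promoting a canary blind.
    """
    for attempt in range(attempts):
        try:
            return await fetch()
        except Exception:
            if attempt < attempts - 1:
                # Back off: base_delay, 2x, 4x, ...
                await asyncio.sleep(base_delay * (2 ** attempt))
    return None
```

The orchestration loop would treat a None result the same as `should_rollback` for decision purposes: stop progressing, hold traffic where it is, and page a human.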
Can this approach work with GitOps tools like ArgoCD?
Yes. Instead of running kubectl commands directly, the agent commits to the GitOps repository. It updates the image tag in the deployment manifest, creates a PR, and ArgoCD syncs the change. The canary analysis still works the same way since it reads metrics from Prometheus regardless of how the deployment was applied.
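For the GitOps path, the agent edits the manifest instead of calling kubectl. A minimal regex-based sketch of the tag rewrite (a production agent would use a YAML-aware library such as ruamel.yaml to preserve comments and formatting):

```python
import re

def update_image_tag(manifest: str, service: str, new_tag: str) -> str:
    """Rewrite `image: <service>:<old-tag>` lines to point at new_tag."""
    pattern = re.compile(rf"(image:\s*{re.escape(service)}):[\w.\-]+")
    return pattern.sub(rf"\1:{new_tag}", manifest)
```

The agent would write the updated manifest back to the GitOps repository, open a PR, and let ArgoCD sync the change.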
#CICD #Deployment #DevOps #CanaryAnalysis #Python #AgenticAI #LearnAI #AIEngineering