Post-Migration Validation: Ensuring Agent Quality After System Changes
Learn how to validate AI agent quality after migrations and system changes. Covers validation checklists, regression testing, monitoring dashboards, and automated rollback triggers.
Why Post-Migration Validation Is Not Optional
Migrations are not done when the code deploys. They are done when you have confirmed that the new system matches or exceeds the old system's quality. Without structured validation, subtle regressions hide for weeks: tool calls that used to work now silently fail, response quality degrades on edge cases, or latency creeps up by 200 ms and nobody notices until users complain.
Post-migration validation is a structured process with clear pass/fail criteria and automated rollback triggers.
Step 1: Define a Validation Checklist
Create a programmatic checklist that covers every critical behavior.
```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Awaitable, Callable


class CheckStatus(Enum):
    PASS = "pass"
    FAIL = "fail"
    WARN = "warn"


@dataclass
class ValidationCheck:
    name: str
    description: str
    check_fn: Callable[[], Awaitable[CheckStatus]]
    severity: str = "critical"  # "critical" or "warning"


@dataclass
class ValidationReport:
    checks: list[dict] = field(default_factory=list)
    passed: int = 0
    failed: int = 0
    warnings: int = 0

    @property
    def overall_status(self) -> str:
        if self.failed > 0:
            return "FAIL — rollback recommended"
        if self.warnings > 2:
            return "WARN — manual review needed"
        return "PASS"


async def run_validation(checks: list[ValidationCheck]) -> ValidationReport:
    report = ValidationReport()
    for check in checks:
        try:
            status = await check.check_fn()
        except Exception as e:
            # A crashing check counts as a failure, not a skipped check
            status = CheckStatus.FAIL
            print(f"Check '{check.name}' threw exception: {e}")
        report.checks.append({
            "name": check.name,
            "status": status.value,
            "severity": check.severity,
        })
        if status == CheckStatus.PASS:
            report.passed += 1
        elif status == CheckStatus.FAIL:
            report.failed += 1
        else:
            report.warnings += 1
    return report
```
Step 2: Implement Regression Tests
Define specific checks for the behaviors your migration could affect.
```python
import time

import httpx


async def check_agent_responds() -> CheckStatus:
    """Verify the agent can process a basic request."""
    async with httpx.AsyncClient() as client:
        response = await client.post(
            "http://localhost:8000/api/agent/chat",
            json={"message": "Hello, what can you help me with?"},
            timeout=30.0,
        )
    if response.status_code == 200:
        body = response.json()
        if len(body.get("response", "")) > 10:
            return CheckStatus.PASS
    return CheckStatus.FAIL


async def check_tool_calling_works() -> CheckStatus:
    """Verify the agent can execute tool calls."""
    async with httpx.AsyncClient() as client:
        response = await client.post(
            "http://localhost:8000/api/agent/chat",
            json={"message": "Look up invoice INV-001"},
            timeout=30.0,
        )
    body = response.json()
    # The response should contain invoice data from the tool
    if "INV-001" in body.get("response", ""):
        return CheckStatus.PASS
    return CheckStatus.FAIL


async def check_latency_acceptable() -> CheckStatus:
    """Verify response latency is within bounds."""
    latencies = []
    async with httpx.AsyncClient() as client:
        for _ in range(5):
            start = time.monotonic()
            await client.post(
                "http://localhost:8000/api/agent/chat",
                json={"message": "Hi"},
                timeout=30.0,
            )
            latencies.append(time.monotonic() - start)
    p95 = sorted(latencies)[int(len(latencies) * 0.95)]
    if p95 < 3.0:
        return CheckStatus.PASS
    elif p95 < 5.0:
        return CheckStatus.WARN
    return CheckStatus.FAIL


async def check_database_integrity() -> CheckStatus:
    """Verify all expected tables and indexes exist."""
    import asyncpg

    conn = await asyncpg.connect("postgresql://...")
    try:
        tables = await conn.fetch(
            "SELECT tablename FROM pg_tables WHERE schemaname = 'public'"
        )
        table_names = {t["tablename"] for t in tables}
        required = {"conversations", "messages", "tool_calls", "sessions"}
        if required.issubset(table_names):
            return CheckStatus.PASS
        return CheckStatus.FAIL
    finally:
        await conn.close()
```
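Endpoint checks like these catch hard failures, but the quality regressions mentioned earlier (edge-case answers getting worse) need a softer signal. One minimal sketch is to grade responses against golden keyword lists. `keyword_coverage` and `grade_response` are hypothetical helpers, and the 0.5 warn threshold is an arbitrary starting point you would tune for your own test set:

```python
def keyword_coverage(response: str, required_keywords: list[str]) -> float:
    """Fraction of required keywords found in the response (case-insensitive)."""
    if not required_keywords:
        return 1.0
    text = response.lower()
    hits = sum(1 for kw in required_keywords if kw.lower() in text)
    return hits / len(required_keywords)


def grade_response(
    response: str,
    required_keywords: list[str],
    warn_threshold: float = 0.5,
) -> str:
    """Map keyword coverage onto the same pass/warn/fail scale as CheckStatus."""
    coverage = keyword_coverage(response, required_keywords)
    if coverage == 1.0:
        return "pass"
    if coverage >= warn_threshold:
        return "warn"
    return "fail"
```

A golden-set check would loop over a list of (prompt, required_keywords) pairs, call the agent for each, and return WARN or FAIL when too many responses drop below full coverage. Keyword matching is crude; it exists to catch obvious regressions cheaply, not to replace human review.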
Step 3: Assemble and Run the Validation Suite
```python
import asyncio

checks = [
    ValidationCheck(
        name="Agent responds to basic input",
        description="Send a hello message and verify a response",
        check_fn=check_agent_responds,
        severity="critical",
    ),
    ValidationCheck(
        name="Tool calling works",
        description="Verify agent can call tools and return results",
        check_fn=check_tool_calling_works,
        severity="critical",
    ),
    ValidationCheck(
        name="Latency within bounds",
        description="P95 latency under 3 seconds",
        check_fn=check_latency_acceptable,
        severity="warning",
    ),
    ValidationCheck(
        name="Database integrity",
        description="All required tables exist",
        check_fn=check_database_integrity,
        severity="critical",
    ),
]


async def main():
    report = await run_validation(checks)
    print(f"\nValidation Report: {report.overall_status}")
    print(f"Passed: {report.passed}, Failed: {report.failed}, "
          f"Warnings: {report.warnings}")
    icons = {"pass": "OK", "warn": "!!", "fail": "XX"}
    for check in report.checks:
        icon = icons.get(check["status"], "??")
        print(f"  [{icon}] {check['name']}: {check['status']}")
    return report


report = asyncio.run(main())
```
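If this suite runs in a deploy pipeline, the verdict usually has to become a process exit code so CI can gate the rollout. One possible mapping (the status strings match `ValidationReport.overall_status` above; the code values are a convention, not a standard):

```python
import sys


def exit_code_for(overall_status: str) -> int:
    """Map a validation verdict to a shell exit code:
    0 = proceed, 1 = fail (roll back), 2 = manual review needed."""
    if overall_status.startswith("FAIL"):
        return 1
    if overall_status.startswith("WARN"):
        return 2
    return 0


# At the end of the validation script, something like:
# sys.exit(exit_code_for(report.overall_status))
```

Reserving a distinct code for WARN lets the pipeline pause for human sign-off instead of forcing a binary pass/fail decision.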
Step 4: Automated Rollback Triggers
Configure monitoring that automatically rolls back if key metrics breach thresholds.
```python
import os
import subprocess


class RollbackController:
    def __init__(
        self,
        error_rate_threshold: float = 0.10,
        latency_p99_threshold: float = 10.0,
    ):
        self.error_rate_threshold = error_rate_threshold
        self.latency_p99_threshold = latency_p99_threshold

    async def evaluate_and_rollback(
        self,
        current_error_rate: float,
        current_latency_p99: float,
    ) -> bool:
        """Returns True if rollback was triggered."""
        reasons = []
        if current_error_rate > self.error_rate_threshold:
            reasons.append(
                f"Error rate {current_error_rate:.1%} > "
                f"{self.error_rate_threshold:.1%}"
            )
        if current_latency_p99 > self.latency_p99_threshold:
            reasons.append(
                f"P99 latency {current_latency_p99:.1f}s > "
                f"{self.latency_p99_threshold:.1f}s"
            )
        if reasons:
            print(f"ROLLBACK TRIGGERED: {'; '.join(reasons)}")
            self._execute_rollback()
            return True
        return False

    def _execute_rollback(self):
        deploy = os.getenv("K8S_DEPLOYMENT", "agent-backend")
        namespace = os.getenv("K8S_NAMESPACE", "default")
        subprocess.run([
            "kubectl", "rollout", "undo",
            f"deployment/{deploy}",
            "-n", namespace,
        ], check=True)
        print(f"Rolled back {deploy} in {namespace}")
```
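`evaluate_and_rollback` needs live values for `current_error_rate` and `current_latency_p99`. In production these typically come from your metrics backend; as a self-contained sketch (the window size is an illustrative choice, not a recommendation), a rolling window of recent request outcomes can produce both numbers:

```python
from collections import deque


class MetricsWindow:
    """Rolling window of recent request outcomes, used to feed
    threshold checks like RollbackController's."""

    def __init__(self, max_samples: int = 500):
        # Each sample is (success, latency_in_seconds); old samples fall off
        self.samples: deque[tuple[bool, float]] = deque(maxlen=max_samples)

    def record(self, success: bool, latency_s: float) -> None:
        self.samples.append((success, latency_s))

    def error_rate(self) -> float:
        if not self.samples:
            return 0.0
        failures = sum(1 for ok, _ in self.samples if not ok)
        return failures / len(self.samples)

    def latency_p99(self) -> float:
        if not self.samples:
            return 0.0
        latencies = sorted(lat for _, lat in self.samples)
        idx = min(int(len(latencies) * 0.99), len(latencies) - 1)
        return latencies[idx]
```

A monitoring loop would record each request's outcome into the window, then periodically call `evaluate_and_rollback(window.error_rate(), window.latency_p99())`.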
FAQ
How long should I monitor after a migration before declaring success?
Monitor intensively for 24 hours, then normally for 7 days. The first 24 hours catch obvious regressions. The 7-day window catches issues that only appear at certain times — weekend traffic patterns, batch jobs that run weekly, or timezone-specific user behavior. Only remove the rollback capability after the 7-day window.
What if validation passes but users still report issues?
Automated checks cannot cover every scenario. Set up a migration feedback channel where users can flag problems. Tag all support tickets during the first week with a migration label so you can quickly spot patterns. Sometimes the migration is fine but an unrelated change shipped alongside it — the label helps isolate causes.
Should I run validation in a staging environment first?
Always. Run the full validation suite against staging with production-like data before touching production. But recognize that staging never perfectly mirrors production — different data volumes, different traffic patterns, different third-party API responses. Staging validation reduces risk but does not eliminate the need for production monitoring.
#Validation #RegressionTesting #Monitoring #PostMigration #QualityAssurance #AgenticAI #LearnAI #AIEngineering
CallSphere Team
Expert insights on AI voice agents and customer communication automation.