Post-Migration Validation: Ensuring Agent Quality After System Changes
Learn how to validate AI agent quality after migrations and system changes. Covers validation checklists, regression testing, monitoring dashboards, and automated rollback triggers.
Why Post-Migration Validation Is Not Optional
Migrations are not done when the code deploys. They are done when you have confirmed that the new system matches or exceeds the old system's quality. Without structured validation, subtle regressions hide for weeks: tool calls that used to work now silently fail, response quality degrades on edge cases, or latency creeps up by 200 ms and nobody notices until users complain.
Post-migration validation is a structured process with clear pass/fail criteria and automated rollback triggers.
Step 1: Define a Validation Checklist
Create a programmatic checklist that covers every critical behavior.
```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Awaitable, Callable


class CheckStatus(Enum):
    PASS = "pass"
    FAIL = "fail"
    WARN = "warn"


@dataclass
class ValidationCheck:
    name: str
    description: str
    check_fn: Callable[[], Awaitable[CheckStatus]]
    severity: str = "critical"  # "critical" or "warning"


@dataclass
class ValidationReport:
    checks: list[dict] = field(default_factory=list)
    passed: int = 0
    failed: int = 0
    warnings: int = 0

    @property
    def overall_status(self) -> str:
        if self.failed > 0:
            return "FAIL — rollback recommended"
        if self.warnings > 2:
            return "WARN — manual review needed"
        return "PASS"


async def run_validation(checks: list[ValidationCheck]) -> ValidationReport:
    report = ValidationReport()
    for check in checks:
        try:
            status = await check.check_fn()
        except Exception as e:
            # A crashing check counts as a failure, not a skipped check
            status = CheckStatus.FAIL
            print(f"Check '{check.name}' threw exception: {e}")
        report.checks.append({
            "name": check.name,
            "status": status.value,
            "severity": check.severity,
        })
        if status == CheckStatus.PASS:
            report.passed += 1
        elif status == CheckStatus.FAIL:
            report.failed += 1
        else:
            report.warnings += 1
    return report
```
Step 2: Implement Regression Tests
Define specific checks for the behaviors your migration could affect.
```python
import time

import httpx


async def check_agent_responds() -> CheckStatus:
    """Verify the agent can process a basic request."""
    async with httpx.AsyncClient() as client:
        response = await client.post(
            "http://localhost:8000/api/agent/chat",
            json={"message": "Hello, what can you help me with?"},
            timeout=30.0,
        )
    if response.status_code == 200:
        body = response.json()
        if len(body.get("response", "")) > 10:
            return CheckStatus.PASS
    return CheckStatus.FAIL


async def check_tool_calling_works() -> CheckStatus:
    """Verify the agent can execute tool calls."""
    async with httpx.AsyncClient() as client:
        response = await client.post(
            "http://localhost:8000/api/agent/chat",
            json={"message": "Look up invoice INV-001"},
            timeout=30.0,
        )
    body = response.json()
    # The response should contain invoice data from the tool
    if "INV-001" in body.get("response", ""):
        return CheckStatus.PASS
    return CheckStatus.FAIL


async def check_latency_acceptable() -> CheckStatus:
    """Verify response latency is within bounds."""
    latencies = []
    async with httpx.AsyncClient() as client:
        for _ in range(5):
            start = time.monotonic()
            await client.post(
                "http://localhost:8000/api/agent/chat",
                json={"message": "Hi"},
                timeout=30.0,
            )
            latencies.append(time.monotonic() - start)
    p95 = sorted(latencies)[int(len(latencies) * 0.95)]
    if p95 < 3.0:
        return CheckStatus.PASS
    elif p95 < 5.0:
        return CheckStatus.WARN
    return CheckStatus.FAIL


async def check_database_integrity() -> CheckStatus:
    """Verify all expected tables and indexes exist."""
    import asyncpg

    conn = await asyncpg.connect("postgresql://...")
    try:
        tables = await conn.fetch(
            "SELECT tablename FROM pg_tables WHERE schemaname = 'public'"
        )
        table_names = {t["tablename"] for t in tables}
        required = {"conversations", "messages", "tool_calls", "sessions"}
        if required.issubset(table_names):
            return CheckStatus.PASS
        return CheckStatus.FAIL
    finally:
        await conn.close()
```
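Endpoint checks like these catch hard failures, but the quality regressions mentioned earlier (edge-case answers getting worse) need a softer signal. One minimal sketch is to grade responses against golden keyword lists. `keyword_coverage` and `grade_response` are hypothetical helpers, and the 0.5 warn threshold is an arbitrary starting point you would tune for your own test set:

```python
def keyword_coverage(response: str, required_keywords: list[str]) -> float:
    """Fraction of required keywords found in the response (case-insensitive)."""
    if not required_keywords:
        return 1.0
    text = response.lower()
    hits = sum(1 for kw in required_keywords if kw.lower() in text)
    return hits / len(required_keywords)


def grade_response(
    response: str,
    required_keywords: list[str],
    warn_threshold: float = 0.5,
) -> str:
    """Map keyword coverage onto the same pass/warn/fail scale as CheckStatus."""
    coverage = keyword_coverage(response, required_keywords)
    if coverage == 1.0:
        return "pass"
    if coverage >= warn_threshold:
        return "warn"
    return "fail"
```

A golden-set check would loop over a list of (prompt, required_keywords) pairs, call the agent for each, and return WARN or FAIL when too many responses drop below full coverage. Keyword matching is crude; it exists to catch obvious regressions cheaply, not to replace human review.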
Step 3: Assemble and Run the Validation Suite
```python
import asyncio

checks = [
    ValidationCheck(
        name="Agent responds to basic input",
        description="Send a hello message and verify a response",
        check_fn=check_agent_responds,
        severity="critical",
    ),
    ValidationCheck(
        name="Tool calling works",
        description="Verify agent can call tools and return results",
        check_fn=check_tool_calling_works,
        severity="critical",
    ),
    ValidationCheck(
        name="Latency within bounds",
        description="P95 latency under 3 seconds",
        check_fn=check_latency_acceptable,
        severity="warning",
    ),
    ValidationCheck(
        name="Database integrity",
        description="All required tables exist",
        check_fn=check_database_integrity,
        severity="critical",
    ),
]


async def main():
    report = await run_validation(checks)
    print(f"\nValidation Report: {report.overall_status}")
    print(f"Passed: {report.passed}, Failed: {report.failed}, "
          f"Warnings: {report.warnings}")
    icons = {"pass": "OK", "warn": "!!", "fail": "XX"}
    for check in report.checks:
        icon = icons.get(check["status"], "??")
        print(f"  [{icon}] {check['name']}: {check['status']}")
    return report


report = asyncio.run(main())
```
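If this suite runs in a deploy pipeline, the verdict usually has to become a process exit code so CI can gate the rollout. One possible mapping (the status strings match `ValidationReport.overall_status` above; the code values are a convention, not a standard):

```python
import sys


def exit_code_for(overall_status: str) -> int:
    """Map a validation verdict to a shell exit code:
    0 = proceed, 1 = fail (roll back), 2 = manual review needed."""
    if overall_status.startswith("FAIL"):
        return 1
    if overall_status.startswith("WARN"):
        return 2
    return 0


# At the end of the validation script, something like:
# sys.exit(exit_code_for(report.overall_status))
```

Reserving a distinct code for WARN lets the pipeline pause for human sign-off instead of forcing a binary pass/fail decision.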
Step 4: Automated Rollback Triggers
Configure monitoring that automatically rolls back if key metrics breach thresholds.
```python
import os
import subprocess


class RollbackController:
    def __init__(
        self,
        error_rate_threshold: float = 0.10,
        latency_p99_threshold: float = 10.0,
    ):
        self.error_rate_threshold = error_rate_threshold
        self.latency_p99_threshold = latency_p99_threshold

    async def evaluate_and_rollback(
        self,
        current_error_rate: float,
        current_latency_p99: float,
    ) -> bool:
        """Returns True if rollback was triggered."""
        reasons = []
        if current_error_rate > self.error_rate_threshold:
            reasons.append(
                f"Error rate {current_error_rate:.1%} > "
                f"{self.error_rate_threshold:.1%}"
            )
        if current_latency_p99 > self.latency_p99_threshold:
            reasons.append(
                f"P99 latency {current_latency_p99:.1f}s > "
                f"{self.latency_p99_threshold:.1f}s"
            )
        if reasons:
            print(f"ROLLBACK TRIGGERED: {'; '.join(reasons)}")
            self._execute_rollback()
            return True
        return False

    def _execute_rollback(self):
        deploy = os.getenv("K8S_DEPLOYMENT", "agent-backend")
        namespace = os.getenv("K8S_NAMESPACE", "default")
        subprocess.run([
            "kubectl", "rollout", "undo",
            f"deployment/{deploy}",
            "-n", namespace,
        ], check=True)
        print(f"Rolled back {deploy} in {namespace}")
```
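`evaluate_and_rollback` needs live values for `current_error_rate` and `current_latency_p99`. In production these typically come from your metrics backend; as a self-contained sketch (the window size is an illustrative choice, not a recommendation), a rolling window of recent request outcomes can produce both numbers:

```python
from collections import deque


class MetricsWindow:
    """Rolling window of recent request outcomes, used to feed
    threshold checks like RollbackController's."""

    def __init__(self, max_samples: int = 500):
        # Each sample is (success, latency_in_seconds); old samples fall off
        self.samples: deque[tuple[bool, float]] = deque(maxlen=max_samples)

    def record(self, success: bool, latency_s: float) -> None:
        self.samples.append((success, latency_s))

    def error_rate(self) -> float:
        if not self.samples:
            return 0.0
        failures = sum(1 for ok, _ in self.samples if not ok)
        return failures / len(self.samples)

    def latency_p99(self) -> float:
        if not self.samples:
            return 0.0
        latencies = sorted(lat for _, lat in self.samples)
        idx = min(int(len(latencies) * 0.99), len(latencies) - 1)
        return latencies[idx]
```

A monitoring loop would record each request's outcome into the window, then periodically call `evaluate_and_rollback(window.error_rate(), window.latency_p99())`.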
FAQ
How long should I monitor after a migration before declaring success?
Monitor intensively for 24 hours, then normally for 7 days. The first 24 hours catch obvious regressions. The 7-day window catches issues that only appear at certain times — weekend traffic patterns, batch jobs that run weekly, or timezone-specific user behavior. Only remove the rollback capability after the 7-day window.
What if validation passes but users still report issues?
Automated checks cannot cover every scenario. Set up a migration feedback channel where users can flag problems. Tag all support tickets during the first week with a migration label so you can quickly spot patterns. Sometimes the migration is fine but an unrelated change shipped alongside it — the label helps isolate causes.
Should I run validation in a staging environment first?
Always. Run the full validation suite against staging with production-like data before touching production. But recognize that staging never perfectly mirrors production — different data volumes, different traffic patterns, different third-party API responses. Staging validation reduces risk but does not eliminate the need for production monitoring.
#Validation #RegressionTesting #Monitoring #PostMigration #QualityAssurance #AgenticAI #LearnAI #AIEngineering
CallSphere Team
Expert insights on AI voice agents and customer communication automation.