
Strangler Fig Pattern: Incrementally Migrating from Monolith to Agent Microservices

Apply the strangler fig pattern to incrementally migrate a monolithic AI agent to microservices. Learn routing cutover strategies, feature parity validation, and safe rollback techniques.

What Is the Strangler Fig Pattern

The strangler fig pattern is named after tropical fig trees that grow around a host tree, eventually replacing it entirely. In software, it means building new microservices around an existing monolith, gradually routing traffic from the old system to the new services, and eventually decommissioning the monolith.

For AI agent systems, this is the safest migration approach. Rewriting a production agent from scratch introduces months of risk. The strangler fig approach keeps the monolith running while you extract services one at a time, verify each extraction, and roll back if anything breaks.

Planning the Migration Order

Not all components are equally easy or valuable to extract. Prioritize based on two factors: extraction difficulty (how cleanly the component can be separated) and extraction value (how much benefit independence provides).

# migration_plan.py — Framework for planning extraction order
from dataclasses import dataclass

@dataclass
class ComponentAssessment:
    name: str
    # How many other components call this one (1-10)
    coupling_score: int
    # How much it would benefit from independent scaling (1-10)
    scaling_benefit: int
    # How different its deployment cadence is from the monolith (1-10)
    deployment_independence: int
    # How cleanly its data can be separated (1-10)
    data_isolation: int

    @property
    def extraction_value(self) -> float:
        return (self.scaling_benefit + self.deployment_independence) / 2

    @property
    def extraction_ease(self) -> float:
        return (self.data_isolation + (10 - self.coupling_score)) / 2

    @property
    def priority_score(self) -> float:
        return self.extraction_value * self.extraction_ease

components = [
    ComponentAssessment("RAG Retrieval", 3, 9, 7, 9),
    ComponentAssessment("Tool Execution", 4, 7, 8, 8),
    ComponentAssessment("Memory Store", 5, 5, 6, 7),
    ComponentAssessment("Conversation Manager", 8, 6, 5, 4),
    ComponentAssessment("Auth/Permissions", 7, 3, 4, 6),
]

# Sort by priority — highest first
for c in sorted(components, key=lambda x: x.priority_score, reverse=True):
    print(f"{c.name:25s} value={c.extraction_value:.1f} "
          f"ease={c.extraction_ease:.1f} "
          f"priority={c.priority_score:.1f}")

The RAG retrieval service typically scores highest: it has clean data boundaries (its own vector store), clear scaling needs (GPU-intensive), and low coupling (other components call it, but it does not call them back).

Implementing the Routing Layer

The strangler fig pattern requires a routing layer that can send requests to either the monolith or the new microservice. An NGINX configuration handles this:

# nginx-router.conf
upstream monolith {
    server agent-monolith:8000;
}

upstream rag_service {
    server rag-retrieval:8002;
}

upstream tool_service {
    server tool-execution:8001;
}

server {
    listen 80;

    # Extracted: RAG retrieval goes to new service
    location /api/v1/retrieve {
        proxy_pass http://rag_service;
        proxy_set_header X-Migration-Source "strangler-router";
    }

    # Extracted: Tool execution goes to new service
    location /api/v1/tools/execute {
        proxy_pass http://tool_service;
        proxy_set_header X-Migration-Source "strangler-router";
    }

    # Everything else still goes to the monolith
    location / {
        proxy_pass http://monolith;
    }
}

As you extract more services, you add more location blocks routing to new services. The monolith handles less and less traffic until it can be turned off.

Percentage-Based Traffic Splitting

Before routing 100% of traffic to a new service, validate it with a small percentage. Use weighted upstreams:

# Split traffic: 90% monolith, 10% new RAG service
# (split_clients belongs in the http context, above the server block)
split_clients $request_id $rag_backend {
    10%  rag_service;
    *    monolith_rag;
}

upstream monolith_rag {
    server agent-monolith:8000;
}

upstream rag_service {
    server rag-retrieval:8002;
}

server {
    location /api/v1/retrieve {
        proxy_pass http://$rag_backend;
    }
}

Start at 10%, monitor error rates and latency, then increase to 25%, 50%, 75%, and finally 100%.
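That ramp schedule can be scripted rather than applied by hand. The sketch below is a hypothetical helper (step values, error budget, and upstream names are illustrative): it picks the next percentage from the observed error rate and renders the matching split_clients block.

```python
# ramp_plan.py — sketch of a staged rollout controller (hypothetical helper)

RAMP_STEPS = [10, 25, 50, 75, 100]
ERROR_BUDGET = 0.01  # abort the ramp if the new service exceeds 1% errors

def next_step(current_pct: int, error_rate: float) -> int:
    """Return the next traffic percentage, or 0 to signal a full rollback."""
    if error_rate > ERROR_BUDGET:
        return 0  # route everything back to the monolith
    idx = RAMP_STEPS.index(current_pct)
    return RAMP_STEPS[min(idx + 1, len(RAMP_STEPS) - 1)]

def render_split(pct: int) -> str:
    """Render the split_clients block for the current percentage."""
    if pct >= 100:
        return (
            "split_clients $request_id $rag_backend {\n"
            "    *    rag_service;\n"
            "}"
        )
    return (
        "split_clients $request_id $rag_backend {\n"
        f"    {pct}%  rag_service;\n"
        "    *    monolith_rag;\n"
        "}"
    )
```

Each cycle, read the error rate from your metrics backend, call next_step, write the rendered block into the NGINX config, and reload.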

Feature Parity Validation

Before cutting over, verify the new service produces equivalent results. Run both the monolith and the new service in parallel and compare responses:

import asyncio
import httpx
from deepdiff import DeepDiff

class ParityValidator:
    def __init__(self, monolith_url: str, new_service_url: str):
        self.monolith = monolith_url
        self.new_service = new_service_url
        self.client = httpx.AsyncClient(timeout=15.0)
        self.mismatches = []

    async def validate_request(self, path: str, payload: dict):
        # Call both services in parallel
        mono_resp, new_resp = await asyncio.gather(
            self.client.post(
                f"{self.monolith}{path}", json=payload
            ),
            self.client.post(
                f"{self.new_service}{path}", json=payload
            ),
        )

        mono_data = mono_resp.json()
        new_data = new_resp.json()

        diff = DeepDiff(
            mono_data,
            new_data,
            ignore_order=True,
            significant_digits=2,  # Allow minor float differences
            exclude_paths=[
                "root['latency_ms']",
                "root['request_id']",
            ],
        )

        if diff:
            self.mismatches.append({
                "path": path,
                "payload": payload,
                "diff": str(diff),
            })
            return False
        return True

    async def run_validation_suite(self, test_cases: list[dict]):
        results = []
        for case in test_cases:
            passed = await self.validate_request(
                case["path"], case["payload"]
            )
            results.append({
                "case": case["name"],
                "passed": passed,
            })

        passed = sum(1 for r in results if r["passed"])
        total = len(results)
        print(f"Parity: {passed}/{total} cases match")

        if self.mismatches:
            print(f"\nMismatches found:")
            for m in self.mismatches:
                print(f"  {m['path']}: {m['diff']}")

        return passed == total

Run this validator against real production traffic (read-only endpoints) or a replay of recent requests. Only proceed with full cutover when parity exceeds 99%.
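To build that replay set, convert logged requests into the validator's test-case format. The sketch below assumes a hypothetical JSON-lines log where each entry has `path` and `body` fields; adapt the filter to your own read-only endpoints.

```python
import json

def load_test_cases(log_path: str) -> list[dict]:
    """Turn JSON-lines request logs into ParityValidator test cases.

    Assumes each line is a JSON object with 'path' and 'body' fields
    (a hypothetical log format — adjust to match your own logging).
    """
    cases = []
    with open(log_path) as f:
        for i, line in enumerate(f):
            entry = json.loads(line)
            # Replay only read-only retrieval calls; skip mutating endpoints
            if entry["path"].startswith("/api/v1/retrieve"):
                cases.append({
                    "name": f"replay-{i}",
                    "path": entry["path"],
                    "payload": entry["body"],
                })
    return cases
```

A run then looks like `asyncio.run(ParityValidator(mono_url, new_url).run_validation_suite(load_test_cases("requests.jsonl")))`.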

Safe Rollback Strategy

Always maintain the ability to roll back to the monolith. The routing layer makes this trivial — change the NGINX config to route traffic back to the monolith:

# rollback.py — Automated rollback on error rate spike
import httpx
import asyncio

PROMETHEUS_URL = "http://prometheus:9090"
NGINX_RELOAD_CMD = "nginx -s reload"
ERROR_THRESHOLD = 0.05  # 5% error rate triggers rollback

async def check_and_rollback(service_name: str):
    query = (
        f'rate(http_requests_total{{service="{service_name}",'
        f'status=~"5.."}}[5m]) / '
        f'rate(http_requests_total{{service="{service_name}"}}[5m])'
    )

    async with httpx.AsyncClient() as client:
        resp = await client.get(
            f"{PROMETHEUS_URL}/api/v1/query",
            params={"query": query},
        )
        result = resp.json()

    if result["data"]["result"]:
        error_rate = float(
            result["data"]["result"][0]["value"][1]
        )
        if error_rate > ERROR_THRESHOLD:
            print(
                f"Error rate {error_rate:.2%} exceeds threshold. "
                f"Rolling back {service_name} to monolith."
            )
            await switch_to_monolith(service_name)
            return True
    return False

async def switch_to_monolith(service_name: str):
    """Sketch: point the service's NGINX upstream back at the monolith,
    then reload. The config path and upstream layout are illustrative."""
    conf = (
        f"upstream {service_name} {{\n"
        "    server agent-monolith:8000;\n"
        "}\n"
    )
    with open(f"/etc/nginx/conf.d/{service_name}.conf", "w") as f:
        f.write(conf)
    proc = await asyncio.create_subprocess_shell(NGINX_RELOAD_CMD)
    await proc.wait()

Decommissioning the Monolith

The monolith is ready for decommissioning when three conditions are met: all traffic routes to microservices (zero requests to monolith endpoints), parity validation has run for at least two weeks, and the monolith's database receives no writes.
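Those three conditions can be encoded as an explicit gate so the decision is not left to judgment calls. A minimal sketch — where the metric values come from (access logs, Prometheus, database audit logs) is up to you:

```python
from dataclasses import dataclass

@dataclass
class DecommissionCheck:
    monolith_requests_last_24h: int   # e.g. from access logs or Prometheus
    parity_validation_days: int       # consecutive days of passing parity runs
    monolith_db_writes_last_24h: int  # e.g. from database audit logs

    def ready(self) -> bool:
        # All three conditions from the text must hold simultaneously
        return (
            self.monolith_requests_last_24h == 0
            and self.parity_validation_days >= 14
            and self.monolith_db_writes_last_24h == 0
        )
```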

Do not delete the monolith immediately. Keep it deployed but receiving no traffic for one more month as a safety net. Then archive the code and shut it down.

FAQ

How long does a full strangler fig migration typically take?

For a medium-complexity AI agent system (5-8 major components), expect 3 to 6 months. Extract one service every 2-4 weeks, with a validation period between each extraction. Rushing the migration by extracting multiple services simultaneously increases risk and makes it harder to identify the source of regressions.

What if the monolith and new service need to share a database during migration?

This is common and acceptable as a transitional step. The new service reads from the shared database while building its own data store. Once the new service has its own database populated and validated, cut the connection to the shared database. The key rule is that only one service should write to any given table — shared reads are safe, shared writes cause conflicts.
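The one-writer rule can be enforced in application code during the shared-database phase. A minimal sketch (the table and service names are illustrative, not from the article):

```python
# write_ownership.py — sketch: one writer per table during migration

WRITE_OWNERS = {
    "documents": "rag_service",
    "embeddings": "rag_service",
    "conversations": "monolith",
    "tool_runs": "tool_service",
}

def check_write(service: str, table: str) -> None:
    """Raise if a service tries to write to a table it does not own.

    Reads are deliberately unrestricted — shared reads are safe.
    """
    owner = WRITE_OWNERS.get(table)
    if owner != service:
        raise PermissionError(
            f"{service} may not write to {table!r}; owner is {owner!r}"
        )
```

Database-level permissions (a read-only role for non-owning services) achieve the same guarantee more robustly if your database supports them.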

How do I handle in-flight requests during a routing cutover?

NGINX and most load balancers support graceful connection draining. When you change the routing config, existing connections complete against the old backend while new connections route to the new backend. Set a drain timeout (e.g., 30 seconds) that exceeds your longest expected request duration. For streaming agent responses that can last 60 seconds or more, increase the drain timeout accordingly.
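In open-source NGINX, two directives cover this: `proxy_read_timeout` keeps long streaming responses alive, and `worker_shutdown_timeout` (available since nginx 1.11.11) bounds how long old worker processes keep draining connections after a config reload. The values below are illustrative:

```nginx
# Old workers may drain in-flight connections for up to 90s after a reload
worker_shutdown_timeout 90s;

http {
    server {
        location /api/v1/ {
            proxy_read_timeout 120s;  # longest expected streamed response
            proxy_pass http://monolith;
        }
    }
}
```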


#StranglerFig #Migration #Microservices #AgenticAI #Architecture #Refactoring #LearnAI #AIEngineering

CallSphere Team