Migrating from Single-Agent to Multi-Agent Architecture: When and How to Split

When a Single Agent Outgrows Itself

A single agent with a 2,000-word system prompt, 15 tools, and branching logic for six different domains is a maintenance nightmare. It hallucinates more because the instructions are too broad. It uses the wrong tools because it has too many to choose from. And every change to one domain risks breaking another.

Multi-agent architecture solves this by giving each agent a focused scope: fewer tools, shorter instructions, and a clear domain boundary. The triage pattern — one router agent that delegates to specialists — is the most common and most effective starting point.

When to Split: Concrete Criteria

Not every agent needs splitting. Apply these decision criteria:

from dataclasses import dataclass

@dataclass
class AgentComplexityAssessment:
    tool_count: int
    system_prompt_words: int
    domain_count: int
    avg_accuracy: float
    error_rate_pct: float

def should_split(assessment: AgentComplexityAssessment) -> dict:
    """Evaluate whether a single agent should become multi-agent."""
    reasons = []

    if assessment.tool_count > 8:
        reasons.append(
            f"Too many tools ({assessment.tool_count}): "
            "LLMs select tools less accurately beyond 8"
        )
    if assessment.system_prompt_words > 800:
        reasons.append(
            f"Prompt too long ({assessment.system_prompt_words} words): "
            "instructions compete for attention"
        )
    if assessment.domain_count > 3:
        reasons.append(
            f"Too many domains ({assessment.domain_count}): "
            "each domain should be a specialist agent"
        )
    if assessment.error_rate_pct > 15:
        reasons.append(
            f"Error rate too high ({assessment.error_rate_pct}%): "
            "splitting reduces per-domain error rates"
        )

    return {
        "should_split": len(reasons) >= 2,
        "reasons": reasons,
        "recommended_agents": assessment.domain_count + 1,  # +1 for triage
    }

result = should_split(AgentComplexityAssessment(
    tool_count=12, system_prompt_words=1200,
    domain_count=4, avg_accuracy=0.78, error_rate_pct=22,
))
print(result)

Decomposition: Extracting Specialist Agents

Start with your existing single agent and extract one domain at a time.

from agents import Agent, Runner, function_tool

# ── Specialist: Billing Agent ──
@function_tool
def get_invoice(invoice_id: str) -> str:
    """Look up an invoice by ID."""
    return f"Invoice {invoice_id}: $150.00, paid 2026-03-01"

@function_tool
def process_refund(invoice_id: str, reason: str) -> str:
    """Process a refund for an invoice."""
    return f"Refund initiated for {invoice_id}: {reason}"

billing_agent = Agent(
    name="Billing Specialist",
    instructions=(
        "You handle billing inquiries: invoices, payments, and refunds. "
        "Always confirm the invoice ID before taking action."
    ),
    model="gpt-4o",
    tools=[get_invoice, process_refund],
)

# ── Specialist: Technical Support Agent ──
@function_tool
def search_knowledge_base(query: str) -> str:
    """Search the technical knowledge base."""
    return f"KB result for '{query}': Check Settings > Advanced > Reset"

tech_agent = Agent(
    name="Technical Support",
    instructions=(
        "You handle technical issues: bugs, configuration, and how-to questions. "
        "Search the knowledge base before guessing."
    ),
    model="gpt-4o",
    tools=[search_knowledge_base],
)

Handoff Design: The Triage Pattern

The triage agent decides which specialist handles the request.

See AI Voice Agents Handle Real Calls

Book a free demo or calculate how much you can save with AI voice automation.

Book a Demo ROI Calculator

triage_agent = Agent(
    name="Triage Agent",
    instructions=(
        "You are the first point of contact. Determine what the user needs "
        "and hand off to the right specialist.\n"
        "- Billing questions: hand to Billing Specialist\n"
        "- Technical issues: hand to Technical Support\n"
        "If unclear, ask the user to clarify."
    ),
    model="gpt-4o",
    handoffs=[billing_agent, tech_agent],
)

result = Runner.run_sync(triage_agent, "I need a refund for invoice INV-42")
print(result.final_output)
# Triage routes to billing_agent, which calls process_refund

Shared State Between Agents

When agents need to share context (like user identity or session data), pass it through the agent context.

from agents import Agent, Runner, RunContextWrapper
from dataclasses import dataclass

@dataclass
class UserContext:
    user_id: str
    account_tier: str
    language: str

@function_tool
async def get_account_details(
    wrapper: RunContextWrapper[UserContext],
) -> str:
    """Get the current user account details."""
    ctx = wrapper.context
    return f"User {ctx.user_id}, tier: {ctx.account_tier}"

billing_agent = Agent(
    name="Billing",
    instructions="Handle billing. Use context for user info.",
    model="gpt-4o",
    tools=[get_account_details],
)

user_ctx = UserContext(
    user_id="u-1234", account_tier="premium", language="en",
)
result = Runner.run_sync(billing_agent, "What is my account tier?", context=user_ctx)

Testing Multi-Agent Systems

Test each agent in isolation, then test the handoff paths.

import pytest

class TestMultiAgentHandoffs:
    def test_billing_query_routes_to_billing(self):
        result = Runner.run_sync(
            triage_agent, "What is my latest invoice?"
        )
        # Verify the billing agent handled it
        assert "invoice" in result.final_output.lower()

    def test_tech_query_routes_to_tech(self):
        result = Runner.run_sync(
            triage_agent, "How do I reset my password?"
        )
        assert "settings" in result.final_output.lower()

    def test_ambiguous_query_asks_for_clarification(self):
        result = Runner.run_sync(
            triage_agent, "I have a problem"
        )
        assert "?" in result.final_output  # Agent should ask for details

FAQ

How many specialist agents should I create?

Create one agent per clearly distinct domain. Most systems work well with 3-6 specialists plus a triage agent. Going beyond 8-10 specialists creates its own complexity — the triage agent struggles to route accurately, which is the same problem you had with too many tools on a single agent.

Can specialist agents hand off to each other, or only back to triage?

Both patterns work. Direct specialist-to-specialist handoffs are faster but create coupling. Routing everything through triage is cleaner and easier to debug. Start with triage-mediated handoffs and add direct handoffs only for proven high-frequency paths.

How do I handle a request that spans multiple domains?

Use the triage agent to orchestrate sequential handoffs. The user asks about a billing error on a technical feature — triage routes to tech support first, then hands off to billing with the technical context attached. The Agents SDK preserves conversation history across handoffs automatically.

#MultiAgent #Architecture #AgentHandoffs #Decomposition #Python #AgenticAI #LearnAI #AIEngineering

Migrating from Single-Agent to Multi-Agent Architecture: When and How to Split

When a Single Agent Outgrows Itself

When to Split: Concrete Criteria

Decomposition: Extracting Specialist Agents

Handoff Design: The Triage Pattern

Shared State Between Agents

Testing Multi-Agent Systems

FAQ

How many specialist agents should I create?

Can specialist agents hand off to each other, or only back to triage?

How do I handle a request that spans multiple domains?

Try CallSphere AI Voice Agents

Related Articles

WebArena and Real-World Web Agent Benchmarks: How We Measure Browser Agent Performance

Taking Screenshots and Recording Videos with Playwright for AI Analysis

Playwright Selectors Deep Dive: CSS, XPath, Text, and Role-Based Element Finding