Migrating from Single-Agent to Multi-Agent Architecture: When and How to Split
Learn when and how to decompose a single AI agent into a multi-agent system. Covers decomposition criteria, handoff design, shared state management, and testing strategies.
When a Single Agent Outgrows Itself
A single agent with a 2,000-word system prompt, 15 tools, and branching logic for six different domains is a maintenance nightmare. It hallucinates more because the instructions are too broad. It uses the wrong tools because it has too many to choose from. And every change to one domain risks breaking another.
Multi-agent architecture solves this by giving each agent a focused scope: fewer tools, shorter instructions, and a clear domain boundary. The triage pattern — one router agent that delegates to specialists — is the most common and most effective starting point.
When to Split: Concrete Criteria
Not every agent needs splitting. Apply these decision criteria:
from dataclasses import dataclass
@dataclass
class AgentComplexityAssessment:
tool_count: int
system_prompt_words: int
domain_count: int
avg_accuracy: float
error_rate_pct: float
def should_split(assessment: AgentComplexityAssessment) -> dict:
"""Evaluate whether a single agent should become multi-agent."""
reasons = []
if assessment.tool_count > 8:
reasons.append(
f"Too many tools ({assessment.tool_count}): "
"LLMs select tools less accurately beyond 8"
)
if assessment.system_prompt_words > 800:
reasons.append(
f"Prompt too long ({assessment.system_prompt_words} words): "
"instructions compete for attention"
)
if assessment.domain_count > 3:
reasons.append(
f"Too many domains ({assessment.domain_count}): "
"each domain should be a specialist agent"
)
if assessment.error_rate_pct > 15:
reasons.append(
f"Error rate too high ({assessment.error_rate_pct}%): "
"splitting reduces per-domain error rates"
)
return {
"should_split": len(reasons) >= 2,
"reasons": reasons,
"recommended_agents": assessment.domain_count + 1, # +1 for triage
}
result = should_split(AgentComplexityAssessment(
tool_count=12, system_prompt_words=1200,
domain_count=4, avg_accuracy=0.78, error_rate_pct=22,
))
print(result)
Decomposition: Extracting Specialist Agents
Start with your existing single agent and extract one domain at a time.
from agents import Agent, Runner, function_tool
# ── Specialist: Billing Agent ──
@function_tool
def get_invoice(invoice_id: str) -> str:
"""Look up an invoice by ID."""
return f"Invoice {invoice_id}: $150.00, paid 2026-03-01"
@function_tool
def process_refund(invoice_id: str, reason: str) -> str:
"""Process a refund for an invoice."""
return f"Refund initiated for {invoice_id}: {reason}"
billing_agent = Agent(
name="Billing Specialist",
instructions=(
"You handle billing inquiries: invoices, payments, and refunds. "
"Always confirm the invoice ID before taking action."
),
model="gpt-4o",
tools=[get_invoice, process_refund],
)
# ── Specialist: Technical Support Agent ──
@function_tool
def search_knowledge_base(query: str) -> str:
"""Search the technical knowledge base."""
return f"KB result for '{query}': Check Settings > Advanced > Reset"
tech_agent = Agent(
name="Technical Support",
instructions=(
"You handle technical issues: bugs, configuration, and how-to questions. "
"Search the knowledge base before guessing."
),
model="gpt-4o",
tools=[search_knowledge_base],
)
Handoff Design: The Triage Pattern
The triage agent decides which specialist handles the request.
See AI Voice Agents Handle Real Calls
Book a free demo or calculate how much you can save with AI voice automation.
triage_agent = Agent(
name="Triage Agent",
instructions=(
"You are the first point of contact. Determine what the user needs "
"and hand off to the right specialist.\n"
"- Billing questions: hand to Billing Specialist\n"
"- Technical issues: hand to Technical Support\n"
"If unclear, ask the user to clarify."
),
model="gpt-4o",
handoffs=[billing_agent, tech_agent],
)
result = Runner.run_sync(triage_agent, "I need a refund for invoice INV-42")
print(result.final_output)
# Triage routes to billing_agent, which calls process_refund
Shared State Between Agents
When agents need to share context (like user identity or session data), pass it through the agent context.
from agents import Agent, Runner, RunContextWrapper
from dataclasses import dataclass
@dataclass
class UserContext:
user_id: str
account_tier: str
language: str
@function_tool
async def get_account_details(
wrapper: RunContextWrapper[UserContext],
) -> str:
"""Get the current user account details."""
ctx = wrapper.context
return f"User {ctx.user_id}, tier: {ctx.account_tier}"
billing_agent = Agent(
name="Billing",
instructions="Handle billing. Use context for user info.",
model="gpt-4o",
tools=[get_account_details],
)
user_ctx = UserContext(
user_id="u-1234", account_tier="premium", language="en",
)
result = Runner.run_sync(billing_agent, "What is my account tier?", context=user_ctx)
Testing Multi-Agent Systems
Test each agent in isolation, then test the handoff paths.
import pytest
class TestMultiAgentHandoffs:
def test_billing_query_routes_to_billing(self):
result = Runner.run_sync(
triage_agent, "What is my latest invoice?"
)
# Verify the billing agent handled it
assert "invoice" in result.final_output.lower()
def test_tech_query_routes_to_tech(self):
result = Runner.run_sync(
triage_agent, "How do I reset my password?"
)
assert "settings" in result.final_output.lower()
def test_ambiguous_query_asks_for_clarification(self):
result = Runner.run_sync(
triage_agent, "I have a problem"
)
assert "?" in result.final_output # Agent should ask for details
FAQ
How many specialist agents should I create?
Create one agent per clearly distinct domain. Most systems work well with 3-6 specialists plus a triage agent. Going beyond 8-10 specialists creates its own complexity — the triage agent struggles to route accurately, which is the same problem you had with too many tools on a single agent.
Can specialist agents hand off to each other, or only back to triage?
Both patterns work. Direct specialist-to-specialist handoffs are faster but create coupling. Routing everything through triage is cleaner and easier to debug. Start with triage-mediated handoffs and add direct handoffs only for proven high-frequency paths.
How do I handle a request that spans multiple domains?
Use the triage agent to orchestrate sequential handoffs. The user asks about a billing error on a technical feature — triage routes to tech support first, then hands off to billing with the technical context attached. The Agents SDK preserves conversation history across handoffs automatically.
#MultiAgent #Architecture #AgentHandoffs #Decomposition #Python #AgenticAI #LearnAI #AIEngineering
CallSphere Team
Expert insights on AI voice agents and customer communication automation.
Try CallSphere AI Voice Agents
See how AI voice agents work for your industry. Live demo available -- no signup required.