Building Hierarchical Agent Architectures: Triage, Specialist, and Supervisor Patterns
A deep technical guide to hierarchical agent design, covering triage routing, specialist handoffs, and supervisor oversight patterns, with code examples using the OpenAI Agents SDK.
Why Hierarchical Architectures Dominate Production Systems
Flat agent architectures — where every agent can talk to every other agent — work for demos with three or four agents. In production, they collapse under their own complexity. With N agents, a flat topology creates N*(N-1)/2 potential communication paths. At 20 agents, that is 190 paths to reason about, test, and monitor.
Hierarchical architectures solve this by organizing agents into layers with clear authority boundaries. A customer request enters through a triage layer, gets routed to the appropriate specialist, and the entire interaction is monitored by a supervisor. This mirrors how human organizations work — and for good reason: it scales.
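To make the path-count comparison concrete, here is a small sketch. The flat formula is the one quoted above. The layered count assumes one particular layout (a single triage agent, a single supervisor, and every specialist linked to both), which is an illustration rather than a general rule.

```python
def flat_paths(n_agents: int) -> int:
    """Undirected links when every agent may talk to every other agent."""
    return n_agents * (n_agents - 1) // 2


def layered_paths(n_specialists: int) -> int:
    """Links in the assumed layout: triage -> specialist -> supervisor."""
    # One link from triage to each specialist, plus one link from each
    # specialist to the supervisor.
    return 2 * n_specialists


# 20 agents in a flat mesh, versus the same 20 agents arranged as
# 18 specialists plus one triage agent and one supervisor.
print(flat_paths(20))     # 190
print(layered_paths(18))  # 36
```

Under these assumptions the layered count grows linearly with the number of specialists, while the flat count grows quadratically.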
This guide walks through the three core roles in a hierarchical agent system, with implementation code using the OpenAI Agents SDK and general patterns that apply to any framework.
The Three Roles
Triage Agent
The triage agent is the front door of your system. Its only job is to classify incoming requests and route them to the correct specialist. It should never attempt to answer questions directly. A triage agent that tries to be helpful by answering "simple" questions will sooner or later misjudge the boundary and handle tasks it should delegate.
from agents import Agent, handoff
from agents.extensions.handoff_prompt import RECOMMENDED_PROMPT_PREFIX

# Define the specialist agents first so the triage agent can reference them.
# Fuller versions with tools and escalation appear in later sections.
billing_agent = Agent(
    name="Billing Specialist",
    instructions="""You handle all billing-related queries:
    invoices, payment methods, refunds, subscription changes.
    Always verify the customer's account before making changes.
    For refunds over $500, escalate to supervisor.""",
    model="gpt-4.1",
)

technical_agent = Agent(
    name="Technical Specialist",
    instructions="""You handle technical support:
    API errors, integration issues, performance problems.
    Always ask for error codes and timestamps.
    For production outages, escalate to supervisor immediately.""",
    model="gpt-4.1",
)

sales_agent = Agent(
    name="Sales Specialist",
    instructions="""You handle sales inquiries:
    pricing, feature comparisons, enterprise plans.
    Never commit to custom pricing without supervisor approval.""",
    model="gpt-4.1-mini",
)

# The triage agent only classifies and routes; it has no tools of its own.
triage_agent = Agent(
    name="Triage Agent",
    instructions=f"""{RECOMMENDED_PROMPT_PREFIX}
    You are the initial contact point. Your ONLY job is to
    understand the customer's intent and route them to the
    correct specialist. NEVER answer questions directly.

    Routing rules:
    - Billing, payments, invoices, refunds -> Billing Specialist
    - API errors, bugs, technical issues -> Technical Specialist
    - Pricing, plans, feature questions -> Sales Specialist
    - Unclear intent -> Ask ONE clarifying question, then route
    """,
    handoffs=[
        handoff(billing_agent),
        handoff(technical_agent),
        handoff(sales_agent),
    ],
    model="gpt-4.1-mini",
)
Key design decisions for triage agents:
Use a smaller, faster model. The triage agent performs classification, not complex reasoning. A model like GPT-4.1-mini or Claude 3.5 Haiku is faster and cheaper, and is usually accurate enough for intent classification; confirm this against your own routing test set.
Explicit routing rules. Do not rely on the LLM to infer routing from general knowledge. Provide a clear decision tree in the system prompt. This makes routing more predictable and easier to audit.
Single clarifying question limit. If the triage agent cannot classify after one clarification, it should route to a general specialist rather than entering an interrogation loop.
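The single-clarification rule can be sketched independently of any framework. In the example below a keyword table stands in for the LLM classifier, and `ROUTES`, `FALLBACK`, `TriageDecision`, and `triage` are all hypothetical names introduced for illustration. The "General Specialist" fallback is assumed; the code in this guide does not define one.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical keyword table standing in for the LLM's intent classification.
ROUTES = {
    "Billing Specialist": ("invoice", "refund", "payment", "billing"),
    "Technical Specialist": ("api", "error", "bug", "outage"),
    "Sales Specialist": ("pricing", "plan", "feature", "enterprise"),
}
FALLBACK = "General Specialist"  # assumed catch-all agent
MAX_CLARIFICATIONS = 1


@dataclass
class TriageDecision:
    action: str                   # "route" or "clarify"
    target: Optional[str] = None  # set only when action == "route"


def classify(message: str) -> Optional[str]:
    """Return the first specialist whose keywords match, else None."""
    text = message.lower()
    for specialist, keywords in ROUTES.items():
        if any(keyword in text for keyword in keywords):
            return specialist
    return None


def triage(message: str, clarifications_asked: int) -> TriageDecision:
    """Route if the intent is clear; otherwise clarify once, then fall back."""
    target = classify(message)
    if target is not None:
        return TriageDecision("route", target)
    if clarifications_asked < MAX_CLARIFICATIONS:
        return TriageDecision("clarify")
    return TriageDecision("route", FALLBACK)
```

Keeping the clarification counter outside the model, in ordinary code, means the limit holds even if the prompt instruction is ignored.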
The Specialist Agent Pattern
Specialist agents are domain experts. Each specialist has a focused system prompt, a curated set of tools, and clear boundaries defining what it can and cannot do.
from agents import Agent, function_tool


@function_tool
async def lookup_invoice(invoice_id: str) -> dict:
    """Look up an invoice by ID and return its details."""
    # In production, this queries your billing database
    return {
        "invoice_id": invoice_id,
        "amount": 299.00,
        "status": "paid",
        "date": "2026-03-15",
    }


@function_tool
async def process_refund(invoice_id: str, reason: str) -> dict:
    """Process a refund for a given invoice."""
    # Stubbed response; a real implementation would call the payment provider
    return {
        "invoice_id": invoice_id,
        "refund_status": "initiated",
        "estimated_days": 5,
    }


@function_tool
async def update_payment_method(customer_id: str,
                                method_type: str) -> dict:
    """Update the payment method on file for a customer."""
    return {
        "customer_id": customer_id,
        "new_method": method_type,
        "status": "updated",
    }


billing_agent = Agent(
    name="Billing Specialist",
    instructions="""You are a billing specialist. You have access to
    invoice lookup, refund processing, and payment method updates.

    Rules:
    1. Always verify the customer identity before any action
    2. For refunds over $500, you MUST escalate to supervisor
    3. Never reveal internal invoice IDs to customers
    4. Log every action taken for audit trail
    """,
    tools=[lookup_invoice, process_refund, update_payment_method],
    model="gpt-4.1",
)
Specialist Design Principles
Minimal tool surface. Each specialist should have only the tools it needs. A billing agent should not have access to the deployment API. This limits blast radius if the agent is compromised or hallucinating.
Clear escalation boundaries. Define explicit thresholds for escalation: dollar amounts, risk levels, or confidence scores. These should be in the system prompt, not buried in tool logic.
Stateful context passing. When a specialist receives a handoff from the triage agent, it gets the full conversation history by default. Use that context to avoid asking the customer to repeat information.
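One way to keep the tool surface minimal over time is a startup check that compares each agent's configured tools against an explicit allow-list. The sketch below works on plain tool names and does not touch the SDK. `EXPECTED_TOOLS`, `audit_tool_surface`, and the `deploy_service` tool are hypothetical and used only for illustration.

```python
from typing import Dict, Set

# Allow-list of tool names per agent, taken from the examples in this guide.
EXPECTED_TOOLS: Dict[str, Set[str]] = {
    "Billing Specialist": {
        "lookup_invoice", "process_refund", "update_payment_method",
    },
    "Supervisor": {
        "get_agent_metrics", "override_agent_decision", "route_to_human",
    },
}


def audit_tool_surface(configured: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Return, per agent, any configured tools that are not on its allow-list.

    Agents missing from EXPECTED_TOOLS are treated as having no allowed tools.
    """
    violations: Dict[str, Set[str]] = {}
    for agent_name, tool_names in configured.items():
        extra = tool_names - EXPECTED_TOOLS.get(agent_name, set())
        if extra:
            violations[agent_name] = extra
    return violations


# Example: a billing agent that was accidentally given a deployment tool.
print(audit_tool_surface({
    "Billing Specialist": {"lookup_invoice", "deploy_service"},
}))
```

Running such a check in CI or at process start makes an accidental tool addition visible before it reaches production.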
The Supervisor Agent Pattern
The supervisor agent is the most underappreciated component of hierarchical systems. While triage and specialist agents handle the happy path, the supervisor handles everything that goes wrong.
from agents import Agent, function_tool, handoff


@function_tool
async def get_agent_metrics(agent_name: str) -> dict:
    """Get current performance metrics for a specialist agent."""
    # In production, pull from your observability system
    return {
        "agent_name": agent_name,
        "active_conversations": 12,
        "avg_resolution_time_seconds": 145,
        "error_rate_percent": 2.3,
        "escalation_rate_percent": 8.1,
    }


@function_tool
async def override_agent_decision(conversation_id: str,
                                  new_action: str,
                                  reason: str) -> dict:
    """Override a specialist agent's decision with justification."""
    return {
        "conversation_id": conversation_id,
        "override_applied": True,
        "action": new_action,
        "reason": reason,
    }


@function_tool
async def route_to_human(conversation_id: str,
                         urgency: str,
                         summary: str) -> dict:
    """Escalate a conversation to a human operator."""
    return {
        "conversation_id": conversation_id,
        "routed_to": "human_queue",
        "urgency": urgency,
        "position_in_queue": 3,
    }


supervisor_agent = Agent(
    name="Supervisor",
    instructions="""You are the supervisor overseeing all specialist
    agents. You are invoked when:
    1. A specialist escalates (refund > $500, production outage)
    2. A customer requests a supervisor
    3. A specialist's confidence drops below threshold
    4. An anomaly is detected in agent metrics

    Your priorities:
    - Customer safety and satisfaction
    - Compliance with company policies
    - Minimizing unnecessary human escalations
    - Providing coaching feedback to specialists

    You can override specialist decisions, route to humans,
    or resolve the issue yourself.
    """,
    tools=[get_agent_metrics, override_agent_decision, route_to_human],
    model="gpt-4.1",
)
Supervisor Responsibilities
Escalation handling. When a specialist hits a boundary it cannot cross (high-value refund, production incident), the supervisor evaluates the full context and either approves the action, modifies it, or routes to a human.
Quality monitoring. The supervisor periodically reviews specialist outputs for accuracy, policy compliance, and tone. This can be done asynchronously — sampling completed conversations and flagging issues.
Circuit breaking. If a specialist's error rate spikes, the supervisor can temporarily disable it and reroute traffic to a fallback agent or human queue.
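The circuit-breaking responsibility can be sketched as a small rolling-window counter kept outside the agents. The class name, window size, thresholds, and the `human_queue` fallback below are illustrative assumptions, and the sketch leaves out time-based recovery, which a production breaker would normally include.

```python
from collections import deque
from typing import Deque, Dict


class SpecialistBreaker:
    """Tracks recent outcomes for one specialist and trips on a high error rate."""

    def __init__(self, window: int = 20, max_error_rate: float = 0.25,
                 min_samples: int = 5) -> None:
        self.outcomes: Deque[bool] = deque(maxlen=window)  # True means success
        self.max_error_rate = max_error_rate
        self.min_samples = min_samples

    def record(self, success: bool) -> None:
        self.outcomes.append(success)

    @property
    def error_rate(self) -> float:
        if not self.outcomes:
            return 0.0
        failures = sum(1 for ok in self.outcomes if not ok)
        return failures / len(self.outcomes)

    @property
    def is_open(self) -> bool:
        """Open means tripped: traffic should bypass this specialist."""
        enough_data = len(self.outcomes) >= self.min_samples
        return enough_data and self.error_rate > self.max_error_rate

    def reset(self) -> None:
        """Clear history, for example after an operator re-enables the agent."""
        self.outcomes.clear()


def resolve_target(specialist: str, breakers: Dict[str, SpecialistBreaker],
                   fallback: str = "human_queue") -> str:
    """Return the specialist unless its breaker is open, else the fallback."""
    breaker = breakers.get(specialist)
    if breaker is not None and breaker.is_open:
        return fallback
    return specialist
```

The `min_samples` guard keeps a single early failure from disabling a specialist that has barely handled any traffic.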
Putting It All Together
The full hierarchical architecture wires triage, specialists, and supervisor into a single coherent system.
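from agents import Agent, handoff, Runner

# Wire the supervisor as the escalation target for every specialist.
# supervisor_agent and the billing tools are defined in the sections above.
billing_agent_with_escalation = Agent(
    name="Billing Specialist",
    instructions="...(billing instructions)...",
    tools=[lookup_invoice, process_refund, update_payment_method],
    handoffs=[handoff(supervisor_agent)],
    model="gpt-4.1",
)

technical_agent_with_escalation = Agent(
    name="Technical Specialist",
    instructions="...(technical instructions)...",
    handoffs=[handoff(supervisor_agent)],
    model="gpt-4.1",
)

# The sales instructions require supervisor approval for custom pricing,
# so the sales specialist needs the same escalation path.
sales_agent_with_escalation = Agent(
    name="Sales Specialist",
    instructions="...(sales instructions)...",
    handoffs=[handoff(supervisor_agent)],
    model="gpt-4.1-mini",
)

# Triage routes to specialists
triage = Agent(
    name="Triage",
    instructions="...(triage routing rules)...",
    handoffs=[
        handoff(billing_agent_with_escalation),
        handoff(technical_agent_with_escalation),
        handoff(sales_agent_with_escalation),
    ],
    model="gpt-4.1-mini",
)


# Run the system: every conversation enters through the triage agent
async def handle_customer(message: str):
    result = await Runner.run(triage, message)
    return result.final_output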
The key insight is that handoffs are unidirectional and scoped. The triage agent hands off to specialists. Specialists hand off to the supervisor. The supervisor never hands back to the triage agent — it resolves the issue or routes to a human. This prevents circular delegation loops.
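The no-loops property can be checked mechanically before deployment. The sketch below represents the handoff topology as a plain dictionary of agent names and runs a depth-first search for cycles. `TOPOLOGY` assumes every specialist escalates to the supervisor, and `find_cycle` is a hypothetical helper, not part of the SDK.

```python
from typing import Dict, List, Optional

# Handoff graph by agent name: source -> list of handoff targets.
TOPOLOGY: Dict[str, List[str]] = {
    "Triage": ["Billing Specialist", "Technical Specialist", "Sales Specialist"],
    "Billing Specialist": ["Supervisor"],
    "Technical Specialist": ["Supervisor"],
    "Sales Specialist": ["Supervisor"],
    "Supervisor": [],
}


def find_cycle(handoffs: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one cycle as a list of agent names, or None if the graph is acyclic."""
    UNVISITED, IN_PROGRESS, DONE = 0, 1, 2
    state: Dict[str, int] = {}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        state[node] = IN_PROGRESS
        path.append(node)
        for target in handoffs.get(node, []):
            target_state = state.get(target, UNVISITED)
            if target_state == IN_PROGRESS:
                # Back edge: the cycle runs from target to node and back.
                return path[path.index(target):] + [target]
            if target_state == UNVISITED:
                found = visit(target)
                if found:
                    return found
        path.pop()
        state[node] = DONE
        return None

    for node in handoffs:
        if state.get(node, UNVISITED) == UNVISITED:
            found = visit(node)
            if found:
                return found
    return None


print(find_cycle(TOPOLOGY))  # None
```

A check like this fits naturally in a unit test, so that adding a handoff from the supervisor back to a specialist fails the build instead of producing a loop in production.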
Anti-Patterns to Avoid
Recursive escalation. If a supervisor can escalate back to a specialist which escalates back to the supervisor, you have an infinite loop. Always enforce a directed acyclic graph in your handoff topology.
Overloaded triage. A triage agent with 30+ routing options becomes unreliable. If you have that many specialists, add a second triage layer — a meta-triage that routes to domain-specific triage agents.
Silent failures. Every handoff should be logged with the source agent, target agent, reason, and conversation context. Without this, debugging production issues becomes impossible.
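A structured handoff log entry might look like the sketch below. The field names and the `log_handoff` helper are assumptions made for illustration, and `message_count` is only a lightweight stand-in for the conversation context; a real system might store a transcript reference instead.

```python
import json
import logging
from datetime import datetime, timezone
from typing import Dict

logger = logging.getLogger("agent.handoffs")


def log_handoff(conversation_id: str, source_agent: str, target_agent: str,
                reason: str, message_count: int) -> Dict[str, object]:
    """Emit one JSON log line per handoff and return the record."""
    record: Dict[str, object] = {
        "event": "handoff",
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "conversation_id": conversation_id,
        "source_agent": source_agent,
        "target_agent": target_agent,
        "reason": reason,
        "message_count": message_count,
    }
    logger.info(json.dumps(record))
    return record


# Example: triage hands a refund request to the billing specialist.
log_handoff("conv-123", "Triage Agent", "Billing Specialist",
            "refund request", message_count=4)
```

In the OpenAI Agents SDK, a function like this could plausibly be invoked from the `on_handoff` callback accepted by `handoff()`; check the SDK documentation for the exact callback signature before relying on it.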
FAQ
How do you test hierarchical agent systems?
Test each layer independently first. Unit test triage routing with a suite of example messages and expected specialist assignments. Test specialists with known scenarios and expected tool calls. Test supervisor escalation logic with synthetic escalation events. Then run integration tests that exercise full paths from triage through specialist through supervisor.
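A triage routing suite can be as simple as a table of messages and expected specialists. In the sketch below, `routing_failures` is a hypothetical harness that accepts any routing callable, and `stub_route` is a deliberately incomplete stand-in for a call to the real triage agent.

```python
from typing import Callable, List, Sequence, Tuple

Case = Tuple[str, str]          # (customer message, expected specialist)
Failure = Tuple[str, str, str]  # (message, expected, actual)


def routing_failures(route: Callable[[str], str],
                     cases: Sequence[Case]) -> List[Failure]:
    """Run every case through the router and collect the mismatches."""
    failures: List[Failure] = []
    for message, expected in cases:
        actual = route(message)
        if actual != expected:
            failures.append((message, expected, actual))
    return failures


def stub_route(message: str) -> str:
    """Hypothetical router; a real test would call the triage agent here."""
    if "refund" in message.lower():
        return "Billing Specialist"
    return "Technical Specialist"


CASES: List[Case] = [
    ("I want a refund for my last invoice", "Billing Specialist"),
    ("The API returns a 500 error", "Technical Specialist"),
    ("How much is the enterprise plan?", "Sales Specialist"),
]

for message, expected, actual in routing_failures(stub_route, CASES):
    print(f"MISROUTED: {message!r} expected {expected}, got {actual}")
```

Because the stub knows nothing about sales, the third case is reported as misrouted, which is the kind of gap this harness is meant to surface.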
What happens when a specialist is down or overloaded?
The triage layer should check agent availability before handoff. If the target specialist is unavailable, the triage agent either routes to a backup specialist with overlapping capabilities, queues the request, or hands off to the supervisor. Never let a handoff fail silently.
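The fallback order described above can be written as a small pure function. The `BACKUPS` mapping, the action labels, and `select_target` are assumptions for illustration. How availability is actually determined (health checks, queue depth, and so on) is left out.

```python
from typing import Dict, List, Set, Tuple

# Assumed backup relationships between specialists with overlapping skills.
BACKUPS: Dict[str, List[str]] = {
    "Billing Specialist": ["Sales Specialist"],
    "Sales Specialist": ["Billing Specialist"],
    "Technical Specialist": [],
}


def select_target(preferred: str, available: Set[str],
                  backups: Dict[str, List[str]],
                  supervisor: str = "Supervisor") -> Tuple[str, str]:
    """Pick a handoff target, preferring the specialist, then backups,
    then the supervisor, and finally queueing the request."""
    if preferred in available:
        return ("handoff", preferred)
    for backup in backups.get(preferred, []):
        if backup in available:
            return ("handoff", backup)
    if supervisor in available:
        return ("escalate", supervisor)
    return ("queue", preferred)


# Billing is down, so the request goes to its assumed backup.
print(select_target("Billing Specialist",
                    {"Sales Specialist", "Supervisor"}, BACKUPS))
```

Every branch returns an explicit action, so there is no code path in which the handoff fails silently.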
Should the supervisor use a more powerful model than specialists?
Generally yes. The supervisor handles edge cases, ambiguous situations, and high-stakes decisions that benefit from stronger reasoning. Using a frontier model for the supervisor while running specialists on efficient models is a common and cost-effective pattern.
How many specialists should one triage agent manage?
Keep it to roughly 8-10 specialists for a single triage agent. Beyond that, classification accuracy tends to drop. Instead, introduce a hierarchical triage structure: a top-level triage routes to category triage agents (support-triage, sales-triage, operations-triage), each of which routes to 5-8 specialists within its domain.
Written by
CallSphere Team
Expert insights on AI voice agents and customer communication automation.