Building Multi-Agent Voice Systems with the OpenAI Agents SDK
A developer guide to building multi-agent voice systems with the OpenAI Agents SDK — triage, handoffs, shared state, and tool calling.
Why one agent is not enough
A single agent with fifty tools and a thousand-line system prompt will work — badly. It will hallucinate tool names, forget constraints, and generally underperform a smaller agent focused on one job. Multi-agent systems split the problem: a triage agent that identifies intent, specialist agents that handle each intent deeply, and handoffs that move the conversation between them without losing context.
This post walks through building a multi-agent voice system with the OpenAI Agents SDK, the same pattern CallSphere uses across its real estate, healthcare, and sales verticals.
caller → triage_agent
│
├── buyer_intent ───► buyer_specialist
├── seller_intent ──► seller_specialist
├── rental_intent ──► rental_specialist
└── tour_intent ────► tour_coordinator
Architecture overview
┌───────────────────────────────────────┐
│ Session state (shared) │
│ • caller info │
│ • conversation history │
│ • collected fields │
└──────────────┬────────────────────────┘
│
▼
┌───────────────────────────────────────┐
│ Triage agent (thin, routing only) │
└──────────────┬────────────────────────┘
│ handoff
┌──────────┼──────────┐
▼ ▼ ▼
┌───────┐ ┌───────┐ ┌───────┐
│buyer │ │seller │ │rental │
│agent │ │agent │ │agent │
└───┬───┘ └───┬───┘ └───┬───┘
│ │ │
▼ ▼ ▼
tools tools tools
Prerequisites
- Python 3.11+ and the
openai-agentspackage. - An OpenAI key with Realtime + Agents SDK access.
- Per-agent tool definitions.
Step-by-step walkthrough
1. Define the triage agent
from agents import Agent, Runner, handoff
buyer_agent = Agent(
name="Buyer Specialist",
instructions="You help home buyers. Ask qualifying questions, check availability, and book tours.",
tools=[search_listings, book_tour],
)
seller_agent = Agent(
name="Seller Specialist",
instructions="You help home sellers. Collect property details and schedule valuation calls.",
tools=[create_valuation_lead],
)
rental_agent = Agent(
name="Rental Specialist",
instructions="You help rental inquiries. Collect preferences and schedule showings.",
tools=[search_rentals, book_showing],
)
triage = Agent(
name="Triage",
instructions=(
"Greet the caller and identify whether they are buying, selling, or renting. "
"Hand off to the correct specialist as soon as you know."
),
handoffs=[handoff(buyer_agent), handoff(seller_agent), handoff(rental_agent)],
)
2. Share session state
from agents import RunContext
class SessionState:
def __init__(self, call_id: str, caller_phone: str):
self.call_id = call_id
self.caller_phone = caller_phone
self.collected = {}
3. Run the loop
async def run_call(call_id: str, caller_phone: str, user_turns: list[str]):
state = SessionState(call_id, caller_phone)
messages = []
for user_text in user_turns:
messages.append({"role": "user", "content": user_text})
result = await Runner.run(triage, input=messages, context=state)
messages.append({"role": "assistant", "content": result.final_output})
4. Handle handoffs cleanly
The SDK emits a HandoffEvent when one agent transfers to another. Use it to log the handoff and keep the shared state consistent.
from agents import HandoffEvent
async def observe(result):
for event in result.events:
if isinstance(event, HandoffEvent):
await log_handoff(event.from_agent, event.to_agent, event.reason)
5. Bridge to the Realtime API
Route the user's audio-derived transcripts into the Runner and pipe the final_output back to the TTS side of the Realtime session. Keep one agent-SDK context per call.
6. Guardrails per agent
Each specialist gets its own constraints: the buyer agent cannot book valuations, the seller agent cannot search listings. This prevents the combined prompt bloat that kills single-agent systems.
Production considerations
- State scope: shared session state is fine; shared mutable global state is not.
- Handoff loops: add a max-handoff counter; the SDK can recover from loops but it is expensive.
- Tool permissions: agents only see the tools they need.
- Telemetry: record which agent handled each turn for post-call analytics.
- Handoff summaries: the outgoing agent should summarize what it learned so the incoming agent does not re-ask.
CallSphere's real implementation
CallSphere uses the OpenAI Agents SDK for every multi-agent vertical. Real estate runs 10 agents (triage, buyer, seller, rental, tour coordinator, qualification, finance, showing, negotiation, handoff-to-human). Healthcare combines 14 tools behind a lighter triage/specialist split. Salon runs 4 agents (receptionist, booking, upsell, recovery). After-hours escalation has 7 tools around an urgency-classifier triage. IT helpdesk pairs 10 tools with RAG behind a triage agent. The sales pod uses 5 GPT-4 specialists plus ElevenLabs TTS.
See AI Voice Agents Handle Real Calls
Book a free demo or calculate how much you can save with AI voice automation.
The voice plane under all of them is the OpenAI Realtime API with gpt-4o-realtime-preview-2025-06-03, PCM16 at 24kHz, and server VAD. Handoffs happen inside a single Realtime session so there is no audio drop between agents. A GPT-4o-mini post-call pipeline writes per-agent metrics so customers can see which specialist is closing and which is leaking. CallSphere supports 57+ languages with sub-second end-to-end latency.
Common pitfalls
- Too many agents: 3-10 is a sweet spot; 20 is usually over-decomposed.
- Specialists that re-ask basics: use handoff summaries.
- Shared tools across specialists: defeats the point of role separation.
- Handoff loops: cap the count and escalate on loop.
- Ignoring per-agent evals: regressions hide in aggregate metrics.
FAQ
Can I use this without the Realtime API?
Yes. The Agents SDK is transport-agnostic; Realtime is just one front-end.
How do I A/B test a single agent in a multi-agent graph?
Version the agent separately and route X% of triage handoffs to the new version.
What is a reasonable number of tools per specialist?
3-10. Past 15 the model starts confusing tool signatures.
How do I handle human escalation?
Add a transfer_to_human tool on every specialist and a dedicated escalation agent.
Does handoff cost extra tokens?
Yes, but less than the equivalent monolithic prompt.
Next steps
Want to see a 10-agent real-estate stack running live? Book a demo, read the technology page, or see pricing.
#CallSphere #OpenAIAgentsSDK #MultiAgent #VoiceAI #Orchestration #Handoffs #AIVoiceAgents
Written by
CallSphere Team
Expert insights on AI voice agents and customer communication automation.
Try CallSphere AI Voice Agents
See how AI voice agents work for your industry. Live demo available -- no signup required.