Building Multi-Agent Voice Systems with the OpenAI Agents SDK

Why one agent is not enough

A single agent with fifty tools and a thousand-line system prompt will work, but badly. It will hallucinate tool names, forget constraints, and underperform a smaller agent focused on one job. Multi-agent systems split the problem into a triage agent that identifies intent, specialist agents that each handle one intent in depth, and handoffs that move the conversation between them without losing context.

This post walks through building a multi-agent voice system with the OpenAI Agents SDK, the same pattern CallSphere uses across its real estate, healthcare, and sales verticals.

caller → triage_agent
              │
              ├── buyer_intent ───► buyer_specialist
              ├── seller_intent ──► seller_specialist
              ├── rental_intent ──► rental_specialist
              └── tour_intent ────► tour_coordinator

Architecture overview

┌───────────────────────────────────────┐
│          Session state (shared)       │
│  • caller info                        │
│  • conversation history               │
│  • collected fields                   │
└──────────────┬────────────────────────┘
               │
               ▼
┌───────────────────────────────────────┐
│ Triage agent (thin, routing only)     │
└──────────────┬────────────────────────┘
               │ handoff
    ┌──────────┼──────────┐
    ▼          ▼          ▼
┌───────┐  ┌───────┐  ┌───────┐
│buyer  │  │seller │  │rental │
│agent  │  │agent  │  │agent  │
└───┬───┘  └───┬───┘  └───┬───┘
    │          │          │
    ▼          ▼          ▼
   tools      tools      tools

Prerequisites

Python 3.11+ and the openai-agents package.
An OpenAI API key with access to the Realtime API.
Per-agent tool definitions.

Step-by-step walkthrough

1. Define the triage agent

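from agents import Agent, Runner, handoff

# search_listings, book_tour, create_valuation_lead, search_rentals, and book_showing
# are assumed to be @function_tool-decorated functions defined elsewhere in your codebase.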

buyer_agent = Agent(
    name="Buyer Specialist",
    instructions="You help home buyers. Ask qualifying questions, check availability, and book tours.",
    tools=[search_listings, book_tour],
)

seller_agent = Agent(
    name="Seller Specialist",
    instructions="You help home sellers. Collect property details and schedule valuation calls.",
    tools=[create_valuation_lead],
)

rental_agent = Agent(
    name="Rental Specialist",
    instructions="You help rental inquiries. Collect preferences and schedule showings.",
    tools=[search_rentals, book_showing],
)

triage = Agent(
    name="Triage",
    instructions=(
        "Greet the caller and identify whether they are buying, selling, or renting. "
        "Hand off to the correct specialist as soon as you know."
    ),
    handoffs=[handoff(buyer_agent), handoff(seller_agent), handoff(rental_agent)],
)

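2. Share session state

from agents import RunContextWrapper, function_tool

class SessionState:
    def __init__(self, call_id: str, caller_phone: str):
        self.call_id = call_id
        self.caller_phone = caller_phone
        self.collected = {}  # fields gathered so far, keyed by field name

# Hypothetical tool showing how a tool reads and writes the shared state.
@function_tool
def save_field(ctx: RunContextWrapper[SessionState], name: str, value: str) -> str:
    ctx.context.collected[name] = value
    return f"Saved {name}."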

3. Run the loop

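async def run_call(call_id: str, caller_phone: str, user_turns: list[str]) -> list[str]:
    state = SessionState(call_id, caller_phone)
    current_agent = triage  # start at triage; later turns resume with whoever took over
    history = []            # full item list: messages, tool calls, and handoffs
    replies = []
    for user_text in user_turns:
        history.append({"role": "user", "content": user_text})
        result = await Runner.run(current_agent, input=history, context=state)
        replies.append(result.final_output)
        history = result.to_input_list()   # keeps tool and handoff items for the next turn
        current_agent = result.last_agent  # stay with the specialist after a handoff
    return replies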

4. Handle handoffs cleanly

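The SDK records each handoff in the run result. When one agent transfers to another, result.new_items contains a HandoffOutputItem that names the source and target agents. Use it to log the handoff and keep the shared state consistent.

from agents import HandoffOutputItem

async def observe(result):
    # log_handoff is assumed to be your own async logging helper.
    for item in result.new_items:
        if isinstance(item, HandoffOutputItem):
            await log_handoff(item.source_agent.name, item.target_agent.name)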

5. Bridge to the Realtime API

Route the user's audio-derived transcripts into the Runner and pipe the final_output back to the TTS side of the Realtime session. Keep one agent-SDK context per call.
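The bridge described above can be sketched with two queues. This is a minimal sketch under stated assumptions, not a Realtime API integration: transcripts are assumed to arrive on an asyncio.Queue, replies are assumed to be spoken by whatever consumes the second queue, and run_turn is a hypothetical stand-in for a function that wraps Runner.run for one call. The None sentinel for hang-up is also an assumption.

```python
import asyncio
from typing import Awaitable, Callable, Optional

# Hypothetical signature: takes the caller's transcript, returns the agent's reply text.
RunTurn = Callable[[str], Awaitable[str]]


async def bridge_call(
    transcripts: "asyncio.Queue[Optional[str]]",
    speech_out: "asyncio.Queue[str]",
    run_turn: RunTurn,
) -> int:
    """Forward each transcript to the agent layer and queue the reply for TTS.

    Returns the number of turns handled. A None transcript marks the end of the call.
    """
    turns = 0
    while True:
        text = await transcripts.get()
        if text is None:      # sentinel: the caller hung up
            return turns
        if not text.strip():  # skip empty transcripts
            continue
        reply = await run_turn(text)
        await speech_out.put(reply)
        turns += 1
```

Creating one bridge_call task per call keeps the "one context per call" rule easy to enforce, because the run_turn closure can own that call's session state.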

6. Guardrails per agent

Each specialist gets its own constraints: the buyer agent cannot book valuations, the seller agent cannot search listings. This prevents the combined prompt bloat that kills single-agent systems.
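One way to make those constraints explicit is an allowlist checked before any tool runs. The sketch below is an assumption about how you might enforce this in your own dispatch layer; ALLOWED_TOOLS, ToolNotAllowed, and check_tool_call are hypothetical names, not part of the SDK. The agent and tool names are taken from the walkthrough above.

```python
from typing import Dict, FrozenSet

# Hypothetical allowlist: agent name -> names of the tools it may call.
ALLOWED_TOOLS: Dict[str, FrozenSet[str]] = {
    "Buyer Specialist": frozenset({"search_listings", "book_tour"}),
    "Seller Specialist": frozenset({"create_valuation_lead"}),
    "Rental Specialist": frozenset({"search_rentals", "book_showing"}),
}


class ToolNotAllowed(Exception):
    """Raised when an agent tries to call a tool outside its role."""


def check_tool_call(agent_name: str, tool_name: str) -> None:
    """Raise ToolNotAllowed unless the agent is explicitly allowed to use the tool."""
    allowed = ALLOWED_TOOLS.get(agent_name, frozenset())  # unknown agents get no tools
    if tool_name not in allowed:
        raise ToolNotAllowed(f"{agent_name} may not call {tool_name}")
```

Passing each agent only its own tools list already gives the same separation in the normal case; a check like this is a second line of defense and a convenient thing to assert in tests.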

Production considerations

State scope: shared session state is fine; shared mutable global state is not.
Handoff loops: add a max-handoff counter. Runner.run accepts a max_turns limit and raises MaxTurnsExceeded when it is hit, but by then the loop has already burned tokens and caller time.
Tool permissions: agents only see the tools they need.
Telemetry: record which agent handled each turn for post-call analytics.
Handoff summaries: the outgoing agent should summarize what it learned so the incoming agent does not re-ask.
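Two of the items above, the handoff cap and the handoff summary, fit in a few lines of plain Python. This is a sketch under assumptions: HandoffTracker, TooManyHandoffs, and handoff_summary are hypothetical helpers you would call from your own handoff logging, and the default budget of 3 is an arbitrary starting point.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


class TooManyHandoffs(Exception):
    """Raised when a call exceeds its handoff budget."""


@dataclass
class HandoffTracker:
    max_handoffs: int = 3  # assumed budget; tune per vertical
    history: List[Tuple[str, str]] = field(default_factory=list)

    def record(self, from_agent: str, to_agent: str) -> None:
        """Record one handoff and raise once the budget is exceeded."""
        self.history.append((from_agent, to_agent))
        if len(self.history) > self.max_handoffs:
            raise TooManyHandoffs(f"{len(self.history)} handoffs; escalate to a human")


def handoff_summary(from_agent: str, collected: Dict[str, object]) -> str:
    """Build a short note for the incoming agent from fields already collected."""
    if not collected:
        return f"{from_agent} collected no details yet."
    facts = ", ".join(f"{key}={value}" for key, value in sorted(collected.items()))
    return f"{from_agent} already collected: {facts}. Do not ask for these again."
```

Keeping the tracker on the per-call session state, rather than in a module-level variable, follows the state-scope rule in the same list.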

CallSphere's real implementation

CallSphere uses the OpenAI Agents SDK for every multi-agent vertical. Real estate runs 10 agents (triage, buyer, seller, rental, tour coordinator, qualification, finance, showing, negotiation, handoff-to-human). Healthcare combines 14 tools behind a lighter triage/specialist split. Salon runs 4 agents (receptionist, booking, upsell, recovery). After-hours escalation has 7 tools around an urgency-classifier triage. IT helpdesk pairs 10 tools with RAG behind a triage agent. The sales pod uses 5 GPT-4 specialists plus ElevenLabs TTS.

The voice plane under all of them is the OpenAI Realtime API with gpt-4o-realtime-preview-2025-06-03, PCM16 at 24kHz, and server VAD. Handoffs happen inside a single Realtime session so there is no audio drop between agents. A GPT-4o-mini post-call pipeline writes per-agent metrics so customers can see which specialist is closing and which is leaking. CallSphere supports 57+ languages with sub-second end-to-end latency.

Common pitfalls

Too many agents: 3-10 is a sweet spot; 20 is usually over-decomposed.
Specialists that re-ask basics: use handoff summaries.
Shared tools across specialists: defeats the point of role separation.
Handoff loops: cap the count and escalate on loop.
Ignoring per-agent evals: regressions hide in aggregate metrics.
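The last pitfall is easy to see with numbers. The sketch below computes success rates per agent alongside the aggregate; success_rates is a hypothetical helper and the sample data is made up purely for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def success_rates(outcomes: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Return the success rate per agent plus an "ALL" entry for the aggregate."""
    wins: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for agent, succeeded in outcomes:
        for key in (agent, "ALL"):
            totals[key] += 1
            wins[key] += int(succeeded)
    return {key: wins[key] / totals[key] for key in totals}


# Made-up sample: the buyer agent handles most calls and does well, so the
# aggregate still looks acceptable while the rental agent fails half its calls.
sample = (
    [("buyer", True)] * 90 + [("buyer", False)] * 10
    + [("rental", True)] * 5 + [("rental", False)] * 5
)
rates = success_rates(sample)
```

With this sample the buyer agent succeeds on 90 of 100 calls and the rental agent on 5 of 10, while the aggregate is 95 of 110, roughly 86%. A dashboard that shows only the aggregate would not flag the rental agent.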

FAQ

Can I use this without the Realtime API?

Yes. The Agents SDK is transport-agnostic; Realtime is just one front-end.

How do I A/B test a single agent in a multi-agent graph?

Version the agent separately and route X% of triage handoffs to the new version.
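A minimal sketch of that routing, assuming you bucket by call ID so a call never switches versions mid-conversation. pick_variant and the "candidate"/"control" labels are hypothetical names; how you map the result to an agent object is up to your triage setup.

```python
import hashlib


def pick_variant(call_id: str, rollout_percent: int) -> str:
    """Deterministically assign a call to "candidate" or "control".

    The same call_id always maps to the same bucket, so retries and
    reconnects within one call stay on the same agent version.
    """
    if not 0 <= rollout_percent <= 100:
        raise ValueError("rollout_percent must be between 0 and 100")
    digest = hashlib.sha256(call_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # stable bucket in 0..99
    return "candidate" if bucket < rollout_percent else "control"
```

Hashing is used instead of random.random() so the assignment can be recomputed later from the call ID alone, which makes post-call analysis by variant straightforward.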

What is a reasonable number of tools per specialist?

3-10. Past 15 the model starts confusing tool signatures.

How do I handle human escalation?

Add a transfer_to_human tool on every specialist and a dedicated escalation agent.

Does handoff cost extra tokens?

Yes, but less than the equivalent monolithic prompt.

Next steps

Want to see a 10-agent real-estate stack running live? Book a demo, read the technology page, or see pricing.

#CallSphere #OpenAIAgentsSDK #MultiAgent #VoiceAI #Orchestration #Handoffs #AIVoiceAgents
