Building Multi-Agent Systems from Scratch with Python in 2026
Hands-on tutorial: build a multi-agent system in Python with agent base classes, message passing, tool integration, and handoffs. Complete code examples.
Why Build Multi-Agent Systems?
Single-agent architectures hit a ceiling quickly. When one agent handles customer support, payment processing, order tracking, and technical troubleshooting all at once, its system prompt becomes unwieldy, its tool list grows unmanageable, and its decision-making quality degrades. Multi-agent systems solve this by letting specialized agents collaborate — each with focused instructions, targeted tools, and deep expertise in one domain.
In this tutorial, you will build a multi-agent system from scratch in Python. By the end, you will have a working system with a triage agent that routes conversations to specialist agents, complete with message passing, tool integration, and agent handoffs.
Architecture Overview
The system we are building has three agents:
- Triage Agent: Receives all incoming messages, classifies intent, and routes to the appropriate specialist
- Order Agent: Handles order lookups, status updates, and cancellations
- Technical Support Agent: Handles technical issues, troubleshooting steps, and bug reports
Each agent has its own system prompt, tools, and handoff capabilities. The triage agent can hand off to either specialist, and specialists can hand back to triage if the conversation shifts topics.
Step 1: Define the Agent Base Class
Every agent in our system shares common behavior. Let us define a base class:
from dataclasses import dataclass, field
from typing import Callable
import json


@dataclass
class Tool:
    """A tool that an agent can call."""
    name: str
    description: str
    parameters: dict
    function: Callable[..., str]


@dataclass
class AgentResponse:
    """The result of an agent processing a message."""
    message: str
    handoff_to: str | None = None
    tool_calls: list[dict] = field(default_factory=list)


@dataclass
class Agent:
    """Base class for all agents in the system."""
    name: str
    instructions: str
    tools: list[Tool] = field(default_factory=list)
    handoff_targets: list[str] = field(default_factory=list)

    def get_system_prompt(self) -> str:
        tool_descriptions = ""
        if self.tools:
            tool_descriptions = "\n\nAvailable tools:\n"
            for tool in self.tools:
                tool_descriptions += (
                    f"- {tool.name}: {tool.description}\n"
                )

        handoff_info = ""
        if self.handoff_targets:
            targets = ", ".join(self.handoff_targets)
            handoff_info = (
                f"\n\nYou can hand off to: {targets}. "
                "Use a handoff when the user's request falls "
                "outside your expertise."
            )

        return (
            self.instructions
            + tool_descriptions
            + handoff_info
        )
This base class gives each agent a name, instructions, a set of tools, and a list of agents it can hand off to. The get_system_prompt method dynamically builds the full prompt including tool descriptions and handoff capabilities.
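Before wiring in an LLM, it is worth sanity-checking the prompt assembly. The snippet below is a condensed, standalone copy of Tool and Agent (trimmed to what get_system_prompt needs) with a throwaway echo tool; the tool and agent names are invented for illustration:

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Tool:
    name: str
    description: str
    parameters: dict
    function: Callable[..., str]


@dataclass
class Agent:
    name: str
    instructions: str
    tools: list[Tool] = field(default_factory=list)
    handoff_targets: list[str] = field(default_factory=list)

    def get_system_prompt(self) -> str:
        # Same assembly logic as the full class, compressed
        prompt = self.instructions
        if self.tools:
            prompt += "\n\nAvailable tools:\n" + "".join(
                f"- {t.name}: {t.description}\n" for t in self.tools
            )
        if self.handoff_targets:
            prompt += "\n\nYou can hand off to: " + ", ".join(self.handoff_targets) + "."
        return prompt


# Throwaway agent with one stub tool, purely to inspect the prompt
echo_tool = Tool(
    name="echo",
    description="Echo the input back",
    parameters={"text": {"type": "string"}},
    function=lambda text: text,
)
demo = Agent(
    name="Demo Agent",
    instructions="You are a demo agent.",
    tools=[echo_tool],
    handoff_targets=["Triage Agent"],
)
print(demo.get_system_prompt())
```

The printed prompt shows the instructions, the tool list, and the handoff line stitched together, which is exactly what each agent will send as its system message.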
Step 2: Implement Message Passing
Agents communicate through a shared conversation history. Each message has a role (system, user, assistant, or tool) and content:
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    SYSTEM = "system"
    USER = "user"
    ASSISTANT = "assistant"
    TOOL = "tool"


@dataclass
class Message:
    role: Role
    content: str
    agent_name: str | None = None
    tool_call_id: str | None = None


class ConversationHistory:
    """Manages the shared conversation history."""

    def __init__(self):
        self.messages: list[Message] = []

    def add_message(
        self, role: Role, content: str,
        agent_name: str | None = None
    ):
        self.messages.append(
            Message(role=role, content=content,
                    agent_name=agent_name)
        )

    def get_messages_for_agent(
        self, agent: Agent
    ) -> list[dict]:
        """Format messages for an LLM API call."""
        formatted = [
            {
                "role": "system",
                "content": agent.get_system_prompt()
            }
        ]
        for msg in self.messages:
            # The OpenAI API rejects "tool" messages that lack a
            # matching tool_call_id, so in this text-protocol setup
            # we surface tool results as user messages instead.
            if msg.role is Role.TOOL:
                formatted.append({
                    "role": "user",
                    "content": f"[Tool result] {msg.content}",
                })
            else:
                formatted.append({
                    "role": msg.role.value,
                    "content": msg.content,
                })
        return formatted

    def get_summary(self) -> str:
        """Return a brief summary for context transfer."""
        recent = self.messages[-5:]
        lines = []
        for m in recent:
            lines.append(f"{m.role.value}: {m.content[:200]}")
        return "\n".join(lines)
The conversation history is shared across agents during a session. When a handoff occurs, the new agent receives the full conversation context so it can continue seamlessly.
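To see the summarization behavior in isolation, here is a condensed, standalone version of ConversationHistory exercised with seven short messages; only the five most recent survive in the summary, each clipped to 200 characters:

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    USER = "user"
    ASSISTANT = "assistant"


@dataclass
class Message:
    role: Role
    content: str


class ConversationHistory:
    def __init__(self):
        self.messages: list[Message] = []

    def add_message(self, role: Role, content: str):
        self.messages.append(Message(role, content))

    def get_summary(self) -> str:
        # Only the five most recent messages, each clipped to 200 chars
        recent = self.messages[-5:]
        return "\n".join(f"{m.role.value}: {m.content[:200]}" for m in recent)


history = ConversationHistory()
for i in range(1, 8):
    history.add_message(Role.USER, f"message {i}")

summary = history.get_summary()
print(summary)
```

This windowed summary is what a handing-off agent would pass along when the full transcript is too long to transfer wholesale.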
Step 3: Build the Tools
Now let us create the tools our specialist agents will use. In a real system, these would call databases and external APIs. For this tutorial, we use mock implementations:
# ── Order tools ──

def lookup_order(order_id: str) -> str:
    """Look up an order by its ID and return status."""
    # In production, this queries your database
    mock_orders = {
        "ORD-001": {
            "status": "shipped",
            "tracking": "1Z999AA10123456784",
            "eta": "2026-03-18",
        },
        "ORD-002": {
            "status": "processing",
            "tracking": None,
            "eta": "2026-03-20",
        },
    }
    order = mock_orders.get(order_id)
    if order:
        return json.dumps(order)
    return json.dumps({"error": f"Order {order_id} not found"})


def cancel_order(order_id: str, reason: str) -> str:
    """Cancel an order. Only works for orders not yet shipped."""
    return json.dumps({
        "success": True,
        "message": f"Order {order_id} cancelled. Reason: {reason}"
    })


order_tools = [
    Tool(
        name="lookup_order",
        description="Look up order status by order ID",
        parameters={
            "order_id": {"type": "string", "required": True}
        },
        function=lookup_order,
    ),
    Tool(
        name="cancel_order",
        description="Cancel an order that has not shipped yet",
        parameters={
            "order_id": {"type": "string", "required": True},
            "reason": {"type": "string", "required": True},
        },
        function=cancel_order,
    ),
]


# ── Technical support tools ──

def search_knowledge_base(query: str) -> str:
    """Search the technical knowledge base."""
    articles = {
        "login": "Reset password at /forgot-password. "
                 "Clear browser cache if issues persist.",
        "crash": "Update to latest version. If the issue "
                 "continues, collect logs from /settings/debug.",
        "slow": "Check internet connection. Disable browser "
                "extensions. Try incognito mode.",
    }
    for key, article in articles.items():
        if key in query.lower():
            return json.dumps({"result": article})
    return json.dumps({
        "result": "No matching articles found. "
                  "Please escalate to engineering."
    })


def create_bug_report(
    title: str, description: str, severity: str
) -> str:
    """Create a bug report in the issue tracker."""
    return json.dumps({
        "ticket_id": "BUG-4521",
        "status": "created",
        "title": title,
        "severity": severity,
    })


tech_tools = [
    Tool(
        name="search_knowledge_base",
        description="Search technical docs for solutions",
        parameters={
            "query": {"type": "string", "required": True}
        },
        function=search_knowledge_base,
    ),
    Tool(
        name="create_bug_report",
        description="File a bug report for engineering",
        parameters={
            "title": {"type": "string", "required": True},
            "description": {
                "type": "string", "required": True
            },
            "severity": {
                "type": "string",
                "enum": ["low", "medium", "high", "critical"],
            },
        },
        function=create_bug_report,
    ),
]
Notice that each tool function returns a JSON string. This makes it easy for the LLM to parse results and incorporate them into its responses.
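A quick round trip shows why: the caller can json.loads the result and branch on an error key instead of scraping free-form text. The snippet below condenses lookup_order so it runs standalone; ORD-999 is just an arbitrary unknown ID for the example:

```python
import json


def lookup_order(order_id: str) -> str:
    # Condensed copy of the tutorial's mock tool
    mock_orders = {"ORD-001": {"status": "shipped", "eta": "2026-03-18"}}
    order = mock_orders.get(order_id)
    if order:
        return json.dumps(order)
    return json.dumps({"error": f"Order {order_id} not found"})


# Because every tool returns a JSON string, callers (and the LLM)
# can branch on structure rather than parse prose
found = json.loads(lookup_order("ORD-001"))
missing = json.loads(lookup_order("ORD-999"))
print(found["status"], "|", missing["error"])
```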
Step 4: Create the Agents
With tools defined, we can now instantiate our three agents:
triage_agent = Agent(
    name="Triage Agent",
    instructions="""You are a triage agent. Your job is to
understand the user's request and route them to the right
specialist agent.

- For order-related questions (tracking, status, cancellations,
  returns): hand off to Order Agent
- For technical issues (bugs, errors, performance, login
  problems): hand off to Technical Support Agent
- For general questions: answer directly

Always greet the user and ask clarifying questions if the
intent is ambiguous.""",
    handoff_targets=["Order Agent", "Technical Support Agent"],
)

order_agent = Agent(
    name="Order Agent",
    instructions="""You are an order management specialist.
Help users with order tracking, status updates, and
cancellations.

Always look up the order before providing information.
For cancellations, confirm with the user before proceeding.
If the request is not order-related, hand off to the
Triage Agent.""",
    tools=order_tools,
    handoff_targets=["Triage Agent"],
)

tech_agent = Agent(
    name="Technical Support Agent",
    instructions="""You are a technical support specialist.
Help users troubleshoot technical issues, find solutions in
the knowledge base, and file bug reports when needed.

Always search the knowledge base first. Only create a bug
report if the knowledge base does not have a solution.
If the request is not technical, hand off to the
Triage Agent.""",
    tools=tech_tools,
    handoff_targets=["Triage Agent"],
)
Step 5: Build the Orchestrator
The orchestrator is the brain of the multi-agent system. It tracks which agent is currently active, calls the LLM, executes tool calls, and handles handoffs. For simplicity, it uses a lightweight text protocol in which the model signals handoffs ("handoff to ...") and tool use ("use tool: ...") in plain text; the same structure carries over to native function calling.
from openai import OpenAI


class Orchestrator:
    """Manages agent execution and handoffs."""

    def __init__(self, agents: dict[str, Agent]):
        self.agents = agents
        self.active_agent: Agent | None = None
        self.history = ConversationHistory()
        self.client = OpenAI()

    def set_active_agent(self, agent_name: str):
        self.active_agent = self.agents[agent_name]

    def process_message(self, user_input: str) -> str:
        self.history.add_message(Role.USER, user_input)
        max_iterations = 10

        for _ in range(max_iterations):
            messages = self.history.get_messages_for_agent(
                self.active_agent
            )
            response = self.client.chat.completions.create(
                model="gpt-4o",
                messages=messages,
                temperature=0.3,
            )
            assistant_msg = response.choices[0].message
            content = assistant_msg.content or ""

            # Check for handoff. Record the note before switching so
            # it is attributed to the agent that initiated the handoff.
            handoff = self._check_handoff(content)
            if handoff:
                self.history.add_message(
                    Role.ASSISTANT,
                    f"[Handing off to {handoff}]",
                    agent_name=self.active_agent.name,
                )
                self.set_active_agent(handoff)
                continue

            # Check for tool calls
            tool_result = self._execute_tools(content)
            if tool_result:
                self.history.add_message(
                    Role.TOOL, tool_result
                )
                continue

            # Regular response
            self.history.add_message(
                Role.ASSISTANT, content,
                agent_name=self.active_agent.name,
            )
            return content

        return "I apologize, I was unable to process that."

    def _check_handoff(self, content: str) -> str | None:
        for target in self.active_agent.handoff_targets:
            if f"handoff to {target.lower()}" in content.lower():
                return target
        return None

    def _execute_tools(self, content: str) -> str | None:
        for tool in self.active_agent.tools:
            marker = f"use tool: {tool.name}"
            idx = content.lower().find(marker)
            if idx == -1:
                continue
            # Parse JSON arguments that follow the marker, e.g.
            # 'use tool: lookup_order {"order_id": "ORD-001"}'
            raw_args = content[idx + len(marker):].strip()
            try:
                args = json.loads(raw_args) if raw_args.startswith("{") else {}
            except json.JSONDecodeError:
                args = {}
            return tool.function(**args)
        return None
At CallSphere, we deploy multi-agent systems across 6 verticals using an orchestration layer similar to this, scaled to handle thousands of concurrent sessions with persistent conversation state backed by PostgreSQL.
Step 6: Run the System
Bring everything together and run an interactive session:
def main():
    agents = {
        "Triage Agent": triage_agent,
        "Order Agent": order_agent,
        "Technical Support Agent": tech_agent,
    }
    orchestrator = Orchestrator(agents)
    orchestrator.set_active_agent("Triage Agent")

    print("Multi-Agent System Ready. Type 'quit' to exit.")
    print(f"Active agent: {orchestrator.active_agent.name}")
    print("-" * 50)

    while True:
        user_input = input("You: ")
        if user_input.lower() == "quit":
            break
        response = orchestrator.process_message(user_input)
        agent_name = orchestrator.active_agent.name
        print(f"[{agent_name}]: {response}")
        print("-" * 50)


if __name__ == "__main__":
    main()
Step 7: Production Considerations
The system above demonstrates the core patterns, but production multi-agent systems need additional layers:
Persistent State
Store conversation history in a database rather than in memory. Use PostgreSQL with a conversations table that tracks the active agent, message history, and metadata:
CREATE TABLE conversations (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    active_agent VARCHAR(100) NOT NULL,
    created_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE TABLE messages (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    conversation_id UUID REFERENCES conversations(id),
    role VARCHAR(20) NOT NULL,
    content TEXT NOT NULL,
    agent_name VARCHAR(100),
    created_at TIMESTAMPTZ DEFAULT NOW()
);
Observability
Log every agent decision, tool call, and handoff. Structure your logs so you can trace a single conversation across all agents:
import structlog

logger = structlog.get_logger()


def log_agent_action(
    conversation_id: str,
    agent_name: str,
    action: str,
    details: dict
):
    logger.info(
        "agent_action",
        conversation_id=conversation_id,
        agent=agent_name,
        action=action,
        **details,
    )
Error Boundaries
Wrap each agent's execution in error handling that prevents one agent's failure from crashing the entire system. If a specialist agent fails, fall back to the triage agent with an appropriate error message.
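A minimal sketch of such a boundary, assuming agents are reduced to plain callables; the safe_process helper and handler names are hypothetical, not part of the tutorial code above:

```python
def broken_order_agent(msg: str) -> str:
    # Simulates a specialist whose backing service is down
    raise RuntimeError("database connection lost")


def safe_process(handlers: dict, active: str, user_input: str) -> tuple[str, str]:
    """Run the active agent's handler; fall back to triage on failure.

    `handlers` maps agent names to plain callables, a simplified
    stand-in for the orchestrator's agents.
    """
    try:
        return active, handlers[active](user_input)
    except Exception:
        # One specialist's crash should not end the session
        return "Triage Agent", handlers["Triage Agent"](
            "Sorry, something went wrong. Could you rephrase your request?"
        )


handlers = {
    "Triage Agent": lambda msg: f"[triage] {msg}",
    "Order Agent": broken_order_agent,
}
agent, reply = safe_process(handlers, "Order Agent", "Cancel ORD-001")
print(agent, "->", reply)
```

In the real orchestrator, the except branch would also log the failure and record the fallback handoff in the conversation history.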
Concurrency
Use async patterns throughout. Replace synchronous OpenAI calls with async equivalents and use asyncio for concurrent tool execution when an agent calls multiple tools simultaneously.
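As a standalone illustration of the payoff, the sketch below fakes two slow tools with asyncio.sleep and runs them through asyncio.gather; the async tool names are invented for the example:

```python
import asyncio
import json
import time


async def lookup_order_async(order_id: str) -> str:
    await asyncio.sleep(0.1)  # stands in for a slow DB/API call
    return json.dumps({"order_id": order_id, "status": "shipped"})


async def search_kb_async(query: str) -> str:
    await asyncio.sleep(0.1)
    return json.dumps({"query": query, "result": "no matches"})


async def run_tools() -> list[str]:
    # gather() runs both awaitables concurrently: total wall time is
    # roughly 0.1s rather than the 0.2s a sequential run would take
    return await asyncio.gather(
        lookup_order_async("ORD-001"),
        search_kb_async("login"),
    )


start = time.perf_counter()
results = asyncio.run(run_tools())
elapsed = time.perf_counter() - start
print(len(results), f"{elapsed:.2f}s")
```

The same pattern applies to the LLM calls themselves via the async OpenAI client, so multiple conversations can be served on one event loop.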
Frequently Asked Questions
How many agents should a multi-agent system have?
Start with the minimum number of agents needed to cover your use cases — usually 3 to 5. Each additional agent adds handoff complexity, increases the chance of routing errors, and makes debugging harder. A common pattern is one triage agent plus 2-4 specialist agents. Only add new agents when you have data showing that an existing agent is struggling to handle a particular domain well.
How do agents share context during handoffs?
The most reliable method is passing the full conversation history to the receiving agent. This ensures the new agent has complete context without the user needing to repeat themselves. For very long conversations, you can use a summarization step — have the handing-off agent generate a brief summary of the key information, then include that summary in the system prompt of the receiving agent alongside the recent messages.
What happens when two agents disagree or create a loop?
Agent loops (A hands off to B, B hands off back to A repeatedly) are a common failure mode. Prevent them by implementing a maximum handoff count per conversation turn (typically 2-3), tracking handoff history to detect cycles, and having a fallback behavior that asks the user for clarification instead of handing off again. In the orchestrator, increment a counter on each handoff and break the loop if it exceeds the threshold.
Can I use different LLM models for different agents?
Yes, and you should consider it for cost optimization. A triage agent that only classifies intent can run on a smaller, cheaper model like GPT-4o-mini, while a specialist agent handling complex reasoning might need GPT-4o or Claude 3.5 Sonnet. At CallSphere, we routinely use mixed-model architectures where the triage layer costs one-tenth of the specialist layer per request.
How do I test a multi-agent system?
Test at three levels. First, test each agent in isolation with mocked tools and pre-scripted conversation flows. Second, test the handoff logic by verifying that the orchestrator routes messages to the correct agent for a set of representative inputs. Third, run end-to-end tests with realistic multi-turn conversations that exercise the full flow — including handoffs, tool calls, and error scenarios.
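For the second level, routing checks can be written as plain table-driven assertions. The classify_intent function below is a toy keyword stand-in for the LLM's routing decision, invented purely so the test shape is concrete:

```python
def classify_intent(message: str) -> str:
    """Toy keyword router standing in for the triage agent's
    LLM-based decision (hypothetical, for illustration only)."""
    text = message.lower()
    if any(w in text for w in ("order", "tracking", "cancel", "refund")):
        return "Order Agent"
    if any(w in text for w in ("bug", "error", "crash", "login", "slow")):
        return "Technical Support Agent"
    return "Triage Agent"


# Representative inputs mapped to the agent that should receive them
cases = {
    "Where is my order?": "Order Agent",
    "The app crashes on startup": "Technical Support Agent",
    "What are your opening hours?": "Triage Agent",
}
for message, expected in cases.items():
    assert classify_intent(message) == expected
print("routing tests passed")
```

With a real LLM router, the same table drives the test; you simply swap classify_intent for a call through the orchestrator and tolerate some non-determinism by running each case several times.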
CallSphere Team
Expert insights on AI voice agents and customer communication automation.