Building Multi-Agent Systems from Scratch with Python in 2026
Hands-on tutorial: build a multi-agent system in Python with agent base classes, message passing, tool integration, and handoffs. Complete code examples.
Why Build Multi-Agent Systems?
Single-agent architectures hit a ceiling quickly. When one agent handles customer support, payment processing, order tracking, and technical troubleshooting all at once, its system prompt becomes unwieldy, its tool list grows unmanageable, and its decision-making quality degrades. Multi-agent systems solve this by letting specialized agents collaborate — each with focused instructions, targeted tools, and deep expertise in one domain.
In this tutorial, you will build a multi-agent system from scratch in Python. By the end, you will have a working system with a triage agent that routes conversations to specialist agents, complete with message passing, tool integration, and agent handoffs.
Architecture Overview
The system we are building has three agents:
- Triage Agent: Receives all incoming messages, classifies intent, and routes to the appropriate specialist
- Order Agent: Handles order lookups, status updates, and cancellations
- Technical Support Agent: Handles technical issues, troubleshooting steps, and bug reports
Each agent has its own system prompt, tools, and handoff capabilities. The triage agent can hand off to either specialist, and specialists can hand back to triage if the conversation shifts topics.
Step 1: Define the Agent Base Class
Every agent in our system shares common behavior. Let us define a base class:
from dataclasses import dataclass, field
from typing import Callable
import json


@dataclass
class Tool:
    """A tool that an agent can call."""
    name: str
    description: str
    parameters: dict
    function: Callable[..., str]


@dataclass
class AgentResponse:
    """The result of an agent processing a message."""
    message: str
    handoff_to: str | None = None
    tool_calls: list[dict] = field(default_factory=list)


@dataclass
class Agent:
    """Base class for all agents in the system."""
    name: str
    instructions: str
    tools: list[Tool] = field(default_factory=list)
    handoff_targets: list[str] = field(default_factory=list)

    def get_system_prompt(self) -> str:
        tool_descriptions = ""
        if self.tools:
            tool_descriptions = "\n\nAvailable tools:\n"
            for tool in self.tools:
                tool_descriptions += (
                    f"- {tool.name}: {tool.description}\n"
                )

        handoff_info = ""
        if self.handoff_targets:
            targets = ", ".join(self.handoff_targets)
            handoff_info = (
                f"\n\nYou can hand off to: {targets}. "
                "Use a handoff when the user's request falls "
                "outside your expertise."
            )

        return (
            self.instructions
            + tool_descriptions
            + handoff_info
        )
This base class gives each agent a name, instructions, a set of tools, and a list of agents it can hand off to. The get_system_prompt method dynamically builds the full prompt including tool descriptions and handoff capabilities.
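Before wiring in an LLM, it is worth sanity-checking the prompt assembly. The snippet below is a condensed, standalone copy of Tool and Agent (trimmed to what get_system_prompt needs) with a throwaway echo tool; the tool and agent names are invented for illustration:

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Tool:
    name: str
    description: str
    parameters: dict
    function: Callable[..., str]


@dataclass
class Agent:
    name: str
    instructions: str
    tools: list[Tool] = field(default_factory=list)
    handoff_targets: list[str] = field(default_factory=list)

    def get_system_prompt(self) -> str:
        # Same assembly logic as the full class, compressed
        prompt = self.instructions
        if self.tools:
            prompt += "\n\nAvailable tools:\n" + "".join(
                f"- {t.name}: {t.description}\n" for t in self.tools
            )
        if self.handoff_targets:
            prompt += "\n\nYou can hand off to: " + ", ".join(self.handoff_targets) + "."
        return prompt


# Throwaway agent with one stub tool, purely to inspect the prompt
echo_tool = Tool(
    name="echo",
    description="Echo the input back",
    parameters={"text": {"type": "string"}},
    function=lambda text: text,
)
demo = Agent(
    name="Demo Agent",
    instructions="You are a demo agent.",
    tools=[echo_tool],
    handoff_targets=["Triage Agent"],
)
print(demo.get_system_prompt())
```

The printed prompt shows the instructions, the tool list, and the handoff line stitched together, which is exactly what each agent will send as its system message.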
Step 2: Implement Message Passing
Agents communicate through a shared conversation history. Each message has a role (system, user, assistant, or tool) and content:
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    SYSTEM = "system"
    USER = "user"
    ASSISTANT = "assistant"
    TOOL = "tool"


@dataclass
class Message:
    role: Role
    content: str
    agent_name: str | None = None
    tool_call_id: str | None = None


class ConversationHistory:
    """Manages the shared conversation history."""

    def __init__(self):
        self.messages: list[Message] = []

    def add_message(
        self, role: Role, content: str,
        agent_name: str | None = None
    ):
        self.messages.append(
            Message(role=role, content=content,
                    agent_name=agent_name)
        )

    def get_messages_for_agent(
        self, agent: Agent
    ) -> list[dict]:
        """Format messages for an LLM API call."""
        formatted = [
            {
                "role": "system",
                "content": agent.get_system_prompt()
            }
        ]
        for msg in self.messages:
            # The OpenAI API rejects "tool" messages that lack a
            # matching tool_call_id, so in this text-protocol setup
            # we surface tool results as user messages instead.
            if msg.role is Role.TOOL:
                formatted.append({
                    "role": "user",
                    "content": f"[Tool result] {msg.content}",
                })
            else:
                formatted.append({
                    "role": msg.role.value,
                    "content": msg.content,
                })
        return formatted

    def get_summary(self) -> str:
        """Return a brief summary for context transfer."""
        recent = self.messages[-5:]
        lines = []
        for m in recent:
            lines.append(f"{m.role.value}: {m.content[:200]}")
        return "\n".join(lines)
The conversation history is shared across agents during a session. When a handoff occurs, the new agent receives the full conversation context so it can continue seamlessly.
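To see the summarization behavior in isolation, here is a condensed, standalone version of ConversationHistory exercised with seven short messages; only the five most recent survive in the summary, each clipped to 200 characters:

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    USER = "user"
    ASSISTANT = "assistant"


@dataclass
class Message:
    role: Role
    content: str


class ConversationHistory:
    def __init__(self):
        self.messages: list[Message] = []

    def add_message(self, role: Role, content: str):
        self.messages.append(Message(role, content))

    def get_summary(self) -> str:
        # Only the five most recent messages, each clipped to 200 chars
        recent = self.messages[-5:]
        return "\n".join(f"{m.role.value}: {m.content[:200]}" for m in recent)


history = ConversationHistory()
for i in range(1, 8):
    history.add_message(Role.USER, f"message {i}")

summary = history.get_summary()
print(summary)
```

This windowed summary is what a handing-off agent would pass along when the full transcript is too long to transfer wholesale.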
Step 3: Build the Tools
Now let us create the tools our specialist agents will use. In a real system, these would call databases and external APIs. For this tutorial, we use mock implementations:
# ── Order tools ──

def lookup_order(order_id: str) -> str:
    """Look up an order by its ID and return status."""
    # In production, this queries your database
    mock_orders = {
        "ORD-001": {
            "status": "shipped",
            "tracking": "1Z999AA10123456784",
            "eta": "2026-03-18",
        },
        "ORD-002": {
            "status": "processing",
            "tracking": None,
            "eta": "2026-03-20",
        },
    }
    order = mock_orders.get(order_id)
    if order:
        return json.dumps(order)
    return json.dumps({"error": f"Order {order_id} not found"})


def cancel_order(order_id: str, reason: str) -> str:
    """Cancel an order. Only works for orders not yet shipped."""
    return json.dumps({
        "success": True,
        "message": f"Order {order_id} cancelled. Reason: {reason}"
    })


order_tools = [
    Tool(
        name="lookup_order",
        description="Look up order status by order ID",
        parameters={
            "order_id": {"type": "string", "required": True}
        },
        function=lookup_order,
    ),
    Tool(
        name="cancel_order",
        description="Cancel an order that has not shipped yet",
        parameters={
            "order_id": {"type": "string", "required": True},
            "reason": {"type": "string", "required": True},
        },
        function=cancel_order,
    ),
]


# ── Technical support tools ──

def search_knowledge_base(query: str) -> str:
    """Search the technical knowledge base."""
    articles = {
        "login": "Reset password at /forgot-password. "
                 "Clear browser cache if issues persist.",
        "crash": "Update to latest version. If the issue "
                 "continues, collect logs from /settings/debug.",
        "slow": "Check internet connection. Disable browser "
                "extensions. Try incognito mode.",
    }
    for key, article in articles.items():
        if key in query.lower():
            return json.dumps({"result": article})
    return json.dumps({
        "result": "No matching articles found. "
                  "Please escalate to engineering."
    })


def create_bug_report(
    title: str, description: str, severity: str
) -> str:
    """Create a bug report in the issue tracker."""
    return json.dumps({
        "ticket_id": "BUG-4521",
        "status": "created",
        "title": title,
        "severity": severity,
    })


tech_tools = [
    Tool(
        name="search_knowledge_base",
        description="Search technical docs for solutions",
        parameters={
            "query": {"type": "string", "required": True}
        },
        function=search_knowledge_base,
    ),
    Tool(
        name="create_bug_report",
        description="File a bug report for engineering",
        parameters={
            "title": {"type": "string", "required": True},
            "description": {
                "type": "string", "required": True
            },
            "severity": {
                "type": "string",
                "enum": ["low", "medium", "high", "critical"],
            },
        },
        function=create_bug_report,
    ),
]
Notice that each tool function returns a JSON string. This makes it easy for the LLM to parse results and incorporate them into its responses.
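A quick round trip shows why: the caller can json.loads the result and branch on an error key instead of scraping free-form text. The snippet below condenses lookup_order so it runs standalone; ORD-999 is just an arbitrary unknown ID for the example:

```python
import json


def lookup_order(order_id: str) -> str:
    # Condensed copy of the tutorial's mock tool
    mock_orders = {"ORD-001": {"status": "shipped", "eta": "2026-03-18"}}
    order = mock_orders.get(order_id)
    if order:
        return json.dumps(order)
    return json.dumps({"error": f"Order {order_id} not found"})


# Because every tool returns a JSON string, callers (and the LLM)
# can branch on structure rather than parse prose
found = json.loads(lookup_order("ORD-001"))
missing = json.loads(lookup_order("ORD-999"))
print(found["status"], "|", missing["error"])
```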
Step 4: Create the Agents
With tools defined, we can now instantiate our three agents:
triage_agent = Agent(
    name="Triage Agent",
    instructions="""You are a triage agent. Your job is to
understand the user's request and route them to the right
specialist agent.

- For order-related questions (tracking, status, cancellations,
  returns): hand off to Order Agent
- For technical issues (bugs, errors, performance, login
  problems): hand off to Technical Support Agent
- For general questions: answer directly

Always greet the user and ask clarifying questions if the
intent is ambiguous.""",
    handoff_targets=["Order Agent", "Technical Support Agent"],
)

order_agent = Agent(
    name="Order Agent",
    instructions="""You are an order management specialist.
Help users with order tracking, status updates, and
cancellations.

Always look up the order before providing information.
For cancellations, confirm with the user before proceeding.
If the request is not order-related, hand off to the
Triage Agent.""",
    tools=order_tools,
    handoff_targets=["Triage Agent"],
)

tech_agent = Agent(
    name="Technical Support Agent",
    instructions="""You are a technical support specialist.
Help users troubleshoot technical issues, find solutions in
the knowledge base, and file bug reports when needed.

Always search the knowledge base first. Only create a bug
report if the knowledge base does not have a solution.
If the request is not technical, hand off to the
Triage Agent.""",
    tools=tech_tools,
    handoff_targets=["Triage Agent"],
)
Step 5: Build the Orchestrator
The orchestrator is the brain of the multi-agent system. It tracks which agent is currently active, calls the LLM, executes tool calls, and handles handoffs. For simplicity, it uses a lightweight text protocol in which the model signals handoffs ("handoff to ...") and tool use ("use tool: ...") in plain text; the same structure carries over to native function calling.
from openai import OpenAI


class Orchestrator:
    """Manages agent execution and handoffs."""

    def __init__(self, agents: dict[str, Agent]):
        self.agents = agents
        self.active_agent: Agent | None = None
        self.history = ConversationHistory()
        self.client = OpenAI()

    def set_active_agent(self, agent_name: str):
        self.active_agent = self.agents[agent_name]

    def process_message(self, user_input: str) -> str:
        self.history.add_message(Role.USER, user_input)
        max_iterations = 10

        for _ in range(max_iterations):
            messages = self.history.get_messages_for_agent(
                self.active_agent
            )
            response = self.client.chat.completions.create(
                model="gpt-4o",
                messages=messages,
                temperature=0.3,
            )
            assistant_msg = response.choices[0].message
            content = assistant_msg.content or ""

            # Check for handoff. Record the note before switching so
            # it is attributed to the agent that initiated the handoff.
            handoff = self._check_handoff(content)
            if handoff:
                self.history.add_message(
                    Role.ASSISTANT,
                    f"[Handing off to {handoff}]",
                    agent_name=self.active_agent.name,
                )
                self.set_active_agent(handoff)
                continue

            # Check for tool calls
            tool_result = self._execute_tools(content)
            if tool_result:
                self.history.add_message(
                    Role.TOOL, tool_result
                )
                continue

            # Regular response
            self.history.add_message(
                Role.ASSISTANT, content,
                agent_name=self.active_agent.name,
            )
            return content

        return "I apologize, I was unable to process that."

    def _check_handoff(self, content: str) -> str | None:
        for target in self.active_agent.handoff_targets:
            if f"handoff to {target.lower()}" in content.lower():
                return target
        return None

    def _execute_tools(self, content: str) -> str | None:
        for tool in self.active_agent.tools:
            marker = f"use tool: {tool.name}"
            idx = content.lower().find(marker)
            if idx == -1:
                continue
            # Parse JSON arguments that follow the marker, e.g.
            # 'use tool: lookup_order {"order_id": "ORD-001"}'
            raw_args = content[idx + len(marker):].strip()
            try:
                args = json.loads(raw_args) if raw_args.startswith("{") else {}
            except json.JSONDecodeError:
                args = {}
            return tool.function(**args)
        return None
At CallSphere, we deploy multi-agent systems across 6 verticals using an orchestration layer similar to this, scaled to handle thousands of concurrent sessions with persistent conversation state backed by PostgreSQL.
Step 6: Run the System
Bring everything together and run an interactive session:
def main():
    agents = {
        "Triage Agent": triage_agent,
        "Order Agent": order_agent,
        "Technical Support Agent": tech_agent,
    }
    orchestrator = Orchestrator(agents)
    orchestrator.set_active_agent("Triage Agent")

    print("Multi-Agent System Ready. Type 'quit' to exit.")
    print(f"Active agent: {orchestrator.active_agent.name}")
    print("-" * 50)

    while True:
        user_input = input("You: ")
        if user_input.lower() == "quit":
            break
        response = orchestrator.process_message(user_input)
        agent_name = orchestrator.active_agent.name
        print(f"[{agent_name}]: {response}")
        print("-" * 50)


if __name__ == "__main__":
    main()
Step 7: Production Considerations
The system above demonstrates the core patterns, but production multi-agent systems need additional layers:
Persistent State
Store conversation history in a database rather than in memory. Use PostgreSQL with a conversations table that tracks the active agent, message history, and metadata:
CREATE TABLE conversations (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    active_agent VARCHAR(100) NOT NULL,
    created_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE TABLE messages (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    conversation_id UUID REFERENCES conversations(id),
    role VARCHAR(20) NOT NULL,
    content TEXT NOT NULL,
    agent_name VARCHAR(100),
    created_at TIMESTAMPTZ DEFAULT NOW()
);
Observability
Log every agent decision, tool call, and handoff. Structure your logs so you can trace a single conversation across all agents:
import structlog

logger = structlog.get_logger()


def log_agent_action(
    conversation_id: str,
    agent_name: str,
    action: str,
    details: dict
):
    logger.info(
        "agent_action",
        conversation_id=conversation_id,
        agent=agent_name,
        action=action,
        **details,
    )
Error Boundaries
Wrap each agent's execution in error handling that prevents one agent's failure from crashing the entire system. If a specialist agent fails, fall back to the triage agent with an appropriate error message.
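A minimal sketch of such a boundary, assuming agents are reduced to plain callables; the safe_process helper and handler names are hypothetical, not part of the tutorial code above:

```python
def broken_order_agent(msg: str) -> str:
    # Simulates a specialist whose backing service is down
    raise RuntimeError("database connection lost")


def safe_process(handlers: dict, active: str, user_input: str) -> tuple[str, str]:
    """Run the active agent's handler; fall back to triage on failure.

    `handlers` maps agent names to plain callables, a simplified
    stand-in for the orchestrator's agents.
    """
    try:
        return active, handlers[active](user_input)
    except Exception:
        # One specialist's crash should not end the session
        return "Triage Agent", handlers["Triage Agent"](
            "Sorry, something went wrong. Could you rephrase your request?"
        )


handlers = {
    "Triage Agent": lambda msg: f"[triage] {msg}",
    "Order Agent": broken_order_agent,
}
agent, reply = safe_process(handlers, "Order Agent", "Cancel ORD-001")
print(agent, "->", reply)
```

In the real orchestrator, the except branch would also log the failure and record the fallback handoff in the conversation history.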
Concurrency
Use async patterns throughout. Replace synchronous OpenAI calls with async equivalents and use asyncio for concurrent tool execution when an agent calls multiple tools simultaneously.
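As a standalone illustration of the payoff, the sketch below fakes two slow tools with asyncio.sleep and runs them through asyncio.gather; the async tool names are invented for the example:

```python
import asyncio
import json
import time


async def lookup_order_async(order_id: str) -> str:
    await asyncio.sleep(0.1)  # stands in for a slow DB/API call
    return json.dumps({"order_id": order_id, "status": "shipped"})


async def search_kb_async(query: str) -> str:
    await asyncio.sleep(0.1)
    return json.dumps({"query": query, "result": "no matches"})


async def run_tools() -> list[str]:
    # gather() runs both awaitables concurrently: total wall time is
    # roughly 0.1s rather than the 0.2s a sequential run would take
    return await asyncio.gather(
        lookup_order_async("ORD-001"),
        search_kb_async("login"),
    )


start = time.perf_counter()
results = asyncio.run(run_tools())
elapsed = time.perf_counter() - start
print(len(results), f"{elapsed:.2f}s")
```

The same pattern applies to the LLM calls themselves via the async OpenAI client, so multiple conversations can be served on one event loop.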
Frequently Asked Questions
How many agents should a multi-agent system have?
Start with the minimum number of agents needed to cover your use cases — usually 3 to 5. Each additional agent adds handoff complexity, increases the chance of routing errors, and makes debugging harder. A common pattern is one triage agent plus 2-4 specialist agents. Only add new agents when you have data showing that an existing agent is struggling to handle a particular domain well.
How do agents share context during handoffs?
The most reliable method is passing the full conversation history to the receiving agent. This ensures the new agent has complete context without the user needing to repeat themselves. For very long conversations, you can use a summarization step — have the handing-off agent generate a brief summary of the key information, then include that summary in the system prompt of the receiving agent alongside the recent messages.
What happens when two agents disagree or create a loop?
Agent loops (A hands off to B, B hands off back to A repeatedly) are a common failure mode. Prevent them by implementing a maximum handoff count per conversation turn (typically 2-3), tracking handoff history to detect cycles, and having a fallback behavior that asks the user for clarification instead of handing off again. In the orchestrator, increment a counter on each handoff and break the loop if it exceeds the threshold.
Can I use different LLM models for different agents?
Yes, and you should consider it for cost optimization. A triage agent that only classifies intent can run on a smaller, cheaper model like GPT-4o-mini, while a specialist agent handling complex reasoning might need GPT-4o or Claude 3.5 Sonnet. At CallSphere, we routinely use mixed-model architectures where the triage layer costs one-tenth of the specialist layer per request.
How do I test a multi-agent system?
Test at three levels. First, test each agent in isolation with mocked tools and pre-scripted conversation flows. Second, test the handoff logic by verifying that the orchestrator routes messages to the correct agent for a set of representative inputs. Third, run end-to-end tests with realistic multi-turn conversations that exercise the full flow — including handoffs, tool calls, and error scenarios.
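For the second level, routing checks can be written as plain table-driven assertions. The classify_intent function below is a toy keyword stand-in for the LLM's routing decision, invented purely so the test shape is concrete:

```python
def classify_intent(message: str) -> str:
    """Toy keyword router standing in for the triage agent's
    LLM-based decision (hypothetical, for illustration only)."""
    text = message.lower()
    if any(w in text for w in ("order", "tracking", "cancel", "refund")):
        return "Order Agent"
    if any(w in text for w in ("bug", "error", "crash", "login", "slow")):
        return "Technical Support Agent"
    return "Triage Agent"


# Representative inputs mapped to the agent that should receive them
cases = {
    "Where is my order?": "Order Agent",
    "The app crashes on startup": "Technical Support Agent",
    "What are your opening hours?": "Triage Agent",
}
for message, expected in cases.items():
    assert classify_intent(message) == expected
print("routing tests passed")
```

With a real LLM router, the same table drives the test; you simply swap classify_intent for a call through the orchestrator and tolerate some non-determinism by running each case several times.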
CallSphere Team
Expert insights on AI voice agents and customer communication automation.