Flat vs Hierarchical vs Mesh: Choosing the Right Multi-Agent Topology

Topology Is the First Architectural Decision

Before you write a single line of agent code, you must decide how your agents relate to each other structurally. This is the topology question, and it constrains everything that follows: how agents discover each other, how work is distributed, how failures propagate, and how the system scales.

The three fundamental topologies are flat (all agents are peers), hierarchical (agents form a tree), and mesh (agents form a dynamic peer-to-peer network). Each has clear strengths and weaknesses. Choosing the wrong topology for your problem is the kind of architectural mistake that gets more expensive to fix every week it persists.

Flat Topology: All Agents Are Peers

In a flat topology, every agent can communicate directly with every other agent. There is no coordinator, no hierarchy, and no routing layer. Each agent decides independently which other agents to collaborate with.

from dataclasses import dataclass, field
import asyncio

@dataclass
class FlatAgent:
    name: str
    capabilities: list[str]
    peers: dict[str, "FlatAgent"] = field(default_factory=dict)

    def discover_peers(self, all_agents: list["FlatAgent"]):
        for agent in all_agents:
            if agent.name != self.name:
                self.peers[agent.name] = agent

    async def request_help(self, capability: str,
                           task: dict) -> dict | None:
        for peer in self.peers.values():
            if capability in peer.capabilities:
                return await peer.handle_task(task)
        return None

    async def handle_task(self, task: dict) -> dict:
        return {
            "handled_by": self.name,
            "task": task["description"],
            "status": "complete",
        }

# Setup
research_agent = FlatAgent("researcher", ["web_search", "summarize"])
writer_agent = FlatAgent("writer", ["draft_email", "edit_text"])
data_agent = FlatAgent("data", ["query_db", "generate_report"])

all_agents = [research_agent, writer_agent, data_agent]
for agent in all_agents:
    agent.discover_peers(all_agents)

When Flat Works

Flat topologies excel in small, collaborative teams of 2-5 agents where every agent may need to interact with every other agent. Think of a content creation pipeline: a research agent, a writing agent, and an editing agent. Each may ask the others for input at any point.

When Flat Breaks

The number of potential communication paths grows quadratically: N*(N-1)/2. At 5 agents, that is 10 paths. At 20 agents, it is 190. At 100 agents, it is 4,950. Testing, monitoring, and debugging become impractical.
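
The arithmetic above is easy to check. A minimal sketch, where `peer_paths` is a helper name introduced here for illustration:

```python
def peer_paths(n: int) -> int:
    """Number of distinct agent pairs in a fully connected group of n agents."""
    return n * (n - 1) // 2

for n in (5, 20, 100):
    print(f"{n} agents -> {peer_paths(n)} potential paths")
```

Each new agent adds a path to every agent already present, which is why the cost of adding "just one more" keeps rising.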

Flat topologies also lack coordination. If two agents both try to handle the same task, you get duplicated work. If no agent claims a task, it falls through the cracks. There is no natural place to enforce global policies or observe system-wide behavior.

Complexity: O(N^2) communication paths
Best for: 2-5 agents, prototyping, collaborative workflows
Avoid for: Production systems above 10 agents

Hierarchical Topology: Agents Form a Tree

Hierarchical topologies organize agents into layers. A top-level coordinator (the root) manages mid-level coordinators or specialists, which may in turn manage their own sub-agents. Communication flows up and down the tree.

from dataclasses import dataclass, field

@dataclass
class HierarchicalAgent:
    name: str
    role: str  # "coordinator", "specialist", "worker"
    children: list["HierarchicalAgent"] = field(default_factory=list)
    parent: "HierarchicalAgent | None" = None

    def add_child(self, child: "HierarchicalAgent"):
        child.parent = self
        self.children.append(child)

    async def delegate(self, task: dict) -> dict:
        """Coordinator delegates to the best child."""
        best_child = self._select_child(task)
        if best_child:
            return await best_child.execute(task)
        # No suitable child — escalate to parent
        if self.parent:
            return await self.parent.escalate(task)
        return {"error": "No agent can handle this task"}

    async def execute(self, task: dict) -> dict:
        if self.role == "worker":
            return await self._do_work(task)
        return await self.delegate(task)

    async def escalate(self, task: dict) -> dict:
        """Handle escalated tasks from children."""
        # Try other children first
        for child in self.children:
            if self._can_handle(child, task):
                return await child.execute(task)
        # Escalate further up
        if self.parent:
            return await self.parent.escalate(task)
        return {"status": "requires_human", "task": task}

    def _select_child(self, task: dict):
        for child in self.children:
            if self._can_handle(child, task):
                return child
        return None

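    def _can_handle(self, child, task: dict) -> bool:
        # Workers match on their own name. Coordinators match only if some
        # descendant can take the task, so routing works across several
        # levels and a coordinator never receives work it cannot place.
        if child.role == "worker":
            return task.get("domain") == child.name
        return any(child._can_handle(grandchild, task)
                   for grandchild in child.children)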

    async def _do_work(self, task: dict) -> dict:
        return {"handled_by": self.name, "status": "complete"}

# Build the tree
root = HierarchicalAgent("coordinator", "coordinator")

support = HierarchicalAgent("support", "coordinator")
sales = HierarchicalAgent("sales", "coordinator")
root.add_child(support)
root.add_child(sales)

billing_worker = HierarchicalAgent("billing", "worker")
tech_worker = HierarchicalAgent("technical", "worker")
support.add_child(billing_worker)
support.add_child(tech_worker)

pricing_worker = HierarchicalAgent("pricing", "worker")
demo_worker = HierarchicalAgent("demo", "worker")
sales.add_child(pricing_worker)
sales.add_child(demo_worker)

When Hierarchical Works

Hierarchical topologies excel at scale. Because agents communicate only with their parent and children, a tree of N agents has just N-1 links, so communication paths grow as O(N) rather than O(N^2). Hierarchies also provide natural escalation paths, clear authority boundaries, and straightforward observability: you can monitor each level of the tree independently.
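
To make the comparison concrete, here is a small sketch that counts links in the example tree. The nested-dict representation and the helper names are assumptions made for this illustration, not part of the agent classes above.

```python
# Hypothetical nested-dict view of the example tree: name -> children.
tree = {
    "coordinator": {
        "support": {"billing": {}, "technical": {}},
        "sales": {"pricing": {}, "demo": {}},
    }
}

def count_agents(subtree: dict) -> int:
    """Count every agent in the subtree, including coordinators."""
    return sum(1 + count_agents(children) for children in subtree.values())

def count_links(subtree: dict) -> int:
    """Count parent-child links: one per non-root agent."""
    return sum(len(children) + count_links(children)
               for children in subtree.values())

n = count_agents(tree)
tree_links = count_links(tree)
flat_paths = n * (n - 1) // 2  # the same agents wired as peers

print(f"{n} agents: {tree_links} tree links vs {flat_paths} flat paths")
```

Even at seven agents the tree needs fewer than a third of the paths, and the gap widens as the system grows.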

Many enterprise multi-agent systems favor hierarchical topologies because they map naturally onto organizational structures and compliance requirements.

When Hierarchical Breaks

Hierarchical topologies struggle with cross-cutting concerns. If the billing worker needs data from the demo worker, the request must travel up through the support coordinator to the root, down to the sales coordinator, and finally to the demo worker. That is four hops for what is logically a single question, which adds latency and places unnecessary load on coordinators.
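
The detour can be traced with a short sketch. It assumes the example tree is flattened into a child-to-parent map; `PARENT`, `ancestors`, and `tree_route` are names introduced here for illustration. In that tree, support and sales are linked only through the root, so the route has to pass through it.

```python
# Child -> parent map for the example tree (the root has no entry).
PARENT = {
    "support": "coordinator",
    "sales": "coordinator",
    "billing": "support",
    "technical": "support",
    "pricing": "sales",
    "demo": "sales",
}

def ancestors(agent: str) -> list[str]:
    """Return the chain from the agent up to the root, inclusive."""
    chain = [agent]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain

def tree_route(source: str, target: str) -> list[str]:
    """Climb from source to the lowest shared ancestor, then descend."""
    up = ancestors(source)
    down = ancestors(target)
    common = next(a for a in up if a in down)
    return up[: up.index(common) + 1] + down[: down.index(common)][::-1]

route = tree_route("billing", "demo")
print(" -> ".join(route), f"({len(route) - 1} hops)")
```

Two workers under the same coordinator need only two hops, so the penalty falls specifically on cross-domain requests.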

Rigid hierarchies also resist change. Adding a new capability often requires restructuring the tree.

Complexity: O(N) communication paths, O(log N) routing depth for a reasonably balanced tree
Best for: 10-500 agents, enterprise systems, compliance-heavy domains
Avoid for: Highly dynamic workloads, frequent cross-domain collaboration

Mesh Topology: Dynamic Peer-to-Peer

Mesh topologies allow any agent to communicate with any other agent, like flat topologies, but add a discovery and routing layer that prevents the quadratic explosion. Agents register their capabilities with a service registry, and communication is routed dynamically based on capability matching.

from dataclasses import dataclass, field

@dataclass
class MeshNode:
    agent_id: str
    capabilities: set[str]
    connections: set[str] = field(default_factory=set)
    max_connections: int = 8  # Limit to prevent N^2

class MeshRegistry:
    def __init__(self):
        self.nodes: dict[str, MeshNode] = {}

    def register(self, agent_id: str, capabilities: set[str]):
        node = MeshNode(agent_id=agent_id, capabilities=capabilities)
        self.nodes[agent_id] = node
        self._optimize_connections(node)

    def _optimize_connections(self, new_node: MeshNode):
        """Connect to agents with complementary capabilities."""
        scored = []
        for existing in self.nodes.values():
            if existing.agent_id == new_node.agent_id:
                continue
            # Score based on capability overlap and complement
            overlap = len(
                new_node.capabilities & existing.capabilities
            )
            complement = len(
                existing.capabilities - new_node.capabilities
            )
            score = complement - overlap  # Prefer complementary
            scored.append((existing, score))

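        scored.sort(key=lambda x: x[1], reverse=True)
        for node, _ in scored:
            if len(new_node.connections) >= new_node.max_connections:
                break
            # Skip peers already at capacity so the cap holds on both
            # ends of every link, not just for the newly added node.
            if len(node.connections) >= node.max_connections:
                continue
            new_node.connections.add(node.agent_id)
            node.connections.add(new_node.agent_id)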

    def find_path(self, source: str,
                  required_capability: str) -> list[str] | None:
        """BFS to find an agent with the required capability."""
        visited = set()
        queue = [(source, [source])]
        while queue:
            current, path = queue.pop(0)
            if current in visited:
                continue
            visited.add(current)
            node = self.nodes.get(current)
            if not node:
                continue
            if (required_capability in node.capabilities
                    and current != source):
                return path  # path already ends at the matching agent
            for neighbor in node.connections:
                if neighbor not in visited:
                    queue.append((neighbor, path + [neighbor]))
        return None

When Mesh Works

Mesh topologies shine in dynamic environments where agent capabilities change frequently, new agents are added and removed regularly, and cross-domain collaboration is common. They combine the flexibility of flat topologies with the scalability of structured routing.

Research labs, creative collaboration platforms, and adaptive systems benefit from mesh topologies because the workflow is not predetermined — agents self-organize based on the problem.

When Mesh Breaks

Mesh topologies are the most complex to implement and operate. The routing algorithm, connection management, and consistency model all require careful engineering. Debugging is harder because communication paths are dynamic. Without careful connection limits, the mesh can degenerate into a flat topology.

Complexity: O(N * max_connections) paths, O(diameter) routing depth
Best for: Dynamic workloads, research environments, adaptive systems
Avoid for: Compliance-heavy domains, systems requiring strict audit trails
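
A quick sketch of what the connection cap buys you, assuming the cap holds for every node in the mesh. `flat_links` and `mesh_link_bound` are helper names introduced for this illustration.

```python
def flat_links(n: int) -> int:
    """Links in a fully connected group of n agents."""
    return n * (n - 1) // 2

def mesh_link_bound(n: int, max_connections: int = 8) -> int:
    """Upper bound on links when no node exceeds max_connections.

    Each link uses one connection slot on two nodes, hence the halving.
    """
    return min(flat_links(n), n * max_connections // 2)

for n in (10, 100, 500):
    print(f"{n} agents: flat={flat_links(n)}, capped mesh<={mesh_link_bound(n)}")
```

With the cap in place the link count grows linearly. Without it, the bound collapses back to the flat figure.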

Decision Framework

Use this framework to select your starting topology:

Choose Flat when:

You have fewer than 6 agents
You are prototyping or in early development
Every agent genuinely needs direct access to every other agent
You can migrate to hierarchical later

Choose Hierarchical when:

You have 10+ agents or expect to grow beyond 10
Your domain has natural authority boundaries (departments, approval chains)
Compliance requires clear escalation paths and audit trails
You value operational simplicity over communication flexibility

Choose Mesh when:

Agent capabilities are dynamic and change at runtime
Workflows are emergent and not predetermined
Cross-domain collaboration is the norm, not the exception
Your team has strong distributed systems engineering capabilities
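
One possible reading of the checklist above, written as a function. The order of the checks and the handling of the 6-9 agent range are assumptions of this sketch, and `suggest_topology` is a hypothetical helper: treat the output as a starting point for discussion, not a verdict.

```python
def suggest_topology(agent_count: int,
                     dynamic_capabilities: bool = False,
                     needs_audit_trail: bool = False) -> str:
    """Map a few coarse signals to a starting topology."""
    # Compliance comes first here: audit-heavy systems are steered
    # away from mesh regardless of their other properties.
    if needs_audit_trail:
        return "hierarchical"
    if dynamic_capabilities:
        return "mesh"
    if agent_count < 6:
        return "flat"
    # Assumption: systems of 6-9 agents are treated as "expected to
    # grow beyond 10" and get a hierarchy.
    return "hierarchical"

print(suggest_topology(3))
print(suggest_topology(40, needs_audit_trail=True))
print(suggest_topology(25, dynamic_capabilities=True))
```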

Hybrid Topologies

In practice, many production systems end up as hybrids. A hierarchical backbone provides structure and compliance, while mesh connections between specific agents enable efficient cross-domain collaboration.

class HybridTopology:
    def __init__(self):
        self.hierarchy: dict[str, list[str]] = {}  # Parent-child relationships
        self.mesh_links: dict[str, set[str]] = {}  # Direct peer connections
        # Assumed lookup of agent -> capabilities, needed by route()
        self.capabilities: dict[str, set[str]] = {}

    def register(self, agent: str, capabilities: set[str]):
        self.capabilities[agent] = set(capabilities)

    def add_hierarchical(self, parent: str, child: str):
        self.hierarchy.setdefault(parent, []).append(child)

    def add_mesh_link(self, agent_a: str, agent_b: str):
        # Mesh links are symmetric, so record both directions
        self.mesh_links.setdefault(agent_a, set()).add(agent_b)
        self.mesh_links.setdefault(agent_b, set()).add(agent_a)

    def _has_capability(self, agent: str, capability: str) -> bool:
        return capability in self.capabilities.get(agent, set())

    def route(self, source: str, target_capability: str) -> str:
        # First check mesh links for direct path
        for peer in self.mesh_links.get(source, ()):
            if self._has_capability(peer, target_capability):
                return f"mesh:{source}->{peer}"
        # Fall back to hierarchical routing (placeholder: traversal not shown)
        return f"hierarchy:{source}->parent->...->target"

This gives you the compliance and observability of a hierarchy with the efficiency of mesh connections where it matters.

FAQ

Can you migrate from one topology to another?

Yes, but plan for it from the start. Use an abstraction layer (a routing interface) between agents and the topology. Agents call router.send(capability, message) rather than addressing specific agents. This allows you to swap the underlying topology without modifying agent code. Migration from flat to hierarchical is the most common and usually the easiest because you are adding structure, not removing it.
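
A minimal sketch of such a routing interface follows. Only `router.send(capability, message)` comes from the answer above. The `Router` protocol, both router classes, the `register` method, and the synchronous handlers are assumptions made to keep the example short.

```python
from typing import Callable, Protocol

Handler = Callable[[dict], dict]

class Router(Protocol):
    def send(self, capability: str, message: dict) -> dict: ...

class FlatRouter:
    """Direct capability lookup, as in a flat topology."""
    def __init__(self) -> None:
        self._handlers: dict[str, Handler] = {}

    def register(self, capability: str, handler: Handler) -> None:
        self._handlers[capability] = handler

    def send(self, capability: str, message: dict) -> dict:
        handler = self._handlers.get(capability)
        if handler is None:
            return {"error": f"no agent offers {capability}"}
        return handler(message)

class HierarchicalRouter:
    """Resolves a capability to a domain first, then to an agent."""
    def __init__(self, domain_of: dict[str, str]) -> None:
        self._domain_of = domain_of
        self._domains: dict[str, FlatRouter] = {}

    def register(self, capability: str, handler: Handler) -> None:
        domain = self._domain_of[capability]
        self._domains.setdefault(domain, FlatRouter()).register(capability, handler)

    def send(self, capability: str, message: dict) -> dict:
        domain = self._domain_of.get(capability)
        if domain not in self._domains:
            return {"error": f"no agent offers {capability}"}
        return self._domains[domain].send(capability, message)

def research_and_draft(router: Router, text: str) -> dict:
    """Agent-side code: it only ever sees the Router interface."""
    summary = router.send("summarize", {"text": text})
    return router.send("draft_email", {"body": summary["summary"]})

def build(router):
    router.register("summarize", lambda m: {"summary": m["text"][:20]})
    router.register("draft_email", lambda m: {"email": "Hi, " + m["body"]})
    return router

flat_result = research_and_draft(build(FlatRouter()), "Quarterly numbers are up")
tree_result = research_and_draft(
    build(HierarchicalRouter({"summarize": "research", "draft_email": "writing"})),
    "Quarterly numbers are up",
)
print(flat_result == tree_result)
```

`research_and_draft` runs unchanged against either router, which is the property that makes a later migration cheap.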

What is the latency impact of hierarchical routing?

Each hop in a hierarchical topology adds the coordinator agent's processing time, typically 10-50ms for a classification decision (without LLM calls) or 500ms-2s if the coordinator uses an LLM to make routing decisions. For latency-sensitive paths, add mesh links to bypass the hierarchy. Keep coordinator logic deterministic (rule-based) rather than LLM-powered whenever possible.
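
As a rough worked example using the ranges quoted above, consider a cross-domain request that passes through three coordinators. The three-coordinator path and the helper name are assumptions of this sketch, and the numbers cover coordinator processing only, not network or worker time.

```python
RULE_BASED_MS = (10, 50)    # per-coordinator range without LLM calls
LLM_BASED_MS = (500, 2000)  # per-coordinator range with an LLM routing call

def routing_overhead_ms(coordinators: int,
                        per_hop_ms: tuple[int, int]) -> tuple[int, int]:
    """Best-case and worst-case added latency across all coordinators."""
    low, high = per_hop_ms
    return coordinators * low, coordinators * high

print("rule-based:", routing_overhead_ms(3, RULE_BASED_MS))
print("LLM-based: ", routing_overhead_ms(3, LLM_BASED_MS))
```

Under these assumptions the rule-based path adds 30-150 ms, while the LLM-based path adds 1.5-6 s before any worker has started.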

How do you test different topologies?

Build a topology simulator that models agent communication patterns with synthetic traffic. Measure latency, throughput, error propagation, and resource utilization for each topology. Use your actual agent capabilities and traffic patterns but simulate the communication layer. This lets you evaluate topologies without rewriting agent code.
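
A deliberately small sketch of the idea: model each topology as a graph, replay the same synthetic traffic over both, and compare hop counts. The agent names reuse the earlier example. The traffic generator, the seed, and the restriction to hop count as the only metric are assumptions of this sketch.

```python
import random
from collections import deque

def shortest_hops(graph: dict[str, set[str]], source: str, target: str) -> int:
    """Breadth-first search returning the number of links on the shortest route."""
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    raise ValueError(f"{target} is unreachable from {source}")

WORKERS = ["billing", "technical", "pricing", "demo"]
PARENT = {"support": "coordinator", "sales": "coordinator",
          "billing": "support", "technical": "support",
          "pricing": "sales", "demo": "sales"}

# Flat: every worker is linked to every other worker.
flat = {a: {b for b in WORKERS if b != a} for a in WORKERS}

# Hierarchical: undirected parent-child links only.
tree: dict[str, set[str]] = {}
for child, parent in PARENT.items():
    tree.setdefault(child, set()).add(parent)
    tree.setdefault(parent, set()).add(child)

# Synthetic traffic: 1000 random worker-to-worker requests.
rng = random.Random(42)
traffic = [tuple(rng.sample(WORKERS, 2)) for _ in range(1000)]

def mean_hops(graph: dict[str, set[str]]) -> float:
    return sum(shortest_hops(graph, s, t) for s, t in traffic) / len(traffic)

print(f"flat: {mean_hops(flat):.2f} hops, tree: {mean_hops(tree):.2f} hops")
```

A fuller simulator would replace the uniform random traffic with recorded request patterns and attach per-hop latency and failure models to each link.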

Do all agents in a hierarchy need to use the same framework?

No. Agents at different levels can use different frameworks, models, and even languages, as long as they communicate through a standardized interface (message schemas, MCP, or HTTP APIs). This is actually a strength of hierarchical systems — each team can choose the best tool for their agent's specific domain.
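
One way to pin down such an interface is a small, versioned message schema that every agent serializes to JSON. The `AgentMessage` class and its field names are hypothetical and shown only to illustrate the idea. A real system might instead adopt MCP or an existing HTTP API contract.

```python
import json
from dataclasses import asdict, dataclass

@dataclass(frozen=True)
class AgentMessage:
    sender: str
    capability: str
    payload: dict
    schema_version: str = "1.0"

    def to_json(self) -> str:
        """Serialize to a JSON string any framework or language can parse."""
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, raw: str) -> "AgentMessage":
        return cls(**json.loads(raw))

outgoing = AgentMessage("billing", "generate_report", {"month": "2024-05"})
wire = outgoing.to_json()
incoming = AgentMessage.from_json(wire)
print(incoming == outgoing)
```

Because only the wire format is shared, a coordinator written with one framework can dispatch to a worker written with another, as long as both sides agree on the schema version.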