Flat vs Hierarchical vs Mesh: Choosing the Right Multi-Agent Topology
Architectural comparison of multi-agent topologies including flat, hierarchical, and mesh designs with performance trade-offs, decision frameworks, and migration strategies.
Topology Is the First Architectural Decision
Before you write a single line of agent code, you must decide how your agents relate to each other structurally. This is the topology question, and it constrains everything that follows: how agents discover each other, how work is distributed, how failures propagate, and how the system scales.
The three fundamental topologies are flat (all agents are peers), hierarchical (agents form a tree), and mesh (agents form a dynamic peer-to-peer network). Each has clear strengths and weaknesses. Choosing the wrong topology for your problem is the kind of architectural mistake that gets more expensive to fix every week it persists.
Flat Topology: All Agents Are Peers
In a flat topology, every agent can communicate directly with every other agent. There is no coordinator, no hierarchy, and no routing layer. Each agent decides independently which other agents to collaborate with.
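```python
from dataclasses import dataclass, field
import asyncio


@dataclass
class FlatAgent:
    name: str
    capabilities: list[str]
    # Direct references to every other agent; excluded from repr/eq so
    # logging or comparing an agent does not walk the cyclic peer graph
    peers: dict[str, "FlatAgent"] = field(
        default_factory=dict, repr=False, compare=False
    )

    def discover_peers(self, all_agents: list["FlatAgent"]):
        # Each agent stores N-1 peers: N*(N-1) directed references in total
        for agent in all_agents:
            if agent.name != self.name:
                self.peers[agent.name] = agent

    async def request_help(self, capability: str,
                           task: dict) -> dict | None:
        # Ask the first peer that advertises the capability; no coordinator
        # decides which agent owns the task
        for peer in self.peers.values():
            if capability in peer.capabilities:
                return await peer.handle_task(task)
        return None  # Nobody can help: the task falls through the cracks

    async def handle_task(self, task: dict) -> dict:
        return {
            "handled_by": self.name,
            "task": task["description"],
            "status": "complete",
        }


# Setup
research_agent = FlatAgent("researcher", ["web_search", "summarize"])
writer_agent = FlatAgent("writer", ["draft_email", "edit_text"])
data_agent = FlatAgent("data", ["query_db", "generate_report"])
all_agents = [research_agent, writer_agent, data_agent]
for agent in all_agents:
    agent.discover_peers(all_agents)

# The writer asks whichever peer can query the database
result = asyncio.run(
    writer_agent.request_help("query_db", {"description": "Q3 revenue"})
)
print(result)  # {'handled_by': 'data', 'task': 'Q3 revenue', 'status': 'complete'}
```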
When Flat Works
Flat topologies excel in small, collaborative teams of 2-5 agents where every agent may need to interact with every other agent. Think of a content creation pipeline: a research agent, a writing agent, and an editing agent. Each may ask the others for input at any point.
When Flat Breaks
The number of potential communication paths grows quadratically: N*(N-1)/2. At 5 agents, that is 10 paths. At 20 agents, it is 190. At 100 agents, it is 4,950. Testing, monitoring, and debugging become impractical.
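The path counts above follow directly from N*(N-1)/2. A quick check, with the N-1 links of a tree containing the same agents shown for contrast (`flat_paths` and `tree_links` are illustrative helper names):

```python
def flat_paths(n_agents: int) -> int:
    """Undirected links when every agent can talk to every other agent."""
    return n_agents * (n_agents - 1) // 2


def tree_links(n_agents: int) -> int:
    """Parent-child links in any tree with the same number of agents."""
    return n_agents - 1


for n in (5, 20, 100):
    print(f"{n:>3} agents: flat {flat_paths(n):>5}, tree {tree_links(n):>3}")
```

The flat count grows with the square of the team size, while the tree count grows linearly, which is the gap the hierarchical section below exploits.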
Flat topologies also lack coordination. If two agents both try to handle the same task, you get duplicated work. If no agent claims a task, it falls through the cracks. There is no natural place to enforce global policies or observe system-wide behavior.
Complexity: O(N^2) communication paths
Best for: 2-5 agents, prototyping, collaborative workflows
Avoid for: Production systems above 10 agents
Hierarchical Topology: Agents Form a Tree
Hierarchical topologies organize agents into layers. A top-level coordinator (the root) manages mid-level coordinators or specialists, which may in turn manage their own sub-agents. Communication flows up and down the tree.
```python
from dataclasses import dataclass, field
import asyncio


@dataclass
class HierarchicalAgent:
    name: str
    role: str  # "coordinator", "specialist", "worker"
    children: list["HierarchicalAgent"] = field(default_factory=list)
    # Back-reference to the parent; excluded from repr/eq to avoid cycles
    parent: "HierarchicalAgent | None" = field(
        default=None, repr=False, compare=False
    )

    def add_child(self, child: "HierarchicalAgent"):
        child.parent = self
        self.children.append(child)

    async def delegate(self, task: dict) -> dict:
        """Coordinator delegates to the best child."""
        best_child = self._select_child(task)
        if best_child:
            return await best_child.execute(task)
        # No suitable child: escalate to parent, recording who escalated
        if self.parent:
            return await self.parent.escalate(task, from_child=self)
        return {"error": "No agent can handle this task"}

    async def execute(self, task: dict) -> dict:
        if self.role == "worker":
            return await self._do_work(task)
        return await self.delegate(task)

    async def escalate(self, task: dict,
                       from_child: "HierarchicalAgent | None" = None) -> dict:
        """Handle escalated tasks from children."""
        # Try the other children first; skipping the child that escalated
        # prevents an endless delegate/escalate loop between the two
        for child in self.children:
            if child is not from_child and self._can_handle(child, task):
                return await child.execute(task)
        # Escalate further up
        if self.parent:
            return await self.parent.escalate(task, from_child=self)
        return {"status": "requires_human", "task": task}

    def _select_child(self, task: dict):
        for child in self.children:
            if self._can_handle(child, task):
                return child
        return None

    def _can_handle(self, child, task: dict) -> bool:
        # A child can take the task if the target domain is the child itself
        # or any agent below it, so the root can route "billing" via "support"
        return task.get("domain") in child.subtree_names()

    def subtree_names(self) -> set[str]:
        names = {self.name}
        for child in self.children:
            names |= child.subtree_names()
        return names

    async def _do_work(self, task: dict) -> dict:
        return {"handled_by": self.name, "status": "complete"}


# Build the tree
root = HierarchicalAgent("coordinator", "coordinator")
support = HierarchicalAgent("support", "coordinator")
sales = HierarchicalAgent("sales", "coordinator")
root.add_child(support)
root.add_child(sales)
billing_worker = HierarchicalAgent("billing", "worker")
tech_worker = HierarchicalAgent("technical", "worker")
support.add_child(billing_worker)
support.add_child(tech_worker)
pricing_worker = HierarchicalAgent("pricing", "worker")
demo_worker = HierarchicalAgent("demo", "worker")
sales.add_child(pricing_worker)
sales.add_child(demo_worker)

# Route a billing task from the root: coordinator -> support -> billing
print(asyncio.run(root.execute({"domain": "billing"})))
# {'handled_by': 'billing', 'status': 'complete'}
```
When Hierarchical Works
Hierarchical topologies excel at scale. They reduce communication complexity from O(N^2) to O(N) because agents only communicate with their parent and children: a tree of N agents has exactly N-1 parent-child links. They provide natural escalation paths, clear authority boundaries, and straightforward observability, since you can monitor each level of the tree independently.
Hierarchical topologies are a common choice for enterprise multi-agent systems because they map naturally to organizational structures and compliance requirements.
When Hierarchical Breaks
Hierarchical topologies struggle with cross-cutting concerns. If the billing worker needs data from the demo worker, the request must travel up through the support coordinator to the root coordinator, then down through the sales coordinator to the demo worker: four hops instead of one. Every coordinator on that path adds latency and carries load that has nothing to do with its own domain.
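The route can be computed mechanically: climb from the sender to the lowest common ancestor of both agents, then descend to the receiver. A sketch over the example tree, where the child-to-parent map and the helper names (`PARENT`, `ancestors`, `tree_path`) are illustrative assumptions:

```python
# Example tree from this article, as child -> parent links
PARENT = {
    "support": "coordinator", "sales": "coordinator",
    "billing": "support", "technical": "support",
    "pricing": "sales", "demo": "sales",
}


def ancestors(agent: str) -> list[str]:
    """The agent followed by its parent, grandparent, ..., up to the root."""
    chain = [agent]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def tree_path(src: str, dst: str) -> list[str]:
    """Route up from src to the lowest common ancestor, then down to dst."""
    up, down = ancestors(src), ancestors(dst)
    lca = next(a for a in up if a in down)
    return up[: up.index(lca) + 1] + list(reversed(down[: down.index(lca)]))


path = tree_path("billing", "demo")
print(" -> ".join(path))      # billing -> support -> coordinator -> sales -> demo
print(len(path) - 1, "hops")  # 4 hops
```

Two workers under the same coordinator are only two hops apart; the cost grows with how far up the tree their common ancestor sits.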
Rigid hierarchies also resist change. Adding a new capability often requires restructuring the tree.
Complexity: O(N) communication paths, O(log N) routing depth for a reasonably balanced tree
Best for: 10-500 agents, enterprise systems, compliance-heavy domains
Avoid for: Highly dynamic workloads, frequent cross-domain collaboration
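To see where the O(log N) routing depth comes from, and when it does not hold, here is a small sketch that computes the height of a full tree in which every coordinator has the same number of children. The equal-branching assumption and the `balanced_tree_height` helper are ours, not part of any framework. The worst-case route climbs to the root and back down, so it is at most twice the height:

```python
def balanced_tree_height(n_agents: int, branching: int) -> int:
    """Levels below the root needed for a full `branching`-ary tree
    to hold n_agents (root included)."""
    if branching < 1:
        raise ValueError("branching must be at least 1")
    height, capacity, level_size = 0, 1, 1
    while capacity < n_agents:
        level_size *= branching   # agents on the next level down
        capacity += level_size    # total agents the tree can now hold
        height += 1
    return height


for n, b in [(7, 2), (500, 8), (500, 1)]:
    h = balanced_tree_height(n, b)
    print(f"{n} agents, branching {b}: height {h}, worst case {2 * h} hops")
```

The 7-agent, branching-2 case is the example tree above (height 2, worst case 4 hops, matching the billing-to-demo route). With branching 1 the "tree" is a chain and the depth becomes linear, which is why the logarithmic bound needs coordinators with several children each.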
Mesh Topology: Dynamic Peer-to-Peer
Mesh topologies allow any agent to communicate with any other agent, like flat topologies, but add a discovery and routing layer that prevents the quadratic explosion. Agents register their capabilities with a service registry, and communication is routed dynamically based on capability matching.
```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class MeshNode:
    agent_id: str
    capabilities: set[str]
    connections: set[str] = field(default_factory=set)
    max_connections: int = 8  # Limit to prevent N^2


class MeshRegistry:
    def __init__(self):
        self.nodes: dict[str, MeshNode] = {}

    def register(self, agent_id: str, capabilities: set[str]):
        node = MeshNode(agent_id=agent_id, capabilities=capabilities)
        self.nodes[agent_id] = node
        self._optimize_connections(node)

    def _optimize_connections(self, new_node: MeshNode):
        """Connect to agents with complementary capabilities."""
        scored = []
        for existing in self.nodes.values():
            if existing.agent_id == new_node.agent_id:
                continue
            # Enforce the limit on BOTH ends of a link; otherwise early
            # nodes keep accepting links and the mesh drifts toward flat
            if len(existing.connections) >= existing.max_connections:
                continue
            # Score based on capability overlap and complement
            overlap = len(new_node.capabilities & existing.capabilities)
            complement = len(existing.capabilities - new_node.capabilities)
            score = complement - overlap  # Prefer complementary
            scored.append((existing, score))
        scored.sort(key=lambda x: x[1], reverse=True)
        for node, _ in scored[:new_node.max_connections]:
            new_node.connections.add(node.agent_id)
            node.connections.add(new_node.agent_id)

    def find_path(self, source: str,
                  required_capability: str) -> list[str] | None:
        """BFS to the nearest other agent with the required capability."""
        visited = set()
        queue = deque([(source, [source])])  # deque gives O(1) pops
        while queue:
            current, path = queue.popleft()
            if current in visited:
                continue
            visited.add(current)
            node = self.nodes.get(current)
            if not node:
                continue
            if (required_capability in node.capabilities
                    and current != source):
                return path  # path already ends with current
            for neighbor in node.connections:
                if neighbor not in visited:
                    queue.append((neighbor, path + [neighbor]))
        return None


registry = MeshRegistry()
registry.register("researcher", {"web_search", "summarize"})
registry.register("writer", {"draft_email", "edit_text"})
registry.register("data", {"query_db", "generate_report"})
print(registry.find_path("writer", "query_db"))  # ['writer', 'data']
```
When Mesh Works
Mesh topologies shine in dynamic environments where agent capabilities change frequently, new agents are added and removed regularly, and cross-domain collaboration is common. They combine the flexibility of flat topologies with the scalability of structured routing.
Research labs, creative collaboration platforms, and adaptive systems benefit from mesh topologies because the workflow is not predetermined — agents self-organize based on the problem.
When Mesh Breaks
Mesh topologies are the most complex to implement and operate. The routing algorithm, connection management, and consistency model all require careful engineering. Debugging is harder because communication paths are dynamic. Without careful connection limits, the mesh can degenerate into a flat topology.
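The connection limit only helps if it is enforced on both ends of every link. A minimal sketch of the failure mode, using a hypothetical `build_mesh` helper in which each new node links to up to `max_connections` existing nodes, lowest id first (a simplification standing in for capability scoring):

```python
def build_mesh(n_nodes: int, max_connections: int,
               enforce_both_sides: bool) -> dict[int, set[int]]:
    """Attach nodes one at a time; each newcomer links to up to
    max_connections existing nodes, lowest id first."""
    links: dict[int, set[int]] = {}
    for new in range(n_nodes):
        links[new] = set()
        for existing in range(new):
            if len(links[new]) >= max_connections:
                break  # newcomer is full
            if enforce_both_sides and len(links[existing]) >= max_connections:
                continue  # existing node is full; try the next one
            links[new].add(existing)
            links[existing].add(new)
    return links


def max_degree(links: dict[int, set[int]]) -> int:
    return max(len(peers) for peers in links.values())


print(max_degree(build_mesh(100, 8, enforce_both_sides=False)))  # 99
print(max_degree(build_mesh(100, 8, enforce_both_sides=True)))   # 8
```

When only the newcomer's limit is checked, node 0 ends up linked to all 99 other nodes, which is exactly the flat-topology hub the limit was meant to prevent.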
Complexity: O(N * max_connections) paths, O(diameter) routing depth
Best for: Dynamic workloads, research environments, adaptive systems
Avoid for: Compliance-heavy domains, systems requiring strict audit trails
Decision Framework
Use this framework to select your starting topology:
Choose Flat when:
- You have fewer than 6 agents
- You are prototyping or in early development
- Every agent genuinely needs direct access to every other agent
- You can migrate to hierarchical later
Choose Hierarchical when:
- You have 10+ agents or expect to grow beyond 10
- Your domain has natural authority boundaries (departments, approval chains)
- Compliance requires clear escalation paths and audit trails
- You value operational simplicity over communication flexibility
Choose Mesh when:
- Agent capabilities are dynamic and change at runtime
- Workflows are emergent and not predetermined
- Cross-domain collaboration is the norm, not the exception
- Your team has strong distributed systems engineering capabilities
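The checklist can also be written down as a function, for example to record the decision in a design review. This is one possible encoding, not a rule stated by the framework: `SystemProfile` and `choose_topology` are hypothetical names, the mesh branch requires all four mesh conditions at once, a compliance requirement rules out flat and mesh, and the 6 to 9 agent range the checklist leaves open is assumed to default to hierarchical.

```python
from dataclasses import dataclass


@dataclass
class SystemProfile:
    n_agents: int                  # current or expected agent count
    dynamic_capabilities: bool     # capabilities change at runtime
    emergent_workflows: bool       # workflows are not predetermined
    cross_domain_norm: bool        # cross-domain collaboration is routine
    needs_audit_trail: bool        # compliance needs escalation paths/audit
    strong_distributed_team: bool  # team can build and operate a mesh


def choose_topology(p: SystemProfile) -> str:
    # Small systems without compliance pressure: start flat
    if p.n_agents < 6 and not p.needs_audit_trail:
        return "flat"
    # Mesh only when every mesh condition holds and no audit trail is needed
    if (p.dynamic_capabilities and p.emergent_workflows
            and p.cross_domain_norm and p.strong_distributed_team
            and not p.needs_audit_trail):
        return "mesh"
    # Everything else, including the 6-9 agent gap, defaults to hierarchical
    return "hierarchical"


print(choose_topology(SystemProfile(40, True, True, True, False, True)))  # mesh
```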
Hybrid Topologies
In practice, many production systems use a hybrid: a hierarchical backbone provides structure and compliance, while mesh connections between specific agents enable efficient cross-domain collaboration.
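```python
class HybridTopology:
    def __init__(self):
        self.hierarchy: dict[str, list[str]] = {}    # parent -> children
        self.parents: dict[str, str] = {}            # child -> parent
        self.mesh_links: dict[str, set[str]] = {}    # direct peer connections
        self.capabilities: dict[str, set[str]] = {}  # agent -> capabilities

    def set_capabilities(self, agent: str, capabilities: set[str]):
        self.capabilities[agent] = set(capabilities)

    def add_hierarchical(self, parent: str, child: str):
        self.hierarchy.setdefault(parent, []).append(child)
        self.parents[child] = parent

    def add_mesh_link(self, agent_a: str, agent_b: str):
        self.mesh_links.setdefault(agent_a, set()).add(agent_b)
        self.mesh_links.setdefault(agent_b, set()).add(agent_a)

    def _has_capability(self, agent: str, capability: str) -> bool:
        return capability in self.capabilities.get(agent, set())

    def _path_down(self, node: str, capability: str,
                   exclude: str) -> list[str] | None:
        # Depth-first search from node for a capable agent other than exclude
        if node != exclude and self._has_capability(node, capability):
            return [node]
        for child in self.hierarchy.get(node, []):
            sub = self._path_down(child, capability, exclude)
            if sub:
                return [node] + sub
        return None

    def route(self, source: str, target_capability: str) -> str | None:
        # First check mesh links for a direct shortcut
        for peer in sorted(self.mesh_links.get(source, set())):
            if self._has_capability(peer, target_capability):
                return f"mesh:{source}->{peer}"
        # Fall back to hierarchical routing: climb one ancestor at a time
        # and search that ancestor's subtree for a capable agent
        path_up, node = [source], source
        while True:
            down = self._path_down(node, target_capability, exclude=source)
            if down:
                return "hierarchy:" + "->".join(path_up[:-1] + down)
            if node not in self.parents:
                return None  # No agent in the tree has the capability
            node = self.parents[node]
            path_up.append(node)
```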
This gives you the compliance and observability of a hierarchy with the efficiency of mesh connections where it matters.
FAQ
Can you migrate from one topology to another?
Yes, but plan for it from the start. Use an abstraction layer (a routing interface) between agents and the topology. Agents call router.send(capability, message) rather than addressing specific agents. This allows you to swap the underlying topology without modifying agent code. Migration from flat to hierarchical is the most common and usually the easiest because you are adding structure, not removing it.
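A minimal sketch of that abstraction, assuming a `Router` protocol with a single `send(capability, message)` coroutine. `FlatRouter`, `CoordinatorRouter`, `search`, and `writer_agent` are hypothetical names for illustration; the point is that `writer_agent` runs unchanged whichever router it is given:

```python
import asyncio
from typing import Awaitable, Callable, Protocol

Handler = Callable[[dict], Awaitable[dict]]


class Router(Protocol):
    async def send(self, capability: str, message: dict) -> dict: ...


class FlatRouter:
    """Flat topology: look up the capable peer and call it directly."""

    def __init__(self) -> None:
        self.handlers: dict[str, Handler] = {}

    def register(self, capability: str, handler: Handler) -> None:
        self.handlers[capability] = handler

    async def send(self, capability: str, message: dict) -> dict:
        return await self.handlers[capability](message)


class CoordinatorRouter(FlatRouter):
    """Hierarchical flavour: every message passes one coordinator that
    logs it and escalates when no agent has the capability."""

    def __init__(self) -> None:
        super().__init__()
        self.audit_log: list[str] = []

    async def send(self, capability: str, message: dict) -> dict:
        self.audit_log.append(capability)
        handler = self.handlers.get(capability)
        if handler is None:
            return {"status": "requires_human", "message": message}
        return await handler(message)


async def search(message: dict) -> dict:
    return {"results": [f"result for {message['query']}"]}


async def writer_agent(router: Router, topic: str) -> dict:
    # Agent code only knows the Router interface, never specific peers
    research = await router.send("web_search", {"query": topic})
    return {"draft": f"Post about {topic}", "sources": research["results"]}


for router in (FlatRouter(), CoordinatorRouter()):
    router.register("web_search", search)
    print(type(router).__name__, asyncio.run(writer_agent(router, "topologies")))
```

Swapping `FlatRouter` for `CoordinatorRouter` adds a central audit log and an escalation path without touching the agent.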
What is the latency impact of hierarchical routing?
Each hop in a hierarchical topology adds the coordinator agent's processing time, typically 10-50ms for a classification decision (without LLM calls) or 500ms-2s if the coordinator uses an LLM to make routing decisions. For latency-sensitive paths, add mesh links to bypass the hierarchy. Keep coordinator logic deterministic (rule-based) rather than LLM-powered whenever possible.
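Those per-hop figures compound along a route. A back-of-the-envelope sketch for the billing-to-demo request from the hierarchical section, assuming each coordinator on the path makes exactly one routing decision and ignoring network transfer and the workers' own processing time (`coordinator_overhead_ms` is an illustrative helper):

```python
# Per-coordinator processing time ranges quoted above, in milliseconds
RULE_BASED_MS = (10, 50)
LLM_BASED_MS = (500, 2000)


def coordinator_overhead_ms(coordinators_on_path: int,
                            per_hop_ms: tuple[int, int]) -> tuple[int, int]:
    """Added latency range when each coordinator on the path decides once."""
    low, high = per_hop_ms
    return coordinators_on_path * low, coordinators_on_path * high


# billing -> demo crosses three coordinators: support, root, sales
print(coordinator_overhead_ms(3, RULE_BASED_MS))  # (30, 150)
print(coordinator_overhead_ms(3, LLM_BASED_MS))   # (1500, 6000)
# A direct mesh link between billing and demo crosses none
print(coordinator_overhead_ms(0, LLM_BASED_MS))   # (0, 0)
```

With LLM-powered coordinators, one cross-branch request can spend several seconds on routing alone, which is why the deterministic-coordinator and mesh-bypass advice matters most on busy paths.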
How do you test different topologies?
Build a topology simulator that models agent communication patterns with synthetic traffic. Measure latency, throughput, error propagation, and resource utilization for each topology. Use your actual agent capabilities and traffic patterns but simulate the communication layer. This lets you evaluate topologies without rewriting agent code.
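A minimal sketch of such a simulator, assuming that hop count times a fixed per-hop cost (30 ms here, an arbitrary placeholder) is a good enough latency model. It replays the same seeded synthetic traffic between the four workers of the example tree under a flat topology, the tree, and a hypothetical hybrid with one billing-demo mesh link; all names are illustrative:

```python
import random
import statistics

# Example tree from this article (child -> parent) and its four workers
PARENT = {"support": "coordinator", "sales": "coordinator",
          "billing": "support", "technical": "support",
          "pricing": "sales", "demo": "sales"}
WORKERS = ["billing", "technical", "pricing", "demo"]
# Hypothetical hybrid: one mesh shortcut between two busy workers
MESH_LINKS = {frozenset({"billing", "demo"})}


def ancestors(agent: str) -> list[str]:
    chain = [agent]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def flat_hops(src: str, dst: str) -> int:
    return 1  # every peer is one hop away


def tree_hops(src: str, dst: str) -> int:
    # Hops up to the lowest common ancestor plus hops back down
    up, down = ancestors(src), ancestors(dst)
    lca = next(a for a in up if a in down)
    return up.index(lca) + down.index(lca)


def hybrid_hops(src: str, dst: str) -> int:
    return 1 if frozenset({src, dst}) in MESH_LINKS else tree_hops(src, dst)


def simulate(hops_fn, n_messages: int = 1000,
             per_hop_ms: float = 30.0, seed: int = 0) -> float:
    """Mean latency for the same seeded synthetic traffic under one topology."""
    rng = random.Random(seed)
    latencies = []
    for _ in range(n_messages):
        src, dst = rng.sample(WORKERS, 2)
        latencies.append(hops_fn(src, dst) * per_hop_ms)
    return statistics.mean(latencies)


for name, fn in [("flat", flat_hops), ("hierarchical", tree_hops),
                 ("hybrid", hybrid_hops)]:
    print(f"{name:>12}: {simulate(fn):.1f} ms mean")
```

Reusing one seed means every topology sees identical traffic, so differences in the result come from the topology alone. A real simulator would swap the fixed per-hop cost for measured latencies and replay recorded traffic.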
Do all agents in a hierarchy need to use the same framework?
No. Agents at different levels can use different frameworks, models, and even languages, as long as they communicate through a standardized interface (message schemas, MCP, or HTTP APIs). This is actually a strength of hierarchical systems — each team can choose the best tool for their agent's specific domain.
Written by
CallSphere Team