Multi-Agent Systems with Claude: Building Teams of AI Agents
Learn how to design and implement multi-agent systems using the Claude API and Agent SDK. Covers architecture patterns, inter-agent communication, task delegation, and real-world production examples.
Why Single Agents Are Not Enough
The first generation of LLM-powered applications used a single model call to handle each request. The second generation introduced tool-calling agents that could loop through multi-step tasks. But as tasks grow in complexity -- analyzing a full codebase, orchestrating a customer support pipeline, or running a due diligence review across hundreds of documents -- a single agent becomes a bottleneck.
Multi-agent systems solve this by decomposing complex work across specialized agents that communicate, delegate, and collaborate. Instead of one agent trying to be an expert at everything, you build a team where each agent has a focused role, a constrained toolset, and a clear responsibility boundary.
Anthropic's own Claude Code uses this pattern internally. When you ask Claude Code to refactor a large codebase, it spawns subagent processes for file analysis, test execution, and code generation. The orchestrator coordinates their work and synthesizes a coherent result.
Core Architecture Patterns
Pattern 1: Hub-and-Spoke (Orchestrator Model)
The most common multi-agent pattern uses a central orchestrator agent that delegates tasks to specialized subagents.
```python
import asyncio

from anthropic import Anthropic

client = Anthropic()

async def orchestrator(task: str) -> str:
    """Central agent that decomposes tasks and delegates to specialists."""
    # The SDK client here is blocking, so run its calls in worker threads;
    # this is what lets asyncio.gather below actually overlap the specialists.
    response = await asyncio.to_thread(
        client.messages.create,
        model="claude-sonnet-4-5-20250514",
        max_tokens=4096,
        system="""You are an orchestrator agent. Break down the user's request
into subtasks and specify which specialist should handle each one.
Available specialists: researcher, coder, reviewer, writer.
Output a JSON array of {specialist, task, priority} objects.""",
        messages=[{"role": "user", "content": task}],
    )
    # parse_subtasks is an app-specific helper that extracts the JSON plan.
    subtasks = parse_subtasks(response.content[0].text)

    # Independent subtasks run concurrently.
    results = await asyncio.gather(*[
        dispatch_to_specialist(st["specialist"], st["task"])
        for st in subtasks
    ])

    # Synthesize results; format_results joins the specialist outputs.
    synthesis = await asyncio.to_thread(
        client.messages.create,
        model="claude-sonnet-4-5-20250514",
        max_tokens=4096,
        system="Synthesize these specialist results into a coherent final answer.",
        messages=[{"role": "user", "content": format_results(results)}],
    )
    return synthesis.content[0].text

async def dispatch_to_specialist(specialist: str, task: str) -> str:
    """Route a subtask to the appropriate specialist agent."""
    system_prompts = {
        "researcher": "You are a research specialist. Find and cite accurate information.",
        "coder": "You are a coding specialist. Write clean, tested, production code.",
        "reviewer": "You are a code review specialist. Find bugs and suggest improvements.",
        "writer": "You are a technical writer. Produce clear, well-structured documentation.",
    }
    response = await asyncio.to_thread(
        client.messages.create,
        model="claude-sonnet-4-5-20250514",
        max_tokens=4096,
        system=system_prompts[specialist],
        messages=[{"role": "user", "content": task}],
    )
    return response.content[0].text
```
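The `parse_subtasks` and `format_results` helpers above are application code, not SDK functions. A minimal sketch, assuming the orchestrator follows its instruction to emit a JSON array (possibly wrapped in prose):

```python
import json

def parse_subtasks(text: str) -> list[dict]:
    """Pull the JSON plan out of the orchestrator's reply, tolerating
    any prose the model wraps around it."""
    start, end = text.find("["), text.rfind("]")
    if start == -1 or end == -1:
        raise ValueError("no JSON array found in orchestrator output")
    return json.loads(text[start:end + 1])

def format_results(results: list[str]) -> str:
    """Label and join specialist outputs for the synthesis call."""
    return "\n\n".join(
        f"--- Specialist result {i + 1} ---\n{r}" for i, r in enumerate(results)
    )
```

Production code would want stricter validation here (e.g. checking that each object names a known specialist) rather than trusting the model's output shape.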
Pattern 2: Pipeline (Sequential Processing)
In pipeline architectures, each agent's output becomes the next agent's input. This works well for workflows with clear stage dependencies.
```python
async def pipeline(raw_data: str) -> str:
    """Sequential multi-agent pipeline for document processing.

    run_agent wraps a single Messages API call: one system prompt,
    one user message, returning the text of the reply.
    """
    # Stage 1: Extract
    extracted = await run_agent(
        system="Extract all key facts, dates, and entities from this document.",
        user_input=raw_data,
        model="claude-haiku-4-5-20250514",  # fast, cheap for extraction
    )
    # Stage 2: Analyze
    analysis = await run_agent(
        system="Analyze these extracted facts for inconsistencies and patterns.",
        user_input=extracted,
        model="claude-sonnet-4-5-20250514",  # balanced for analysis
    )
    # Stage 3: Synthesize
    report = await run_agent(
        system="Write an executive summary based on this analysis.",
        user_input=analysis,
        model="claude-sonnet-4-5-20250514",
    )
    return report
```
Pattern 3: Debate (Adversarial Collaboration)
Two agents argue opposing positions, and a judge agent synthesizes the best answer. This improves accuracy on ambiguous or high-stakes decisions.
```python
async def debate(question: str) -> str:
    """Two agents debate, a third judges."""
    advocate = await run_agent(
        system="Argue strongly IN FAVOR of the proposition. Provide evidence.",
        user_input=question,
    )
    critic = await run_agent(
        system="Argue strongly AGAINST the proposition. Provide evidence.",
        user_input=question,
    )
    judgment = await run_agent(
        system="""You are an impartial judge. Review both arguments,
identify the strongest points from each side, and deliver
a balanced, well-reasoned verdict.""",
        user_input=f"FOR:\n{advocate}\n\nAGAINST:\n{critic}",
    )
    return judgment
```
Inter-Agent Communication
The biggest challenge in multi-agent systems is not building individual agents -- it is building reliable communication between them. There are three primary strategies.
Shared Memory (Context Store)
Agents read from and write to a shared key-value store. This works well for loosely coupled agents that need access to a growing body of knowledge.
```python
from dataclasses import dataclass, field
from typing import Any

@dataclass
class SharedMemory:
    store: dict[str, Any] = field(default_factory=dict)
    history: list[dict] = field(default_factory=list)

    def write(self, agent_id: str, key: str, value: Any) -> None:
        self.store[key] = value
        self.history.append({"agent": agent_id, "action": "write", "key": key})

    def read(self, key: str) -> Any:
        return self.store.get(key)

    def get_context_for_agent(self, agent_id: str, relevant_keys: list[str]) -> str:
        context_parts = []
        for key in relevant_keys:
            if key in self.store:
                context_parts.append(f"{key}: {self.store[key]}")
        return "\n".join(context_parts)
```
Message Passing
Agents communicate through structured messages. This provides better isolation and audit trails.
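A minimal sketch of structured message passing between two agents, using an `asyncio.Queue` as the in-process transport (the message fields and agent names are illustrative):

```python
import asyncio
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class AgentMessage:
    sender: str
    recipient: str
    payload: str
    # Timestamped at creation; useful for the audit trail.
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

async def researcher(outbox: asyncio.Queue) -> None:
    # In a real system the payload would come from a Claude API call.
    await outbox.put(AgentMessage("researcher", "writer", "3 key findings..."))

async def writer(inbox: asyncio.Queue) -> str:
    msg = await inbox.get()
    return f"Report based on {msg.sender}'s input: {msg.payload}"

async def main() -> str:
    channel: asyncio.Queue = asyncio.Queue()
    await researcher(channel)
    return await writer(channel)

report = asyncio.run(main())
```

The same message schema carries over to a real broker (Redis, SQS) when agents run in separate processes; only the transport changes.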
Tool-Mediated Handoff
One agent writes output to a file or database, and the next agent reads from it. This is the simplest approach and works well with the Claude Agent SDK's built-in file tools.
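A sketch of the handoff, assuming a JSON file in a shared workspace (the file name and payload shape are illustrative):

```python
import json
import tempfile
from pathlib import Path

# A scratch directory stands in for the shared workspace the agents use.
workspace = Path(tempfile.mkdtemp())
handoff_file = workspace / "extraction_result.json"

def write_handoff(agent_id: str, result: dict) -> None:
    """Producing agent persists its output for the next stage."""
    handoff_file.write_text(json.dumps({"agent": agent_id, "result": result}))

def read_handoff() -> dict:
    """Consuming agent picks up exactly where the producer left off."""
    return json.loads(handoff_file.read_text())

write_handoff("extractor", {"entities": ["Acme Corp"], "dates": ["2024-01-15"]})
handoff = read_handoff()
```

Because the intermediate state lives on disk, a crashed downstream agent can be restarted without re-running the upstream one.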
Cost Optimization for Multi-Agent Systems
Multi-agent systems multiply API costs because each agent makes its own calls. Here are proven strategies to keep costs manageable.
| Strategy | Cost Reduction | Implementation Complexity |
|---|---|---|
| Use Haiku for simple tasks | 60-80% | Low |
| Prompt caching on system prompts | Up to 90% on cached tokens | Low |
| Batch API for non-real-time work | 50% | Medium |
| Shared context compression | 30-50% | Medium |
| Agent result caching | Variable | Medium |
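Prompt caching is often the cheapest win, since every specialist reuses a long, fixed system prompt across many calls. A sketch of the request shape, following Anthropic's prompt-caching API (`cache_control` on a system content block); the prompt text is illustrative, and details should be verified against current docs:

```python
# Stands in for several thousand tokens of stable specialist instructions.
RESEARCH_SYSTEM_PROMPT = (
    "You are a research specialist. Find and cite accurate information. "
    "<...long, stable instructions here...>"
)

cached_request = {
    "model": "claude-sonnet-4-5-20250514",
    "max_tokens": 4096,
    "system": [
        {
            "type": "text",
            "text": RESEARCH_SYSTEM_PROMPT,
            # Later calls sending this identical block read it from cache
            # at a reduced per-token rate instead of reprocessing it.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    "messages": [{"role": "user", "content": "Summarize recent findings."}],
}

# Passed to the same call used everywhere else:
# response = client.messages.create(**cached_request)
```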
Model Tiering
Not every agent needs Opus. A practical tiering strategy:
- Orchestrator: Sonnet (needs good reasoning to decompose tasks)
- Researcher: Sonnet (needs comprehension and synthesis)
- Extractor: Haiku (structured extraction is simpler)
- Formatter: Haiku (template-based output)
- Judge/Reviewer: Sonnet or Opus (needs nuanced judgment)
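The tiering above can be encoded as a single lookup so dispatch code never hard-codes model IDs (model strings match the earlier examples; check Anthropic's current model list before deploying):

```python
MODEL_TIERS = {
    "orchestrator": "claude-sonnet-4-5-20250514",
    "researcher": "claude-sonnet-4-5-20250514",
    "extractor": "claude-haiku-4-5-20250514",
    "formatter": "claude-haiku-4-5-20250514",
    "reviewer": "claude-sonnet-4-5-20250514",
}

def model_for(role: str) -> str:
    # Unknown roles default to Sonnet: over-provisioning reasoning is safer
    # than silently handing a judgment task to Haiku.
    return MODEL_TIERS.get(role, "claude-sonnet-4-5-20250514")
```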
Production Considerations
Error Isolation
When one agent fails, the system should not cascade. Wrap each agent call in error handling that captures the failure and allows the orchestrator to retry or reassign.
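A minimal sketch of such a wrapper, returning a structured result instead of raising so the orchestrator can decide whether to retry elsewhere (the result shape is illustrative):

```python
import asyncio

async def call_with_isolation(agent_fn, task: str, max_retries: int = 2) -> dict:
    """Run one agent call; retry transient failures with exponential backoff,
    then report the failure to the orchestrator instead of raising."""
    for attempt in range(max_retries + 1):
        try:
            result = await agent_fn(task)
            return {"ok": True, "result": result}
        except Exception as exc:  # in production, catch SDK-specific errors
            if attempt == max_retries:
                return {"ok": False, "error": str(exc)}
            await asyncio.sleep(2 ** attempt)  # 1s, 2s, 4s, ...
```

The orchestrator can then inspect `ok` per subtask and reassign or degrade gracefully rather than failing the whole workflow.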
Observability
Log every inter-agent message with timestamps, token counts, and costs. Without observability, debugging a multi-agent system is nearly impossible. Use structured logging with correlation IDs that tie all agent calls in a single workflow together.
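A sketch of one such structured log record (field names are illustrative; in the Python SDK the token counts come from `response.usage`):

```python
import json
import logging
import time
import uuid

logger = logging.getLogger("agents")

def log_agent_call(correlation_id: str, agent: str, input_tokens: int,
                   output_tokens: int, cost_usd: float) -> dict:
    """Emit one structured record per agent call; the correlation_id ties
    every call in a single workflow together."""
    record = {
        "correlation_id": correlation_id,
        "agent": agent,
        "timestamp": time.time(),
        "input_tokens": input_tokens,    # from response.usage.input_tokens
        "output_tokens": output_tokens,  # from response.usage.output_tokens
        "cost_usd": round(cost_usd, 6),
    }
    logger.info(json.dumps(record))
    return record

workflow_id = str(uuid.uuid4())  # one ID per end-to-end workflow
entry = log_agent_call(workflow_id, "researcher", 1200, 450, 0.0081)
```

Filtering your log store by one `correlation_id` then reconstructs the full fan-out of a single request across every agent.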
Rate Limiting
With multiple agents making concurrent API calls, you can quickly hit Claude API rate limits. Implement a shared rate limiter:
```python
import asyncio

class RateLimiter:
    """Cap request starts per interval: each permit is returned 60 seconds
    after it was taken, approximating a per-minute request budget."""

    def __init__(self, max_requests_per_minute: int = 50):
        self.semaphore = asyncio.Semaphore(max_requests_per_minute)
        self.reset_interval = 60

    async def acquire(self) -> None:
        await self.semaphore.acquire()
        # Schedule the permit's return one interval from now.
        asyncio.get_running_loop().call_later(
            self.reset_interval, self.semaphore.release
        )

rate_limiter = RateLimiter(max_requests_per_minute=50)

async def rate_limited_api_call(**kwargs):
    await rate_limiter.acquire()
    # Run the blocking SDK call in a worker thread so other agents
    # waiting on the limiter are not blocked too.
    return await asyncio.to_thread(client.messages.create, **kwargs)
```
Testing Multi-Agent Systems
Test each agent in isolation first, then test the integration. Mock the API responses to create deterministic test scenarios. Track the full conversation flow to verify agents are communicating correctly.
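A sketch of an isolated agent test using `unittest.mock.AsyncMock` to stand in for the API client (the agent under test is a toy example):

```python
import asyncio
from types import SimpleNamespace
from unittest.mock import AsyncMock

async def summarizer_agent(client, text: str) -> str:
    """Toy agent under test; takes the client as a parameter so tests
    can inject a mock."""
    response = await client.messages.create(
        model="claude-haiku-4-5-20250514",
        max_tokens=256,
        system="Summarize in one sentence.",
        messages=[{"role": "user", "content": text}],
    )
    return response.content[0].text

def test_summarizer_agent() -> None:
    client = AsyncMock()
    client.messages.create.return_value = SimpleNamespace(
        content=[SimpleNamespace(text="A one-sentence summary.")]
    )
    result = asyncio.run(summarizer_agent(client, "long document..."))
    assert result == "A one-sentence summary."
    # Verify the agent forwarded the document, not something else.
    sent = client.messages.create.call_args.kwargs
    assert sent["messages"][0]["content"] == "long document..."

test_summarizer_agent()
```

Injecting the client rather than using a module-level global is what makes this kind of deterministic test possible.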
When to Use Multi-Agent Systems
Multi-agent systems add complexity. Use them when:
- A single agent's context window is insufficient for the full task
- The task naturally decomposes into specialized subtasks
- You need different model tiers for different parts of the work
- Parallel processing would significantly reduce latency
- You need adversarial verification for high-stakes decisions
Avoid them when a single well-prompted agent with tools can handle the task in under 10 turns. The overhead of orchestration is not free, and simpler architectures are easier to debug and maintain.