Building Multi-Agent Systems with Claude: Coordination Without a Framework
Build multi-agent systems using the raw Anthropic API without any framework. Learn patterns for routing, delegation, result aggregation, and inter-agent communication using plain Python and Claude.
Why Build Without a Framework
Frameworks like LangChain and CrewAI provide convenient abstractions for multi-agent systems, but they also introduce complexity, version lock-in, and opaque behavior that can be hard to debug. For many production systems, building multi-agent coordination directly on the Anthropic API gives you full control over the communication protocol, error handling, and cost management.
The patterns in this guide use nothing beyond the anthropic Python SDK and standard library modules. You will learn the fundamental coordination patterns that every multi-agent framework implements under the hood.
Pattern 1: Router Agent
A router agent examines the user input and delegates to specialized agents:
```python
import anthropic
import json

client = anthropic.Anthropic()

def router_agent(user_input: str) -> dict:
    response = client.messages.create(
        model="claude-3-5-haiku-20241022",
        max_tokens=256,
        system="""Classify the user request into exactly one category.
Return JSON: {"category": "...", "reasoning": "..."}
Categories: technical_support, billing, sales, general""",
        messages=[{"role": "user", "content": user_input}],
    )
    return json.loads(response.content[0].text)

def technical_agent(user_input: str) -> str:
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=2048,
        system="You are a technical support specialist. Diagnose issues and provide step-by-step solutions.",
        messages=[{"role": "user", "content": user_input}],
    )
    return response.content[0].text

def billing_agent(user_input: str) -> str:
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=2048,
        system="You are a billing specialist. Help with invoices, payments, and subscription changes.",
        messages=[{"role": "user", "content": user_input}],
    )
    return response.content[0].text

AGENTS = {
    "technical_support": technical_agent,
    "billing": billing_agent,
}

def handle_request(user_input: str) -> str:
    route = router_agent(user_input)
    agent_fn = AGENTS.get(route["category"])
    if agent_fn:
        return agent_fn(user_input)
    # Default fallback: no specialist matched, so answer with a general-purpose call
    return client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=2048,
        messages=[{"role": "user", "content": user_input}],
    ).content[0].text

result = handle_request("My API key stopped working after I rotated it")
print(result)
```
The router uses a cheap, fast model (Haiku) for classification, then dispatches to a more capable model (Sonnet) with a specialized system prompt. This is cost-efficient because most of the routing decisions are simple classification tasks.
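One caveat: the router calls json.loads directly on raw model output, which raises an exception whenever the model wraps the JSON in prose or invents an unlisted category. A defensive parsing helper keeps routing from crashing on malformed replies. This is a minimal sketch; parse_route is a hypothetical helper name, not part of the SDK:

```python
import json

def parse_route(raw: str, valid: set[str], default: str = "general") -> str:
    """Extract a routing category from raw model output.

    Tolerates prose around the JSON and falls back to a default
    category when parsing fails or the category is unknown.
    """
    try:
        # Slice out the outermost {...} in case the model added prose around it
        start = raw.index("{")
        end = raw.rindex("}") + 1
        category = json.loads(raw[start:end]).get("category", "")
    except (ValueError, json.JSONDecodeError):
        return default
    return category if category in valid else default

valid = {"technical_support", "billing", "sales", "general"}
print(parse_route('{"category": "billing", "reasoning": "invoice question"}', valid))  # billing
print(parse_route("Sure! The category is sales.", valid))  # general (no JSON found)
print(parse_route('{"category": "pizza"}', valid))  # general (unknown category)
```

With this in place, handle_request can route on parse_route's output instead of indexing into the raw dictionary.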
Pattern 2: Parallel Delegation
When a task has independent subtasks, run multiple agents concurrently:
```python
import anthropic
import asyncio

async def run_agent(system: str, prompt: str) -> str:
    client = anthropic.AsyncAnthropic()
    response = await client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=2048,
        system=system,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text

async def parallel_analysis(document: str) -> dict:
    tasks = {
        "summary": run_agent(
            "You are a summarization agent. Produce a 3-paragraph executive summary.",
            document,
        ),
        "risks": run_agent(
            "You are a risk analysis agent. Identify all potential risks and rate them high/medium/low.",
            document,
        ),
        "action_items": run_agent(
            "You are a project management agent. Extract all action items with owners and deadlines.",
            document,
        ),
    }
    # asyncio.gather schedules all three coroutines concurrently; awaiting
    # them one at a time in a loop would run the agents sequentially.
    outputs = await asyncio.gather(*tasks.values())
    return dict(zip(tasks.keys(), outputs))

document = "Meeting notes: We discussed the Q2 product roadmap..."
analysis = asyncio.run(parallel_analysis(document))
for section, content in analysis.items():
    print(f"\n=== {section.upper()} ===\n{content}")
```
Three agents analyze the same document simultaneously, each with a different focus. This completes in the time of the slowest agent rather than the sum of all three, dramatically reducing total latency.
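The latency claim is easy to verify without touching the API. This sketch substitutes asyncio.sleep for the network calls (fake_agent is a stand-in, not part of the SDK) and times the gather:

```python
import asyncio
import time

async def fake_agent(name: str, seconds: float) -> str:
    # Stand-in for an API call: sleeps instead of calling Claude.
    await asyncio.sleep(seconds)
    return f"{name} done"

async def main() -> float:
    start = time.perf_counter()
    await asyncio.gather(
        fake_agent("summary", 0.3),
        fake_agent("risks", 0.2),
        fake_agent("action_items", 0.1),
    )
    return time.perf_counter() - start

elapsed = asyncio.run(main())
print(f"{elapsed:.2f}s")  # roughly 0.3s (the slowest agent), not 0.6s (the sum)
```

Swapping gather for three sequential awaits makes the elapsed time jump to the sum of the delays, which is exactly the failure mode to avoid.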
Pattern 3: Sequential Pipeline
Some tasks require agents to work in sequence, where each agent's output feeds the next:
```python
import anthropic

client = anthropic.Anthropic()

def pipeline_agent(system: str, input_text: str) -> str:
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=4096,
        system=system,
        messages=[{"role": "user", "content": input_text}],
    )
    return response.content[0].text

def code_review_pipeline(code: str) -> dict:
    # Stage 1: Bug detection
    bugs = pipeline_agent(
        "You are a bug detection agent. Identify bugs, off-by-one errors, and logic flaws. "
        "List each bug with its line reference and severity.",
        f"Review this code for bugs:\n\n{code}",
    )
    # Stage 2: Security review (informed by bugs found)
    security = pipeline_agent(
        "You are a security review agent. Identify security vulnerabilities including "
        "injection, authentication, and data exposure issues.",
        f"Code:\n{code}\n\nBugs already found:\n{bugs}\n\nNow identify security issues not covered above.",
    )
    # Stage 3: Synthesize into a final report
    report = pipeline_agent(
        "You are a technical writing agent. Synthesize code review findings into a "
        "clear, actionable report organized by priority.",
        f"Bugs found:\n{bugs}\n\nSecurity issues:\n{security}\n\n"
        "Create a unified code review report.",
    )
    return {"bugs": bugs, "security": security, "report": report}

result = code_review_pipeline("def login(user, pw): ...")
print(result["report"])
```
Each stage adds context for the next. The security agent knows about bugs already found, so it focuses on security-specific issues. The synthesizer combines both perspectives into a coherent report.
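The context-accumulation logic generalizes beyond code review. This sketch factors it into a reusable runner that takes any (system, input) -> str callable, so the pipeline wiring can be exercised with a stub instead of live API calls; run_pipeline and the stub are hypothetical names introduced here:

```python
from typing import Callable

def run_pipeline(stages: list, agent: Callable[[str, str], str], initial: str) -> dict:
    """Run (name, system_prompt) stages in order, feeding each stage the
    initial input plus every earlier stage's output."""
    results: dict[str, str] = {}
    context = initial
    for name, system in stages:
        output = agent(system, context)
        results[name] = output
        # Each later stage sees all prior findings appended to the context
        context += f"\n\n{name} findings:\n{output}"
    return results

# Stub agent for illustration: reports how much context it received.
stub = lambda system, text: f"[{len(text)} chars seen]"
out = run_pipeline([("bugs", "find bugs"), ("security", "find vulns")], stub, "code...")
print(out)  # the security stage sees more context than the bugs stage
```

Passing pipeline_agent as the agent argument recovers the live behavior; passing a stub makes the coordination logic unit-testable.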
Pattern 4: Aggregator with Quality Check
After parallel agents produce results, an aggregator merges and validates them:
```python
import anthropic
import asyncio

async def research_with_aggregation(topic: str) -> str:
    client = anthropic.AsyncAnthropic()
    # Parallel research from different perspectives
    # (run_agent is the helper defined in the parallel delegation example)
    perspectives = await asyncio.gather(
        run_agent("Research from a technical perspective. Cite specific technologies.", topic),
        run_agent("Research from a business/market perspective. Include market data.", topic),
        run_agent("Research from a user experience perspective. Focus on usability.", topic),
    )
    # Aggregator merges and deduplicates
    combined = "\n\n---\n\n".join(
        f"Perspective {i+1}:\n{p}" for i, p in enumerate(perspectives)
    )
    response = await client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=4096,
        system=(
            "You are a research synthesis agent. Merge multiple research perspectives "
            "into a single coherent report. Remove duplicates, resolve contradictions, "
            "and organize by theme. Flag any contradictions between sources."
        ),
        messages=[{"role": "user", "content": combined}],
    )
    return response.content[0].text

report = asyncio.run(research_with_aggregation("State of AI agents in enterprise software 2026"))
print(report)
```
The aggregator is a critical quality control step. It catches contradictions between agents, removes redundancy, and produces a unified output that is higher quality than any single agent could produce alone.
Error Handling in Multi-Agent Systems
Production multi-agent systems need robust error handling:
```python
import anthropic
import asyncio

async def safe_agent_call(name: str, system: str, prompt: str) -> dict:
    try:
        client = anthropic.AsyncAnthropic()
        response = await client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=2048,
            system=system,
            messages=[{"role": "user", "content": prompt}],
        )
        return {"agent": name, "status": "success", "result": response.content[0].text}
    except anthropic.RateLimitError:
        return {"agent": name, "status": "rate_limited", "result": None}
    except anthropic.APIError as e:
        return {"agent": name, "status": "error", "result": str(e)}

async def resilient_pipeline(prompt: str) -> list:
    results = await asyncio.gather(
        safe_agent_call("analyst", "You are a data analyst.", prompt),
        safe_agent_call("writer", "You are a technical writer.", prompt),
        safe_agent_call("reviewer", "You are a code reviewer.", prompt),
    )
    successful = [r for r in results if r["status"] == "success"]
    failed = [r for r in results if r["status"] != "success"]
    if failed:
        print(f"Warning: {len(failed)} agents failed: {[f['agent'] for f in failed]}")
    return successful
```
Wrapping each agent call in error handling ensures that one failure does not take down the entire system. Log failures for debugging but continue with whatever results are available.
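Rate-limited calls are usually worth retrying rather than dropping. This sketch adds exponential backoff with jitter on top of the result-dict convention above; with_retries is a hypothetical helper, and the factory argument lets the logic be demonstrated with a stub instead of a live agent:

```python
import asyncio
import random

async def with_retries(coro_factory, max_attempts: int = 3, base_delay: float = 1.0) -> dict:
    """Retry a coroutine factory whose result dict uses the safe_agent_call
    convention: {"status": "success" | "rate_limited" | "error", ...}.
    Only rate-limited results are retried."""
    for attempt in range(max_attempts):
        result = await coro_factory()
        if result["status"] != "rate_limited":
            return result
        # Exponential backoff with jitter before the next attempt
        await asyncio.sleep(base_delay * (2 ** attempt) * (0.5 + random.random()))
    return result

# Stub that is rate-limited twice, then succeeds on the third attempt.
attempts = {"n": 0}
async def flaky() -> dict:
    attempts["n"] += 1
    ok = attempts["n"] >= 3
    return {"status": "success" if ok else "rate_limited", "result": "ok" if ok else None}

result = asyncio.run(with_retries(flaky, base_delay=0.01))
print(result)  # succeeds on the third attempt
```

In resilient_pipeline, each safe_agent_call could be wrapped as `with_retries(lambda: safe_agent_call(...))` so transient rate limits no longer count as failures.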
FAQ
When should I use a framework instead of raw API calls?
Use a framework when you need features like persistent memory across sessions, complex state machines, built-in tracing dashboards, or agent-to-agent handoff protocols. Use raw API calls when you need full control, minimal dependencies, or when your coordination pattern does not fit a framework's assumptions.
How do I manage costs in multi-agent systems?
Use cheap models (Haiku) for routing, classification, and simple tasks. Reserve expensive models (Opus, Sonnet) for tasks requiring deep reasoning. Use prompt caching aggressively to reduce repeated context costs. Set max_tokens appropriately for each agent — a summarizer needs fewer tokens than a code generator.
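Prompt caching is applied by marking a large shared system block as cacheable. The sketch below builds the keyword arguments for such a request; the cache_control block format follows the Anthropic prompt-caching documentation as of this writing, so verify the field names against current docs, and cached_request is a hypothetical helper:

```python
def cached_request(model: str, shared_context: str, user_input: str,
                   max_tokens: int = 1024) -> dict:
    """Build messages.create kwargs that mark a shared system prompt
    as cacheable, so repeated agent calls reuse the cached prefix."""
    return {
        "model": model,
        "max_tokens": max_tokens,
        "system": [
            {
                "type": "text",
                "text": shared_context,
                # Ephemeral cache entry reused across calls with the same prefix
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [{"role": "user", "content": user_input}],
    }

req = cached_request("claude-3-5-haiku-20241022", "Long shared policy text...", "Hi")
# Then: client.messages.create(**req)
```

When several agents share the same large context (a policy document, a codebase summary), caching that prefix once and reusing it across all of them is often the single biggest cost lever.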
How do agents share state without a framework?
Pass state explicitly through function arguments. For simple systems, a shared dictionary works. For complex systems, use a message queue (Redis, NATS) or a shared database. The key principle is making state flow explicit rather than relying on implicit shared memory.
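For the simple end of that spectrum, a dataclass makes the shared state explicit and self-documenting. AgentState and record are hypothetical names for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class AgentState:
    """Explicit state threaded through agent calls as a function argument."""
    topic: str
    findings: dict[str, str] = field(default_factory=dict)
    errors: list[str] = field(default_factory=list)

def record(state: AgentState, agent: str, output: str) -> AgentState:
    # Each agent writes under its own key, so provenance stays visible
    state.findings[agent] = output
    return state

state = AgentState(topic="API migration")
record(state, "analyst", "Three services affected.")
record(state, "writer", "Draft complete.")
print(sorted(state.findings))  # ['analyst', 'writer']
```

Because every agent receives and returns the state object explicitly, it is always clear which agent produced which finding, and the same structure serializes cleanly to a database row or queue message when the system grows.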
#Anthropic #Claude #MultiAgent #Architecture #Coordination #AgenticAI #LearnAI #AIEngineering
CallSphere Team