Building Multi-Agent Systems with Claude: Coordination Without a Framework
Build multi-agent systems using the raw Anthropic API without any framework. Learn patterns for routing, delegation, result aggregation, and inter-agent communication using plain Python and Claude.
Why Build Without a Framework
Frameworks like LangChain and CrewAI provide convenient abstractions for multi-agent systems, but they also introduce complexity, version lock-in, and opaque behavior that can be hard to debug. For many production systems, building multi-agent coordination directly on the Anthropic API gives you full control over the communication protocol, error handling, and cost management.
The patterns in this guide use nothing beyond the anthropic Python SDK and standard library modules. You will learn the fundamental coordination patterns that every multi-agent framework implements under the hood.
Pattern 1: Router Agent
A router agent examines the user input and delegates to specialized agents:
```python
import anthropic
import json

client = anthropic.Anthropic()

def router_agent(user_input: str) -> dict:
    response = client.messages.create(
        model="claude-3-5-haiku-20241022",
        max_tokens=256,
        system="""Classify the user request into exactly one category.
Return JSON: {"category": "...", "reasoning": "..."}
Categories: technical_support, billing, sales, general""",
        messages=[{"role": "user", "content": user_input}],
    )
    return json.loads(response.content[0].text)

def technical_agent(user_input: str) -> str:
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=2048,
        system="You are a technical support specialist. Diagnose issues and provide step-by-step solutions.",
        messages=[{"role": "user", "content": user_input}],
    )
    return response.content[0].text

def billing_agent(user_input: str) -> str:
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=2048,
        system="You are a billing specialist. Help with invoices, payments, and subscription changes.",
        messages=[{"role": "user", "content": user_input}],
    )
    return response.content[0].text

AGENTS = {
    "technical_support": technical_agent,
    "billing": billing_agent,
}

def handle_request(user_input: str) -> str:
    route = router_agent(user_input)
    agent_fn = AGENTS.get(route["category"])
    if agent_fn:
        return agent_fn(user_input)
    # Default fallback: no specialist matched, so answer with a general-purpose call
    return client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=2048,
        messages=[{"role": "user", "content": user_input}],
    ).content[0].text

result = handle_request("My API key stopped working after I rotated it")
print(result)
```
The router uses a cheap, fast model (Haiku) for classification, then dispatches to a more capable model (Sonnet) with a specialized system prompt. This is cost-efficient because most of the routing decisions are simple classification tasks.
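One caveat: the router calls json.loads directly on raw model output, which raises an exception whenever the model wraps the JSON in prose or invents an unlisted category. A defensive parsing helper keeps routing from crashing on malformed replies. This is a minimal sketch; parse_route is a hypothetical helper name, not part of the SDK:

```python
import json

def parse_route(raw: str, valid: set[str], default: str = "general") -> str:
    """Extract a routing category from raw model output.

    Tolerates prose around the JSON and falls back to a default
    category when parsing fails or the category is unknown.
    """
    try:
        # Slice out the outermost {...} in case the model added prose around it
        start = raw.index("{")
        end = raw.rindex("}") + 1
        category = json.loads(raw[start:end]).get("category", "")
    except (ValueError, json.JSONDecodeError):
        return default
    return category if category in valid else default

valid = {"technical_support", "billing", "sales", "general"}
print(parse_route('{"category": "billing", "reasoning": "invoice question"}', valid))  # billing
print(parse_route("Sure! The category is sales.", valid))  # general (no JSON found)
print(parse_route('{"category": "pizza"}', valid))  # general (unknown category)
```

With this in place, handle_request can route on parse_route's output instead of indexing into the raw dictionary.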
Pattern 2: Parallel Delegation
When a task has independent subtasks, run multiple agents concurrently:
```python
import anthropic
import asyncio

async def run_agent(system: str, prompt: str) -> str:
    client = anthropic.AsyncAnthropic()
    response = await client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=2048,
        system=system,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text

async def parallel_analysis(document: str) -> dict:
    tasks = {
        "summary": run_agent(
            "You are a summarization agent. Produce a 3-paragraph executive summary.",
            document,
        ),
        "risks": run_agent(
            "You are a risk analysis agent. Identify all potential risks and rate them high/medium/low.",
            document,
        ),
        "action_items": run_agent(
            "You are a project management agent. Extract all action items with owners and deadlines.",
            document,
        ),
    }
    # asyncio.gather schedules all three coroutines concurrently; awaiting
    # them one at a time in a loop would run the agents sequentially.
    outputs = await asyncio.gather(*tasks.values())
    return dict(zip(tasks.keys(), outputs))

document = "Meeting notes: We discussed the Q2 product roadmap..."
analysis = asyncio.run(parallel_analysis(document))
for section, content in analysis.items():
    print(f"\n=== {section.upper()} ===\n{content}")
```
Three agents analyze the same document simultaneously, each with a different focus. This completes in the time of the slowest agent rather than the sum of all three, dramatically reducing total latency.
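The latency claim is easy to verify without touching the API. This sketch substitutes asyncio.sleep for the network calls (fake_agent is a stand-in, not part of the SDK) and times the gather:

```python
import asyncio
import time

async def fake_agent(name: str, seconds: float) -> str:
    # Stand-in for an API call: sleeps instead of calling Claude.
    await asyncio.sleep(seconds)
    return f"{name} done"

async def main() -> float:
    start = time.perf_counter()
    await asyncio.gather(
        fake_agent("summary", 0.3),
        fake_agent("risks", 0.2),
        fake_agent("action_items", 0.1),
    )
    return time.perf_counter() - start

elapsed = asyncio.run(main())
print(f"{elapsed:.2f}s")  # roughly 0.3s (the slowest agent), not 0.6s (the sum)
```

Swapping gather for three sequential awaits makes the elapsed time jump to the sum of the delays, which is exactly the failure mode to avoid.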
Pattern 3: Sequential Pipeline
Some tasks require agents to work in sequence, where each agent's output feeds the next:
```python
import anthropic

client = anthropic.Anthropic()

def pipeline_agent(system: str, input_text: str) -> str:
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=4096,
        system=system,
        messages=[{"role": "user", "content": input_text}],
    )
    return response.content[0].text

def code_review_pipeline(code: str) -> dict:
    # Stage 1: Bug detection
    bugs = pipeline_agent(
        "You are a bug detection agent. Identify bugs, off-by-one errors, and logic flaws. "
        "List each bug with its line reference and severity.",
        f"Review this code for bugs:\n\n{code}",
    )
    # Stage 2: Security review (informed by bugs found)
    security = pipeline_agent(
        "You are a security review agent. Identify security vulnerabilities including "
        "injection, authentication, and data exposure issues.",
        f"Code:\n{code}\n\nBugs already found:\n{bugs}\n\nNow identify security issues not covered above.",
    )
    # Stage 3: Synthesize into a final report
    report = pipeline_agent(
        "You are a technical writing agent. Synthesize code review findings into a "
        "clear, actionable report organized by priority.",
        f"Bugs found:\n{bugs}\n\nSecurity issues:\n{security}\n\n"
        "Create a unified code review report.",
    )
    return {"bugs": bugs, "security": security, "report": report}

result = code_review_pipeline("def login(user, pw): ...")
print(result["report"])
```
Each stage adds context for the next. The security agent knows about bugs already found, so it focuses on security-specific issues. The synthesizer combines both perspectives into a coherent report.
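The context-accumulation logic generalizes beyond code review. This sketch factors it into a reusable runner that takes any (system, input) -> str callable, so the pipeline wiring can be exercised with a stub instead of live API calls; run_pipeline and the stub are hypothetical names introduced here:

```python
from typing import Callable

def run_pipeline(stages: list, agent: Callable[[str, str], str], initial: str) -> dict:
    """Run (name, system_prompt) stages in order, feeding each stage the
    initial input plus every earlier stage's output."""
    results: dict[str, str] = {}
    context = initial
    for name, system in stages:
        output = agent(system, context)
        results[name] = output
        # Each later stage sees all prior findings appended to the context
        context += f"\n\n{name} findings:\n{output}"
    return results

# Stub agent for illustration: reports how much context it received.
stub = lambda system, text: f"[{len(text)} chars seen]"
out = run_pipeline([("bugs", "find bugs"), ("security", "find vulns")], stub, "code...")
print(out)  # the security stage sees more context than the bugs stage
```

Passing pipeline_agent as the agent argument recovers the live behavior; passing a stub makes the coordination logic unit-testable.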
Pattern 4: Aggregator with Quality Check
After parallel agents produce results, an aggregator merges and validates them:
```python
import anthropic
import asyncio

async def research_with_aggregation(topic: str) -> str:
    client = anthropic.AsyncAnthropic()
    # Parallel research from different perspectives
    # (run_agent is the helper defined in the parallel delegation example)
    perspectives = await asyncio.gather(
        run_agent("Research from a technical perspective. Cite specific technologies.", topic),
        run_agent("Research from a business/market perspective. Include market data.", topic),
        run_agent("Research from a user experience perspective. Focus on usability.", topic),
    )
    # Aggregator merges and deduplicates
    combined = "\n\n---\n\n".join(
        f"Perspective {i+1}:\n{p}" for i, p in enumerate(perspectives)
    )
    response = await client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=4096,
        system=(
            "You are a research synthesis agent. Merge multiple research perspectives "
            "into a single coherent report. Remove duplicates, resolve contradictions, "
            "and organize by theme. Flag any contradictions between sources."
        ),
        messages=[{"role": "user", "content": combined}],
    )
    return response.content[0].text

report = asyncio.run(research_with_aggregation("State of AI agents in enterprise software 2026"))
print(report)
```
The aggregator is a critical quality control step. It catches contradictions between agents, removes redundancy, and produces a unified output that is higher quality than any single agent could produce alone.
Error Handling in Multi-Agent Systems
Production multi-agent systems need robust error handling:
```python
import anthropic
import asyncio

async def safe_agent_call(name: str, system: str, prompt: str) -> dict:
    try:
        client = anthropic.AsyncAnthropic()
        response = await client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=2048,
            system=system,
            messages=[{"role": "user", "content": prompt}],
        )
        return {"agent": name, "status": "success", "result": response.content[0].text}
    except anthropic.RateLimitError:
        return {"agent": name, "status": "rate_limited", "result": None}
    except anthropic.APIError as e:
        return {"agent": name, "status": "error", "result": str(e)}

async def resilient_pipeline(prompt: str) -> list:
    results = await asyncio.gather(
        safe_agent_call("analyst", "You are a data analyst.", prompt),
        safe_agent_call("writer", "You are a technical writer.", prompt),
        safe_agent_call("reviewer", "You are a code reviewer.", prompt),
    )
    successful = [r for r in results if r["status"] == "success"]
    failed = [r for r in results if r["status"] != "success"]
    if failed:
        print(f"Warning: {len(failed)} agents failed: {[f['agent'] for f in failed]}")
    return successful
```
Wrapping each agent call in error handling ensures that one failure does not take down the entire system. Log failures for debugging but continue with whatever results are available.
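Rate-limited calls are usually worth retrying rather than dropping. This sketch adds exponential backoff with jitter on top of the result-dict convention above; with_retries is a hypothetical helper, and the factory argument lets the logic be demonstrated with a stub instead of a live agent:

```python
import asyncio
import random

async def with_retries(coro_factory, max_attempts: int = 3, base_delay: float = 1.0) -> dict:
    """Retry a coroutine factory whose result dict uses the safe_agent_call
    convention: {"status": "success" | "rate_limited" | "error", ...}.
    Only rate-limited results are retried."""
    for attempt in range(max_attempts):
        result = await coro_factory()
        if result["status"] != "rate_limited":
            return result
        # Exponential backoff with jitter before the next attempt
        await asyncio.sleep(base_delay * (2 ** attempt) * (0.5 + random.random()))
    return result

# Stub that is rate-limited twice, then succeeds on the third attempt.
attempts = {"n": 0}
async def flaky() -> dict:
    attempts["n"] += 1
    ok = attempts["n"] >= 3
    return {"status": "success" if ok else "rate_limited", "result": "ok" if ok else None}

result = asyncio.run(with_retries(flaky, base_delay=0.01))
print(result)  # succeeds on the third attempt
```

In resilient_pipeline, each safe_agent_call could be wrapped as `with_retries(lambda: safe_agent_call(...))` so transient rate limits no longer count as failures.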
FAQ
When should I use a framework instead of raw API calls?
Use a framework when you need features like persistent memory across sessions, complex state machines, built-in tracing dashboards, or agent-to-agent handoff protocols. Use raw API calls when you need full control, minimal dependencies, or when your coordination pattern does not fit a framework's assumptions.
How do I manage costs in multi-agent systems?
Use cheap models (Haiku) for routing, classification, and simple tasks. Reserve expensive models (Opus, Sonnet) for tasks requiring deep reasoning. Use prompt caching aggressively to reduce repeated context costs. Set max_tokens appropriately for each agent — a summarizer needs fewer tokens than a code generator.
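Prompt caching is applied by marking a large shared system block as cacheable. The sketch below builds the keyword arguments for such a request; the cache_control block format follows the Anthropic prompt-caching documentation as of this writing, so verify the field names against current docs, and cached_request is a hypothetical helper:

```python
def cached_request(model: str, shared_context: str, user_input: str,
                   max_tokens: int = 1024) -> dict:
    """Build messages.create kwargs that mark a shared system prompt
    as cacheable, so repeated agent calls reuse the cached prefix."""
    return {
        "model": model,
        "max_tokens": max_tokens,
        "system": [
            {
                "type": "text",
                "text": shared_context,
                # Ephemeral cache entry reused across calls with the same prefix
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [{"role": "user", "content": user_input}],
    }

req = cached_request("claude-3-5-haiku-20241022", "Long shared policy text...", "Hi")
# Then: client.messages.create(**req)
```

When several agents share the same large context (a policy document, a codebase summary), caching that prefix once and reusing it across all of them is often the single biggest cost lever.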
How do agents share state without a framework?
Pass state explicitly through function arguments. For simple systems, a shared dictionary works. For complex systems, use a message queue (Redis, NATS) or a shared database. The key principle is making state flow explicit rather than relying on implicit shared memory.
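For the simple end of that spectrum, a dataclass makes the shared state explicit and self-documenting. AgentState and record are hypothetical names for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class AgentState:
    """Explicit state threaded through agent calls as a function argument."""
    topic: str
    findings: dict[str, str] = field(default_factory=dict)
    errors: list[str] = field(default_factory=list)

def record(state: AgentState, agent: str, output: str) -> AgentState:
    # Each agent writes under its own key, so provenance stays visible
    state.findings[agent] = output
    return state

state = AgentState(topic="API migration")
record(state, "analyst", "Three services affected.")
record(state, "writer", "Draft complete.")
print(sorted(state.findings))  # ['analyst', 'writer']
```

Because every agent receives and returns the state object explicitly, it is always clear which agent produced which finding, and the same structure serializes cleanly to a database row or queue message when the system grows.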
#Anthropic #Claude #MultiAgent #Architecture #Coordination #AgenticAI #LearnAI #AIEngineering
CallSphere Team