
Claude's Orchestrator and Subagent Model Explained

Deep dive into the orchestrator-subagent architecture pattern used in Claude Code and the Claude Agent SDK. Learn how task decomposition, delegation, and result synthesis work under the hood.

The Architecture Behind Claude Code's Power

When Claude Code tackles a complex task like "refactor this module to use dependency injection and update all tests," it does not attempt everything in a single reasoning chain. Instead, it uses an orchestrator-subagent model where a primary agent decomposes the work, delegates pieces to focused subagents, and synthesizes the results.

This pattern is now directly available through the Claude Agent SDK, and understanding it is essential for building production-grade agentic applications.

How the Orchestrator-Subagent Model Works

The model operates in four phases:

Phase 1: Task Decomposition

The orchestrator agent receives the user's request and breaks it into discrete, parallelizable subtasks. Each subtask has a clear objective, input specification, and expected output format.

from anthropic import Anthropic

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment

ORCHESTRATOR_SYSTEM = """You are a task orchestrator. Given a complex request:
1. Break it into independent subtasks (max 5)
2. For each subtask, specify:
   - id: a unique identifier
   - agent_type: analyzer, implementer, tester, or reviewer
   - objective: what to accomplish
   - context: what information the subagent needs
   - output_format: expected response structure
   - model_tier: haiku, sonnet, or opus based on complexity
   - dependencies: ids of subtasks that must finish first
3. Group subtasks into ordered phases; tasks within a phase run concurrently
4. Return a JSON execution plan of the form {"phases": [{"tasks": [...]}]}."""

def decompose_task(user_request: str) -> dict:
    response = client.messages.create(
        model="claude-sonnet-4-5-20250929",
        max_tokens=4096,
        system=ORCHESTRATOR_SYSTEM,
        messages=[{"role": "user", "content": user_request}]
    )
    # parse_execution_plan is assumed to extract and validate the JSON plan
    # from the reply text.
    return parse_execution_plan(response.content[0].text)
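
The `parse_execution_plan` helper is not defined in the snippet above. A minimal sketch follows, assuming the model replies with a single JSON object (possibly wrapped in prose) that has a top-level `phases` list. That shape is an assumption chosen to match how the plan is consumed in Phase 3, not a format the SDK prescribes:

```python
import json


def parse_execution_plan(text: str) -> dict:
    """Parse the JSON object spanning the first '{' to the last '}' in a reply."""
    start = text.find("{")
    end = text.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in orchestrator reply")

    # json.JSONDecodeError (a ValueError subclass) is raised on malformed JSON.
    plan = json.loads(text[start : end + 1])

    # Assumed schema: {"phases": [{"tasks": [{"id", "agent_type", "context", ...}]}]}
    if not isinstance(plan.get("phases"), list):
        raise ValueError("execution plan is missing a 'phases' list")
    for phase in plan["phases"]:
        for task in phase.get("tasks", []):
            missing = {"id", "agent_type", "context"} - task.keys()
            if missing:
                raise ValueError(f"subtask is missing fields: {sorted(missing)}")
    return plan
```

Failing loudly here keeps a malformed plan from reaching the dispatch step, where the error would be harder to trace.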

Phase 2: Subagent Dispatch

The orchestrator spawns subagents for each subtask. Subagents are lightweight -- they have a focused system prompt, a constrained toolset, and a single objective. This constraint is a feature, not a limitation: it prevents subagents from going off-task and keeps token usage predictable.

import asyncio

from anthropic import AsyncAnthropic

# Use the async client: a blocking call inside a coroutine would stall the
# event loop and serialize the subagents.
async_client = AsyncAnthropic()

# "tools" lists Claude Code tool names for reference; this simplified sketch
# does not pass them to the Messages API.
SUBAGENT_CONFIGS = {
    "analyzer": {
        "system": "You analyze code structure and report findings in structured JSON.",
        "tools": ["Read", "Glob", "Grep"],
        "model": "claude-sonnet-4-5-20250929",
        "max_tokens": 4096,
    },
    "implementer": {
        "system": "You implement code changes precisely as specified. Write clean, tested code.",
        "tools": ["Read", "Write", "Edit", "Bash"],
        "model": "claude-sonnet-4-5-20250929",
        "max_tokens": 8192,
    },
    "tester": {
        "system": "You write and run tests. Report pass/fail status with details.",
        "tools": ["Read", "Write", "Bash"],
        "model": "claude-sonnet-4-5-20250929",
        "max_tokens": 4096,
    },
    "reviewer": {
        "system": "You review code for bugs, security issues, and style violations.",
        "tools": ["Read", "Glob", "Grep"],
        "model": "claude-sonnet-4-5-20250929",
        "max_tokens": 4096,
    },
}

async def spawn_subagent(config_name: str, task: str) -> dict:
    config = SUBAGENT_CONFIGS[config_name]
    response = await async_client.messages.create(
        model=config["model"],
        max_tokens=config["max_tokens"],
        system=config["system"],
        messages=[{"role": "user", "content": task}]
    )
    return {
        "agent": config_name,
        "result": response.content[0].text,
        "tokens_used": response.usage.input_tokens + response.usage.output_tokens,
    }

Phase 3: Dependency Resolution and Execution

Not all subtasks can run in parallel. The orchestrator respects dependency ordering:

async def execute_plan(plan: dict) -> list[dict]:
    results = {}

    for phase in plan["phases"]:
        # Run all tasks in this phase concurrently
        phase_tasks = []
        for subtask in phase["tasks"]:
            # Inject results from prior phases into context
            context = subtask["context"]
            for dep in subtask.get("dependencies", []):
                context += f"\n\nResult from {dep}:\n{results[dep]['result']}"

            phase_tasks.append(
                spawn_subagent(subtask["agent_type"], context)
            )

        phase_results = await asyncio.gather(*phase_tasks)
        for subtask, result in zip(phase["tasks"], phase_results):
            results[subtask["id"]] = result

    return list(results.values())
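
`execute_plan` assumes the plan already groups subtasks into ordered phases. If the orchestrator instead returns a flat list of subtasks with `dependencies`, the phases can be derived by repeatedly collecting every subtask whose dependencies have completed. A sketch under that assumption (`build_phases` is a hypothetical helper, not part of the SDK):

```python
def build_phases(subtasks: list[dict]) -> list[dict]:
    """Group subtasks into phases so each phase depends only on earlier ones."""
    remaining = {t["id"]: t for t in subtasks}
    done: set[str] = set()
    phases = []

    while remaining:
        # A subtask is ready once all of its dependencies have completed.
        ready = [
            t for t in remaining.values()
            if all(dep in done for dep in t.get("dependencies", []))
        ]
        if not ready:
            # Nothing can run: a cycle or a reference to an unknown subtask.
            raise ValueError(f"unresolvable dependencies: {sorted(remaining)}")
        phases.append({"tasks": ready})
        for t in ready:
            done.add(t["id"])
            del remaining[t["id"]]

    return phases
```

With an analyze step, an implement and a review step that both depend on it, and a test step that depends on implement, this yields three phases, with implement and review sharing the second.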

Phase 4: Result Synthesis

The orchestrator reviews all subagent results and produces a coherent final output. This is where the orchestrator adds the most value -- it resolves conflicts between subagent outputs, fills gaps, and presents a unified result.

SYNTHESIS_SYSTEM = """You are a synthesis agent. Given results from multiple
specialist agents, produce a single coherent response that:
1. Integrates all findings without duplication
2. Resolves any conflicts between agents (explain your reasoning)
3. Highlights areas of uncertainty or disagreement
4. Provides a clear, actionable summary"""

def synthesize(original_request: str, agent_results: list[dict]) -> str:
    formatted_results = "\n\n".join([
        f"=== {r['agent']} ===\n{r['result']}" for r in agent_results
    ])

    response = client.messages.create(
        model="claude-sonnet-4-5-20250929",
        max_tokens=8192,
        system=SYNTHESIS_SYSTEM,
        messages=[{
            "role": "user",
            "content": f"Original request: {original_request}\n\nAgent results:\n{formatted_results}"
        }]
    )
    return response.content[0].text

Real-World Example: Automated PR Review Pipeline

Here is how a production PR review system uses the orchestrator-subagent model:

  1. Orchestrator receives a pull request diff
  2. Analyzer subagent maps the changed files and identifies affected modules
  3. Security reviewer subagent scans for vulnerability patterns (SQL injection, XSS, auth bypasses)
  4. Logic reviewer subagent checks for bugs, edge cases, and race conditions
  5. Style reviewer subagent verifies coding standards and consistency
  6. Test coverage subagent checks if new code has adequate test coverage
  7. Orchestrator synthesizes all reviews into a single, prioritized feedback document

This pipeline processes a 500-line PR in under 30 seconds with five parallel subagents, compared to 2-3 minutes with a single sequential agent.
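
The fan-out step of such a pipeline can be sketched with stub reviewers. Everything here is illustrative: `run_reviewer` is a stand-in that sleeps instead of calling a model, and the reviewer names are assumptions based on the list above.

```python
import asyncio

REVIEWERS = ["analyzer", "security", "logic", "style", "test_coverage"]


async def run_reviewer(name: str, diff: str) -> dict:
    """Stand-in for a real subagent call; replace with an API request."""
    await asyncio.sleep(0.01)  # simulates network latency
    line_count = len(diff.splitlines())
    return {"agent": name, "result": f"{name}: reviewed {line_count} lines"}


async def review_pull_request(diff: str) -> list[dict]:
    # Fan out: all reviewers start together, so total latency is roughly that
    # of the slowest reviewer rather than the sum of all five.
    return await asyncio.gather(*(run_reviewer(name, diff) for name in REVIEWERS))


if __name__ == "__main__":
    for review in asyncio.run(review_pull_request("+ line one\n+ line two")):
        print(review["result"])
```

`asyncio.gather` returns results in the order the coroutines were passed, so the orchestrator can match each review to its reviewer without extra bookkeeping.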

Orchestrator Design Principles

Principle 1: Minimal Context Per Subagent

Give each subagent only the information it needs. A security reviewer does not need the full project history -- it needs the diff and the security policy. Smaller context means faster responses, lower costs, and less chance of distraction.
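
One way to enforce this is an allowlist of context fields per agent. The sketch below assumes the orchestrator holds its context as a dictionary of named text blobs; the agent names and field names are hypothetical.

```python
# Hypothetical mapping from agent role to the context fields it may see.
CONTEXT_FIELDS = {
    "security_reviewer": ["diff", "security_policy"],
    "style_reviewer": ["diff", "style_guide"],
}


def build_context(agent: str, available: dict[str, str]) -> str:
    """Return only the fields the named agent is allowed to see."""
    sections = [
        f"## {field}\n{available[field]}"
        for field in CONTEXT_FIELDS[agent]
        if field in available
    ]
    return "\n\n".join(sections)
```

Anything not on the allowlist, such as project history, never reaches the subagent's prompt.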

Principle 2: Typed Contracts Between Agents

Define explicit input/output schemas for each agent. When the analyzer outputs a JSON structure, the implementer should expect exactly that structure. Type mismatches between agents are the most common source of multi-agent bugs.
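
A contract can be as small as a dataclass with a validating constructor. The sketch below uses only the standard library; `AnalyzerOutput` and its two fields are hypothetical examples of what an analyzer might hand to an implementer.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class AnalyzerOutput:
    """Hypothetical contract between the analyzer and the implementer."""
    changed_files: list[str]
    affected_modules: list[str]

    @classmethod
    def from_json(cls, text: str) -> "AnalyzerOutput":
        data = json.loads(text)
        if not isinstance(data, dict):
            raise ValueError("analyzer output must be a JSON object")
        for key in ("changed_files", "affected_modules"):
            value = data.get(key)
            if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
                raise ValueError(f"'{key}' must be a list of strings")
        return cls(data["changed_files"], data["affected_modules"])
```

Validating at the boundary means a mismatch surfaces as a clear error at the handoff instead of as confusing behavior inside the next agent.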

Principle 3: Idempotent Subagents

Design subagents so that rerunning one with the same input produces an equivalent result and does not repeat side effects such as duplicate file writes. This makes retry logic safe and debugging reproducible.
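
Model output can vary between runs, so one way to approximate idempotence is to cache results by a hash of the agent name and its exact input. This is a sketch of that idea, not how Claude Code does it; `run_once` and the in-memory cache are assumptions for illustration.

```python
import hashlib
import json
from typing import Callable

_cache: dict[str, dict] = {}


def cache_key(agent: str, task: str) -> str:
    """Stable key derived from the agent name and its exact input."""
    payload = json.dumps({"agent": agent, "task": task}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def run_once(agent: str, task: str, call: Callable[[str, str], dict]) -> dict:
    """Invoke `call` at most once per (agent, task) pair; reuse the result on retry."""
    key = cache_key(agent, task)
    if key not in _cache:
        _cache[key] = call(agent, task)
    return _cache[key]
```

A retry after a downstream failure then replays the cached result instead of issuing a second, possibly different, model call.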

Principle 4: Fail-Fast with Graceful Degradation

If a subagent fails, the orchestrator should decide whether to retry, skip, or substitute a default. Not every subtask is critical -- a failed style review should not block a security review.
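
The retry, skip, or substitute decision can be sketched as a small wrapper around each subagent call. The `critical` flag, retry count, and backoff values are assumptions for illustration:

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def run_with_policy(
    call: Callable[[], Awaitable[dict]],
    *,
    critical: bool,
    retries: int = 2,
    default: Optional[dict] = None,
) -> Optional[dict]:
    """Retry a subagent call; on final failure, raise if critical, else degrade."""
    for attempt in range(retries + 1):
        try:
            return await call()
        except Exception:
            if attempt < retries:
                await asyncio.sleep(0.01 * 2**attempt)  # short exponential backoff
    if critical:
        raise RuntimeError("critical subagent failed after retries")
    return default  # non-critical: substitute a default (None means skip)
```

A style review would run with `critical=False` and a placeholder default, while a security review would run with `critical=True` so its failure stops the pipeline.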

Cost Analysis

For a typical orchestrator + 4 subagent workflow:

Component                  Model   Input Tokens  Output Tokens  Cost
Orchestrator (decompose)   Sonnet         2,000            800  $0.018
Subagent 1 (analyze)       Haiku          3,000          1,000  $0.006
Subagent 2 (implement)     Sonnet         5,000          3,000  $0.060
Subagent 3 (test)          Sonnet         4,000          2,000  $0.042
Subagent 4 (review)        Haiku          4,000          1,500  $0.012
Orchestrator (synthesize)  Sonnet         8,000          2,000  $0.054
Total                                    26,000         10,300  $0.192

This is roughly the same cost as a single long agent session, but the work completes in one-third of the wall-clock time due to parallelism.
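
The per-row arithmetic is tokens times price per token, summed over input and output. The Sonnet rows are consistent with rates of $3 per million input tokens and $15 per million output tokens; those rates are an assumption here, since the article does not state them, and current pricing should be checked before relying on them.

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price: float, output_price: float) -> float:
    """Cost in USD, with prices given in USD per million tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Assumed Sonnet-class rates (USD per million tokens), not taken from the article.
SONNET_IN, SONNET_OUT = 3.00, 15.00

decompose_cost = estimate_cost(2_000, 800, SONNET_IN, SONNET_OUT)
synthesize_cost = estimate_cost(8_000, 2_000, SONNET_IN, SONNET_OUT)
print(f"decompose: ${decompose_cost:.3f}, synthesize: ${synthesize_cost:.3f}")
# prints "decompose: $0.018, synthesize: $0.054"
```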

Anti-Patterns to Avoid

Over-decomposition: Breaking a simple task into five subtasks when one agent could handle it adds latency and cost without benefit.

Circular dependencies: If Agent A needs Agent B's output and Agent B needs Agent A's output, the system deadlocks. Design acyclic dependency graphs.
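
Cycles can be caught before any subagent runs. The sketch below uses `graphlib` from the Python standard library (3.9+); `check_acyclic` is a hypothetical helper, and it assumes dependencies are given as a mapping from each subtask id to the ids it depends on.

```python
from graphlib import CycleError, TopologicalSorter


def check_acyclic(dependencies: dict[str, list[str]]) -> list[str]:
    """Return a valid execution order, or raise ValueError naming the cycle."""
    sorter = TopologicalSorter(dependencies)
    try:
        return list(sorter.static_order())
    except CycleError as exc:
        # exc.args[1] is the list of nodes forming the detected cycle.
        cycle = " -> ".join(exc.args[1])
        raise ValueError(f"circular dependency: {cycle}") from exc
```

Running this check on the plan right after decomposition turns a would-be deadlock into an immediate, descriptive error.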

Orchestrator as bottleneck: If the orchestrator does too much work itself, you lose the benefits of delegation. The orchestrator should decompose, delegate, and synthesize -- not execute.

Ignoring subagent failures: Silent failures lead to incomplete or incorrect final outputs. Always validate subagent results before synthesis.
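
A minimal validation pass before synthesis might separate usable results from failed ones. This sketch assumes each result is a dictionary with `agent` and `result` keys, as in the dispatch code earlier, and treats a missing or empty `result` as a failure; `partition_results` is a hypothetical helper.

```python
def partition_results(results: list[dict]) -> tuple[list[dict], list[str]]:
    """Split subagent results into usable ones and the names of failed agents."""
    valid, failed = [], []
    for r in results:
        text = r.get("result")
        if isinstance(text, str) and text.strip():
            valid.append(r)
        else:
            failed.append(r.get("agent", "unknown"))
    return valid, failed
```

Passing the list of failed agents to the synthesis prompt lets the final output state which reviews are missing instead of silently omitting them.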
