Claude's Orchestrator and Subagent Model Explained
Deep dive into the orchestrator-subagent architecture pattern used in Claude Code and the Claude Agent SDK. Learn how task decomposition, delegation, and result synthesis work under the hood.
The Architecture Behind Claude Code's Power
When Claude Code tackles a complex task like "refactor this module to use dependency injection and update all tests," it does not attempt everything in a single reasoning chain. Instead, it uses an orchestrator-subagent model where a primary agent decomposes the work, delegates pieces to focused subagents, and synthesizes the results.
This pattern is now directly available through the Claude Agent SDK, and understanding it is essential for building production-grade agentic applications.
How the Orchestrator-Subagent Model Works
The model operates in four phases:
Phase 1: Task Decomposition
The orchestrator agent receives the user's request and breaks it into discrete subtasks that can run in parallel wherever their dependencies allow. Each subtask has a clear objective, input specification, and expected output format.
import json

from anthropic import Anthropic

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment

ORCHESTRATOR_SYSTEM = """You are a task orchestrator. Given a complex request:
1. Break it into independent subtasks (max 5)
2. For each subtask, specify:
   - objective: what to accomplish
   - context: what information the subagent needs
   - output_format: expected response structure
   - model_tier: haiku, sonnet, or opus based on complexity
3. Identify dependencies between subtasks
4. Return a JSON execution plan."""


def parse_execution_plan(text: str) -> dict:
    # Minimal parser: assumes the model returned bare JSON with no surrounding prose
    return json.loads(text)


def decompose_task(user_request: str) -> dict:
    response = client.messages.create(
        model="claude-sonnet-4-5-20250929",
        max_tokens=4096,
        system=ORCHESTRATOR_SYSTEM,
        messages=[{"role": "user", "content": user_request}],
    )
    return parse_execution_plan(response.content[0].text)
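The prompt asks for a JSON execution plan but does not pin down its shape. One plausible shape, inferred from the fields that the Phase 3 execution code reads (`phases`, `tasks`, `id`, `agent_type`, `context`, `dependencies`), is sketched below. The sample plan, its task descriptions, and the `validate_plan` helper are illustrative assumptions, not output from a real run.

```python
# Hypothetical plan for illustration; field names follow what execute_plan reads in Phase 3.
SAMPLE_PLAN = {
    "phases": [
        {"tasks": [
            {"id": "analyze", "agent_type": "analyzer",
             "context": "Map the classes in the module and their collaborators."},
        ]},
        {"tasks": [
            {"id": "implement", "agent_type": "implementer",
             "context": "Introduce constructor injection.",
             "dependencies": ["analyze"]},
            {"id": "review", "agent_type": "reviewer",
             "context": "Flag tests that construct dependencies directly.",
             "dependencies": ["analyze"]},
        ]},
    ]
}

REQUIRED_TASK_KEYS = {"id", "agent_type", "context"}


def validate_plan(plan: dict) -> None:
    """Raise ValueError if a task is malformed or depends on a task that has not run yet."""
    completed: set[str] = set()
    for index, phase in enumerate(plan["phases"]):
        for task in phase["tasks"]:
            missing = REQUIRED_TASK_KEYS - set(task)
            if missing:
                raise ValueError(f"phase {index}: task is missing {sorted(missing)}")
            for dep in task.get("dependencies", []):
                if dep not in completed:
                    raise ValueError(
                        f"task {task['id']!r} depends on {dep!r}, "
                        "which does not run in an earlier phase"
                    )
        # Results only become visible to later phases once the whole phase finishes
        completed.update(task["id"] for task in phase["tasks"])


validate_plan(SAMPLE_PLAN)  # passes silently for a well-formed plan
```

Running a check like this right after parsing surfaces a malformed plan before any subagent tokens are spent.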
Phase 2: Subagent Dispatch
The orchestrator spawns subagents for each subtask. Subagents are lightweight -- they have a focused system prompt, a constrained toolset, and a single objective. This constraint is a feature, not a limitation: it prevents subagents from going off-task and keeps token usage predictable.
import asyncio

from anthropic import AsyncAnthropic

# The async client lets concurrent subagent calls overlap instead of blocking the event loop
async_client = AsyncAnthropic()

SUBAGENT_CONFIGS = {
    "analyzer": {
        "system": "You analyze code structure and report findings in structured JSON.",
        "tools": ["Read", "Glob", "Grep"],
        "model": "claude-sonnet-4-5-20250929",
        "max_tokens": 4096,
    },
    "implementer": {
        "system": "You implement code changes precisely as specified. Write clean, tested code.",
        "tools": ["Read", "Write", "Edit", "Bash"],
        "model": "claude-sonnet-4-5-20250929",
        "max_tokens": 8192,
    },
    "tester": {
        "system": "You write and run tests. Report pass/fail status with details.",
        "tools": ["Read", "Write", "Bash"],
        "model": "claude-sonnet-4-5-20250929",
        "max_tokens": 4096,
    },
    "reviewer": {
        "system": "You review code for bugs, security issues, and style violations.",
        "tools": ["Read", "Glob", "Grep"],
        "model": "claude-sonnet-4-5-20250929",
        "max_tokens": 4096,
    },
}


async def spawn_subagent(config_name: str, task: str) -> dict:
    config = SUBAGENT_CONFIGS[config_name]
    # "tools" records the toolset intended for each role; this sketch does not pass it to the API
    response = await async_client.messages.create(
        model=config["model"],
        max_tokens=config["max_tokens"],
        system=config["system"],
        messages=[{"role": "user", "content": task}],
    )
    return {
        "agent": config_name,
        "result": response.content[0].text,
        "tokens_used": response.usage.input_tokens + response.usage.output_tokens,
    }
Phase 3: Dependency Resolution and Execution
Not all subtasks can run in parallel. The orchestrator respects dependency ordering:
async def execute_plan(plan: dict) -> list[dict]:
    results = {}
    for phase in plan["phases"]:
        # Run all tasks in this phase concurrently
        phase_tasks = []
        for subtask in phase["tasks"]:
            # Inject results from prior phases into context
            context = subtask["context"]
            for dep in subtask.get("dependencies", []):
                context += f"\n\nResult from {dep}:\n{results[dep]['result']}"
            phase_tasks.append(
                spawn_subagent(subtask["agent_type"], context)
            )
        phase_results = await asyncio.gather(*phase_tasks)
        for subtask, result in zip(phase["tasks"], phase_results):
            results[subtask["id"]] = result
    return list(results.values())
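To see the dependency injection without spending API calls, the same phase loop can be run against a stub. The sketch below is an assumption-laden stand-in: `fake_subagent`, `run_phases`, and the `DEMO_PLAN` contents are hypothetical names invented for this illustration, and the stub simply echoes the context it receives.

```python
import asyncio


async def fake_subagent(agent_type: str, context: str) -> dict:
    # Stand-in for spawn_subagent: sleeps briefly instead of calling the API
    await asyncio.sleep(0.05)
    return {"agent": agent_type, "result": f"[{agent_type}] received: {context}"}


async def run_phases(plan: dict) -> dict:
    results = {}
    for phase in plan["phases"]:
        pending = []
        for task in phase["tasks"]:
            context = task["context"]
            for dep in task.get("dependencies", []):
                context += f"\n\nResult from {dep}:\n{results[dep]['result']}"
            pending.append(fake_subagent(task["agent_type"], context))
        # gather preserves input order, so zip pairs each task with its own result
        for task, result in zip(phase["tasks"], await asyncio.gather(*pending)):
            results[task["id"]] = result
    return results


# Hypothetical two-phase plan: "summary" waits for both first-phase tasks
DEMO_PLAN = {
    "phases": [
        {"tasks": [
            {"id": "map", "agent_type": "analyzer", "context": "List changed files."},
            {"id": "scan", "agent_type": "reviewer", "context": "Scan for risky calls."},
        ]},
        {"tasks": [
            {"id": "summary", "agent_type": "reviewer", "context": "Summarize.",
             "dependencies": ["map", "scan"]},
        ]},
    ]
}

demo_results = asyncio.run(run_phases(DEMO_PLAN))
print(demo_results["summary"]["result"])
```

The printed context for `summary` contains the outputs of both `map` and `scan`, which is the only channel through which later subagents learn what earlier ones found.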
Phase 4: Result Synthesis
The orchestrator reviews all subagent results and produces a coherent final output. This is where the orchestrator adds the most value -- it resolves conflicts between subagent outputs, fills gaps, and presents a unified result.
SYNTHESIS_SYSTEM = """You are a synthesis agent. Given results from multiple
specialist agents, produce a single coherent response that:
1. Integrates all findings without duplication
2. Resolves any conflicts between agents (explain your reasoning)
3. Highlights areas of uncertainty or disagreement
4. Provides a clear, actionable summary"""


def synthesize(original_request: str, agent_results: list[dict]) -> str:
    # Label each result with the agent that produced it so conflicts can be traced
    formatted_results = "\n\n".join([
        f"=== {r['agent']} ===\n{r['result']}" for r in agent_results
    ])
    response = client.messages.create(
        model="claude-sonnet-4-5-20250929",
        max_tokens=8192,
        system=SYNTHESIS_SYSTEM,
        messages=[{
            "role": "user",
            "content": f"Original request: {original_request}\n\nAgent results:\n{formatted_results}",
        }],
    )
    return response.content[0].text
Real-World Example: Automated PR Review Pipeline
Here is how a production PR review system uses the orchestrator-subagent model:
- Orchestrator receives a pull request diff
- Analyzer subagent maps the changed files and identifies affected modules
- Security reviewer subagent scans for vulnerability patterns (SQL injection, XSS, auth bypasses)
- Logic reviewer subagent checks for bugs, edge cases, and race conditions
- Style reviewer subagent verifies coding standards and consistency
- Test coverage subagent checks if new code has adequate test coverage
- Orchestrator synthesizes all reviews into a single, prioritized feedback document
This pipeline processes a 500-line PR in under 30 seconds with five parallel subagents, compared to 2-3 minutes with a single sequential agent.
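The fan-out and prioritization steps of such a pipeline might look like the sketch below. It is a minimal illustration, not the production system described above: the two reviewers are rule-based stubs standing in for model-backed subagents, and `Finding`, `SEVERITY_RANK`, and the specific checks are assumptions made for the example.

```python
import asyncio
from dataclasses import dataclass

# Lower rank sorts first; the severity labels are an assumption for this sketch
SEVERITY_RANK = {"critical": 0, "major": 1, "minor": 2}


@dataclass(frozen=True)
class Finding:
    reviewer: str
    severity: str  # one of the SEVERITY_RANK keys
    message: str


async def security_review(diff: str) -> list[Finding]:
    # Stub: a real subagent would call the model; this flags one obvious pattern
    if 'execute(f"' in diff:
        return [Finding("security", "critical", "SQL query built with an f-string")]
    return []


async def style_review(diff: str) -> list[Finding]:
    # Stub: flags added lines longer than 100 characters
    return [
        Finding("style", "minor", f"diff line {n} exceeds 100 characters")
        for n, line in enumerate(diff.splitlines(), start=1)
        if line.startswith("+") and len(line) > 100
    ]


async def review_pr(diff: str) -> list[Finding]:
    # Fan out to every reviewer at once, then flatten and sort by severity
    batches = await asyncio.gather(security_review(diff), style_review(diff))
    findings = [finding for batch in batches for finding in batch]
    return sorted(findings, key=lambda finding: SEVERITY_RANK[finding.severity])
```

Because each reviewer returns the same `Finding` type, adding a logic or test-coverage reviewer means adding one more coroutine to the `gather` call.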
Orchestrator Design Principles
Principle 1: Minimal Context Per Subagent
Give each subagent only the information it needs. A security reviewer does not need the full project history -- it needs the diff and the security policy. Smaller context means faster responses, lower costs, and less chance of distraction.
Principle 2: Typed Contracts Between Agents
Define explicit input/output schemas for each agent. When the analyzer outputs a JSON structure, the implementer should expect exactly that structure. Type mismatches between agents are a common source of multi-agent bugs, and they are much easier to diagnose when they fail at the boundary rather than deep inside the next agent's reasoning.
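One lightweight way to enforce such a contract is to parse each agent's output into a typed object and reject anything that does not match. The `AnalyzerReport` class and its two fields below are hypothetical, chosen only to illustrate the idea; a real contract would mirror whatever the analyzer prompt actually specifies.

```python
import json
from dataclasses import dataclass

_FIELDS = ("files", "affected_modules")


@dataclass(frozen=True)
class AnalyzerReport:
    """Hypothetical contract for what the analyzer hands to the implementer."""

    files: list[str]
    affected_modules: list[str]

    @classmethod
    def from_json(cls, raw: str) -> "AnalyzerReport":
        data = json.loads(raw)  # invalid JSON raises JSONDecodeError, a ValueError subclass
        if not isinstance(data, dict):
            raise ValueError("analyzer output must be a JSON object")
        for name in _FIELDS:
            value = data.get(name)
            if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
                raise ValueError(f"{name!r} must be a list of strings")
        extra = set(data) - set(_FIELDS)
        if extra:
            raise ValueError(f"unexpected fields: {sorted(extra)}")
        return cls(files=data["files"], affected_modules=data["affected_modules"])


report = AnalyzerReport.from_json('{"files": ["billing.py"], "affected_modules": ["billing"]}')
```

The orchestrator would call `from_json` on the analyzer's raw text and only pass a valid `AnalyzerReport` on to the implementer.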
Principle 3: Idempotent Subagents
Design subagents so that running them twice with the same input produces the same output. This makes retry logic safe and debugging reproducible.
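Model output can vary between calls even for identical input, so strict idempotence usually needs help from the surrounding code. One possible approach, sketched here as an assumption rather than a recommendation from the SDK, is to cache each result under a hash of the agent name and its exact input so that a retry returns the stored output. `ResultCache` and its method names are hypothetical.

```python
import hashlib
import json
from typing import Callable


class ResultCache:
    """In-memory cache keyed by a hash of the agent name and its exact input."""

    def __init__(self) -> None:
        self._store: dict[str, dict] = {}

    @staticmethod
    def key(agent: str, task: str) -> str:
        # sort_keys keeps the serialized payload stable across runs
        payload = json.dumps({"agent": agent, "task": task}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get_or_run(self, agent: str, task: str, run: Callable[[str, str], dict]) -> dict:
        cache_key = self.key(agent, task)
        if cache_key not in self._store:
            # Only the first call for a given (agent, task) pair reaches the subagent
            self._store[cache_key] = run(agent, task)
        return self._store[cache_key]
```

A cache like this makes retries and debugging sessions reproducible, at the cost of never refreshing a result unless the input changes.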
Principle 4: Fail-Fast with Graceful Degradation
If a subagent fails, the orchestrator should decide whether to retry, skip, or substitute a default. Not every subtask is critical -- a failed style review should not block a security review.
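The retry, skip, or substitute decision can be captured in a small wrapper. The sketch below assumes each subtask carries a `critical` flag and an optional default result; those names, and `run_with_policy` itself, are illustrative and not part of any SDK.

```python
import asyncio
from typing import Any, Awaitable, Callable


async def run_with_policy(
    name: str,
    make_call: Callable[[], Awaitable[Any]],
    *,
    critical: bool,
    retries: int = 2,
    default: Any = None,
) -> dict:
    """Retry a subagent call; re-raise if it is critical, otherwise fall back to a default."""
    last_error: Exception | None = None
    for _ in range(retries + 1):
        try:
            # make_call must build a fresh awaitable each time; a coroutine cannot be awaited twice
            return {"agent": name, "result": await make_call(), "degraded": False}
        except Exception as exc:
            last_error = exc
            await asyncio.sleep(0)  # placeholder; real code would likely back off here
    if critical:
        raise RuntimeError(f"critical subagent {name!r} failed") from last_error
    # Non-critical work degrades to the default so the rest of the plan can continue
    return {"agent": name, "result": default, "degraded": True}
```

With this in place, a style review could be wrapped with `critical=False` and a default such as "style review unavailable", while a security review would use `critical=True` so its failure stops the run.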
Cost Analysis
For a typical orchestrator + 4 subagent workflow:
| Component | Model | Input Tokens | Output Tokens | Cost |
|---|---|---|---|---|
| Orchestrator (decompose) | Sonnet | 2,000 | 800 | $0.018 |
| Subagent 1 (analyze) | Haiku | 3,000 | 1,000 | $0.006 |
| Subagent 2 (implement) | Sonnet | 5,000 | 3,000 | $0.060 |
| Subagent 3 (test) | Sonnet | 4,000 | 2,000 | $0.042 |
| Subagent 4 (review) | Haiku | 4,000 | 1,500 | $0.012 |
| Orchestrator (synthesize) | Sonnet | 8,000 | 2,000 | $0.054 |
| Total | | 26,000 | 10,300 | $0.192 |
This is roughly the same cost as a single long agent session, but the work completes in one-third of the wall-clock time due to parallelism.
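The Sonnet rows in the table are consistent with a price of $3 per million input tokens and $15 per million output tokens. The sketch below recomputes them under that assumption; the price constant is hard-coded for illustration and should be checked against the current price list before being used for budgeting.

```python
# Assumed Sonnet pricing in USD per million tokens; verify before relying on it
SONNET_PRICE = {"input": 3.00, "output": 15.00}

# (component, input tokens, output tokens) copied from the Sonnet rows of the table
SONNET_CALLS = [
    ("Orchestrator (decompose)", 2_000, 800),
    ("Subagent 2 (implement)", 5_000, 3_000),
    ("Subagent 3 (test)", 4_000, 2_000),
    ("Orchestrator (synthesize)", 8_000, 2_000),
]


def call_cost(input_tokens: int, output_tokens: int, price: dict) -> float:
    """Cost of one call in USD, given per-million-token prices."""
    return (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000


for component, tokens_in, tokens_out in SONNET_CALLS:
    print(f"{component}: ${call_cost(tokens_in, tokens_out, SONNET_PRICE):.3f}")
```

Output tokens cost five times as much as input tokens at these rates, which is why the implementer, with the largest output, is the most expensive row.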
Anti-Patterns to Avoid
Over-decomposition: Breaking a simple task into five subtasks when one agent could handle it adds latency and cost without benefit.
Circular dependencies: If Agent A needs Agent B's output and Agent B needs Agent A's output, the system deadlocks. Design acyclic dependency graphs.
Orchestrator as bottleneck: If the orchestrator does too much work itself, you lose the benefits of delegation. The orchestrator should decompose, delegate, and synthesize -- not execute.
Ignoring subagent failures: Silent failures lead to incomplete or incorrect final outputs. Always validate subagent results before synthesis.
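The circular-dependency anti-pattern above can be caught before any subagent runs. The sketch below uses `graphlib.TopologicalSorter` from the Python standard library (3.9+) to reject cyclic graphs and to group tasks into phases; `plan_phases` and the task ids are hypothetical names for this example.

```python
from graphlib import CycleError, TopologicalSorter


def plan_phases(dependencies: dict[str, set[str]]) -> list[list[str]]:
    """Group task ids into phases so that every dependency sits in an earlier phase."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises CycleError if the graph contains a cycle
    phases = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only to make the output stable
        phases.append(ready)
        sorter.done(*ready)
    return phases


# Each key maps a task id to the ids it depends on
ACYCLIC = {
    "analyze": set(),
    "implement": {"analyze"},
    "test": {"implement"},
    "review": {"implement"},
}
print(plan_phases(ACYCLIC))

try:
    plan_phases({"a": {"b"}, "b": {"a"}})
except CycleError as err:
    print("rejected cyclic plan:", err.args[0])
```

In the acyclic example, `test` and `review` land in the same phase because both depend only on `implement`, so they can be dispatched together.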