Multi-Step AI Workflows: Orchestrating Claude Across Complex Tasks
Learn patterns for orchestrating Claude across multi-step workflows including sequential chains, parallel fan-out, conditional branching, and human-in-the-loop checkpoints. Includes production-ready Python examples.
Why Single-Call AI Is Not Enough
Most AI integrations start as a single API call: user sends input, model returns output, done. But real business processes are multi-step. Reviewing a contract involves extracting clauses, checking against policy, flagging risks, and generating a summary. Onboarding a customer requires validating documents, running compliance checks, creating accounts, and sending notifications.
Orchestrating Claude across multi-step workflows is the difference between an "AI feature" and an "AI-powered system." The challenge is not making individual calls; it is managing state, handling failures, and coordinating parallel and sequential steps efficiently.
The Four Orchestration Patterns
Pattern 1: Sequential Chain
The simplest pattern. Each step's output feeds into the next step's input.
```python
import anthropic
from dataclasses import dataclass

client = anthropic.Anthropic()

@dataclass
class StepResult:
    step_name: str
    output: str
    tokens_used: int
    model: str

def sequential_chain(document: str) -> list[StepResult]:
    """Process a document through a sequential analysis chain."""
    results = []

    # Step 1: Extract key information
    extraction = client.messages.create(
        model="claude-haiku-4-20250514",  # Fast model for extraction
        max_tokens=2048,
        messages=[{
            "role": "user",
            "content": f"Extract all dates, names, monetary amounts, and "
                       f"obligations from this document:\n\n{document}"
        }]
    )
    results.append(StepResult(
        step_name="extraction",
        output=extraction.content[0].text,
        tokens_used=extraction.usage.output_tokens,
        model="claude-haiku-4-20250514"
    ))

    # Step 2: Analyze risks (uses extraction output)
    risk_analysis = client.messages.create(
        model="claude-sonnet-4-20250514",  # Stronger model for analysis
        max_tokens=4096,
        messages=[{
            "role": "user",
            "content": f"Given these extracted elements:\n{extraction.content[0].text}"
                       f"\n\nIdentify potential risks, ambiguities, and "
                       f"missing clauses in this contract."
        }]
    )
    results.append(StepResult(
        step_name="risk_analysis",
        output=risk_analysis.content[0].text,
        tokens_used=risk_analysis.usage.output_tokens,
        model="claude-sonnet-4-20250514"
    ))

    # Step 3: Generate summary (uses both previous outputs)
    summary = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=2048,
        messages=[{
            "role": "user",
            "content": f"Create an executive summary of this contract review."
                       f"\n\nExtracted elements:\n{extraction.content[0].text}"
                       f"\n\nRisk analysis:\n{risk_analysis.content[0].text}"
        }]
    )
    results.append(StepResult(
        step_name="summary",
        output=summary.content[0].text,
        tokens_used=summary.usage.output_tokens,
        model="claude-sonnet-4-20250514"
    ))

    return results
```
When to use: Tasks with clear linear dependencies where each step requires the previous step's output.
Pattern 2: Parallel Fan-Out / Fan-In
When multiple independent analyses can run simultaneously, fan-out to parallel calls and fan-in to combine results.
```python
import asyncio

from anthropic import AsyncAnthropic

async_client = AsyncAnthropic()

async def parallel_analysis(document: str) -> dict:
    """Run multiple independent analyses in parallel."""

    async def analyze(aspect: str, instructions: str) -> tuple[str, str]:
        response = await async_client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=2048,
            messages=[{
                "role": "user",
                "content": f"{instructions}\n\nDocument:\n{document}"
            }]
        )
        return aspect, response.content[0].text

    # Fan-out: run all analyses concurrently
    tasks = [
        analyze("legal", "Identify all legal obligations and liabilities."),
        analyze("financial", "Extract and analyze all financial terms."),
        analyze("compliance", "Check for regulatory compliance issues."),
        analyze("timeline", "Extract all deadlines and milestones."),
    ]
    results = await asyncio.gather(*tasks)

    # Fan-in: combine results
    analysis_map = dict(results)

    # Synthesis step: combine all parallel results
    synthesis = await async_client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=4096,
        messages=[{
            "role": "user",
            "content": (
                "Synthesize these analyses into a unified report:\n\n"
                + "\n\n".join(
                    f"## {k.title()} Analysis\n{v}"
                    for k, v in analysis_map.items()
                )
            )
        }]
    )

    return {
        "individual_analyses": analysis_map,
        "synthesis": synthesis.content[0].text
    }
```
When to use: Multiple independent analyses of the same input, where a final synthesis step combines the results.
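One caveat with `asyncio.gather` as used above: by default it raises on the first failed task, so a single failed analysis sinks the entire fan-out. Passing `return_exceptions=True` lets the synthesis step run on whatever succeeded. A minimal sketch of that variant; the helper name `gather_with_partial_failures` is ours, not part of any SDK:

```python
import asyncio

async def gather_with_partial_failures(tasks: dict) -> tuple[dict, dict]:
    """Run named coroutines concurrently; keep successes, record failures.

    With return_exceptions=True, a failed analysis is returned as an
    exception object instead of aborting the whole fan-out.
    """
    names = list(tasks)
    results = await asyncio.gather(*tasks.values(), return_exceptions=True)
    succeeded = {n: r for n, r in zip(names, results) if not isinstance(r, Exception)}
    failed = {n: repr(r) for n, r in zip(names, results) if isinstance(r, Exception)}
    return succeeded, failed
```

The synthesis prompt can then be built from `succeeded` alone, optionally noting which analyses were unavailable.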
Pattern 3: Conditional Branching
Different inputs require different processing paths. A routing step decides which branch to execute.
```python
import json

async def conditional_workflow(user_request: str) -> dict:
    """Route and process requests based on AI classification."""
    # Step 1: Classify the request
    classification = await async_client.messages.create(
        model="claude-haiku-4-20250514",
        max_tokens=256,
        messages=[{
            "role": "user",
            "content": f"""Classify this request into exactly one category.
Categories: billing, technical_support, account_change, general_inquiry
Request: {user_request}
Respond with JSON: {{"category": "...", "confidence": 0.0-1.0}}"""
        }]
    )
    # Assumes the model replies with bare JSON; harden this in production
    route = json.loads(classification.content[0].text)

    # Step 2: Branch based on classification
    # billing_tools, tech_support_tools, and account_tools are tool
    # definitions assumed to exist elsewhere in your codebase
    branch_configs = {
        "billing": {
            "model": "claude-sonnet-4-20250514",
            "system": "You are a billing specialist. Access account data via tools.",
            "tools": billing_tools,
        },
        "technical_support": {
            "model": "claude-sonnet-4-20250514",
            "system": "You are a technical support engineer. Diagnose and resolve issues.",
            "tools": tech_support_tools,
        },
        "account_change": {
            "model": "claude-sonnet-4-20250514",
            "system": "You are an account manager. Process account modifications.",
            "tools": account_tools,
        },
        "general_inquiry": {
            "model": "claude-haiku-4-20250514",
            "system": "You are a helpful assistant. Answer general questions.",
            "tools": [],
        },
    }
    config = branch_configs.get(route["category"], branch_configs["general_inquiry"])

    # Step 3: Execute the appropriate branch (tool-use loop omitted for brevity)
    response = await async_client.messages.create(
        model=config["model"],
        system=config["system"],
        max_tokens=4096,
        tools=config["tools"],
        messages=[{"role": "user", "content": user_request}]
    )
    return {
        "classification": route,
        "response": response.content[0].text,
        "branch_used": route["category"]
    }
```
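The bare `json.loads` on the classifier's reply is the fragile point in this pattern: models sometimes wrap JSON in prose or code fences, and a malformed reply would crash the workflow. A defensive parsing sketch; the `parse_classification` helper is illustrative, not from any library:

```python
import json
import re

def parse_classification(raw: str, default: str = "general_inquiry") -> dict:
    """Parse the classifier's reply, falling back to a safe default.

    Extracts the first JSON object from the text and validates the
    category before trusting it; anything unparseable routes to default.
    """
    valid = {"billing", "technical_support", "account_change", "general_inquiry"}
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    if match:
        try:
            parsed = json.loads(match.group(0))
            if parsed.get("category") in valid:
                return {
                    "category": parsed["category"],
                    "confidence": float(parsed.get("confidence", 0.0)),
                }
        except (json.JSONDecodeError, TypeError, ValueError):
            pass
    return {"category": default, "confidence": 0.0}
```

Routing to `general_inquiry` on parse failure is a deliberate choice: the cheapest, lowest-privilege branch is the safest place to land an ambiguous request.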
Pattern 4: Human-in-the-Loop Checkpoint
For high-stakes workflows, insert approval gates where a human reviews the AI's work before proceeding.
```python
from enum import Enum

class ApprovalStatus(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"
    MODIFIED = "modified"

async def workflow_with_checkpoints(task: str) -> dict:
    """Execute a workflow with human approval checkpoints."""
    # Step 1: AI generates a plan
    plan = await async_client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=4096,
        messages=[{
            "role": "user",
            "content": f"Create a detailed execution plan for: {task}\n"
                       f"List each step with expected outcomes and risks."
        }]
    )

    # Checkpoint: save plan and wait for human approval
    # (save_checkpoint and wait_for_approval belong to your persistence layer)
    checkpoint_id = await save_checkpoint(
        stage="plan_review",
        content=plan.content[0].text,
        requires_approval=True
    )

    # In production, this would be async (webhook, polling, queue)
    approval = await wait_for_approval(checkpoint_id)

    if approval.status == ApprovalStatus.REJECTED:
        return {"status": "rejected", "reason": approval.feedback}

    # Use the potentially modified plan
    approved_plan = approval.modified_content or plan.content[0].text

    # Step 2: Execute the approved plan
    execution = await async_client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=8192,
        messages=[{
            "role": "user",
            "content": f"Execute this approved plan:\n{approved_plan}"
        }]
    )
    return {"status": "completed", "result": execution.content[0].text}
```
Error Handling and Retry Strategies
Multi-step workflows need robust error handling because any step can fail.
```python
import asyncio

from anthropic import APIError, RateLimitError

async def resilient_step(
    messages: list,
    model: str = "claude-sonnet-4-20250514",
    max_retries: int = 3,
    fallback_model: str = "claude-haiku-4-20250514"
) -> str:
    """Execute a step with retries and model fallback."""
    for attempt in range(max_retries):
        try:
            response = await async_client.messages.create(
                model=model,
                max_tokens=4096,
                messages=messages
            )
            return response.content[0].text
        except RateLimitError:
            # Exponential backoff; asyncio.sleep keeps the event loop free
            await asyncio.sleep(2 ** attempt)
        except APIError:
            if attempt == max_retries - 1:
                if fallback_model:
                    # Last resort: try a different model
                    response = await async_client.messages.create(
                        model=fallback_model,
                        max_tokens=4096,
                        messages=messages
                    )
                    return response.content[0].text
                raise
            await asyncio.sleep(2 ** attempt)
    raise RuntimeError(f"Step failed after {max_retries} retries")
```
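The same backoff logic can be factored into a reusable decorator so every workflow step inherits it without repeating the loop. A sketch; the `with_retries` name and the full-jitter backoff choice are ours, not from any SDK:

```python
import asyncio
import functools
import random

def with_retries(max_retries: int = 3, base_delay: float = 1.0,
                 retry_on: tuple = (Exception,)):
    """Decorator: retry an async step with exponential backoff and jitter."""
    def decorator(fn):
        @functools.wraps(fn)
        async def wrapper(*args, **kwargs):
            for attempt in range(max_retries):
                try:
                    return await fn(*args, **kwargs)
                except retry_on:
                    if attempt == max_retries - 1:
                        raise
                    # Full jitter keeps concurrent workflows from
                    # retrying in lockstep against a rate-limited API
                    delay = random.uniform(0, base_delay * 2 ** attempt)
                    await asyncio.sleep(delay)
        return wrapper
    return decorator
```

Applied as `@with_retries(retry_on=(RateLimitError,))` on each step function, this centralizes the policy and keeps step bodies focused on their prompts.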
Cost Optimization: Model Routing Per Step
One of the biggest advantages of multi-step workflows is using the right model for each step. Not every step needs the most capable model.
| Step Type | Recommended Model | Why |
|---|---|---|
| Classification / routing | Haiku | Fast, cheap, highly accurate for simple decisions |
| Data extraction | Haiku or Sonnet | Structured extraction is well-handled by smaller models |
| Complex analysis | Sonnet | Good balance of capability and cost |
| Critical decisions | Opus | Highest accuracy for high-stakes reasoning |
| Synthesis / writing | Sonnet | Strong writing quality at reasonable cost |
A typical workflow using model routing can cost 40-60% less than running Sonnet for every step, often with no measurable quality degradation on the step types where the cheaper model is used.
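As a back-of-envelope illustration, here is a cost model you can plug your own numbers into. The per-million-token prices below are placeholders, not published rates; actual savings depend on the real price gap between tiers and on how many tokens the cheap-model steps absorb:

```python
# Placeholder output-token prices per million tokens -- substitute real rates
PRICE_PER_MTOK = {"haiku": 1.00, "sonnet": 5.00}

def workflow_cost(steps: list[tuple[str, int]]) -> float:
    """Total cost of a workflow given (model_tier, output_tokens) per step."""
    return sum(PRICE_PER_MTOK[tier] * tokens / 1_000_000 for tier, tokens in steps)

# Routed workflow: cheap model for classification and extraction steps
routed = [("haiku", 256), ("haiku", 2048), ("sonnet", 4096), ("sonnet", 2048)]
# Baseline: Sonnet for every step, same token counts
all_sonnet = [("sonnet", n) for _, n in routed]

savings = 1 - workflow_cost(routed) / workflow_cost(all_sonnet)
```

With these placeholder numbers the routed workflow comes out roughly a fifth cheaper; the savings climb toward the 40-60% range as more of the workflow's tokens shift onto the cheaper tier.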
Summary
Multi-step AI workflows transform Claude from a question-answering tool into a process automation engine. The four core patterns (sequential chains, parallel fan-out, conditional branching, and human-in-the-loop checkpoints) can be combined to model almost any business process. The keys to production success are robust error handling with fallbacks, model routing for cost optimization, and checkpoint-based human oversight for high-stakes decisions.