Multi-Step AI Workflows: Orchestrating Claude Across Complex Tasks
Learn patterns for orchestrating Claude across multi-step workflows including sequential chains, parallel fan-out, conditional branching, and human-in-the-loop checkpoints. Includes production-ready Python examples.
Why Single-Call AI Is Not Enough
Most AI integrations start as a single API call: user sends input, model returns output, done. But real business processes are multi-step. Reviewing a contract involves extracting clauses, checking against policy, flagging risks, and generating a summary. Onboarding a customer requires validating documents, running compliance checks, creating accounts, and sending notifications.
Orchestrating Claude across multi-step workflows is the difference between an "AI feature" and an "AI-powered system." The challenge is not making individual calls; it is managing state, handling failures, and coordinating parallel and sequential steps efficiently.
The Four Orchestration Patterns
Pattern 1: Sequential Chain
The simplest pattern. Each step's output feeds into the next step's input.
```python
import anthropic
from dataclasses import dataclass

client = anthropic.Anthropic()

@dataclass
class StepResult:
    step_name: str
    output: str
    tokens_used: int
    model: str

def sequential_chain(document: str) -> list[StepResult]:
    """Process a document through a sequential analysis chain."""
    results = []

    # Step 1: Extract key information
    extraction = client.messages.create(
        model="claude-haiku-4-20250514",  # Fast model for extraction
        max_tokens=2048,
        messages=[{
            "role": "user",
            "content": f"Extract all dates, names, monetary amounts, and "
                       f"obligations from this document:\n\n{document}"
        }]
    )
    results.append(StepResult(
        step_name="extraction",
        output=extraction.content[0].text,
        tokens_used=extraction.usage.output_tokens,
        model="claude-haiku-4-20250514"
    ))

    # Step 2: Analyze risks (uses extraction output)
    risk_analysis = client.messages.create(
        model="claude-sonnet-4-20250514",  # Stronger model for analysis
        max_tokens=4096,
        messages=[{
            "role": "user",
            "content": f"Given these extracted elements:\n{extraction.content[0].text}"
                       f"\n\nIdentify potential risks, ambiguities, and "
                       f"missing clauses in this contract."
        }]
    )
    results.append(StepResult(
        step_name="risk_analysis",
        output=risk_analysis.content[0].text,
        tokens_used=risk_analysis.usage.output_tokens,
        model="claude-sonnet-4-20250514"
    ))

    # Step 3: Generate summary (uses both previous outputs)
    summary = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=2048,
        messages=[{
            "role": "user",
            "content": f"Create an executive summary of this contract review."
                       f"\n\nExtracted elements:\n{extraction.content[0].text}"
                       f"\n\nRisk analysis:\n{risk_analysis.content[0].text}"
        }]
    )
    results.append(StepResult(
        step_name="summary",
        output=summary.content[0].text,
        tokens_used=summary.usage.output_tokens,
        model="claude-sonnet-4-20250514"
    ))

    return results
```
When to use: Tasks with clear linear dependencies where each step requires the previous step's output.
Pattern 2: Parallel Fan-Out / Fan-In
When multiple independent analyses can run simultaneously, fan-out to parallel calls and fan-in to combine results.
```python
import asyncio

from anthropic import AsyncAnthropic

async_client = AsyncAnthropic()

async def parallel_analysis(document: str) -> dict:
    """Run multiple independent analyses in parallel."""

    async def analyze(aspect: str, instructions: str) -> tuple[str, str]:
        response = await async_client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=2048,
            messages=[{
                "role": "user",
                "content": f"{instructions}\n\nDocument:\n{document}"
            }]
        )
        return aspect, response.content[0].text

    # Fan-out: run all analyses concurrently
    tasks = [
        analyze("legal", "Identify all legal obligations and liabilities."),
        analyze("financial", "Extract and analyze all financial terms."),
        analyze("compliance", "Check for regulatory compliance issues."),
        analyze("timeline", "Extract all deadlines and milestones."),
    ]
    results = await asyncio.gather(*tasks)

    # Fan-in: combine results
    analysis_map = dict(results)

    # Synthesis step: combine all parallel results
    synthesis = await async_client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=4096,
        messages=[{
            "role": "user",
            "content": (
                "Synthesize these analyses into a unified report:\n\n"
                + "\n\n".join(
                    f"## {k.title()} Analysis\n{v}"
                    for k, v in analysis_map.items()
                )
            )
        }]
    )

    return {
        "individual_analyses": analysis_map,
        "synthesis": synthesis.content[0].text
    }
```
When to use: Multiple independent analyses of the same input, where a final synthesis step combines the results.
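One caveat with `asyncio.gather` as used above: by default it raises on the first failed task, so a single failed analysis sinks the entire fan-out. Passing `return_exceptions=True` lets the synthesis step run on whatever succeeded. A minimal sketch of that variant; the helper name `gather_with_partial_failures` is ours, not part of any SDK:

```python
import asyncio

async def gather_with_partial_failures(tasks: dict) -> tuple[dict, dict]:
    """Run named coroutines concurrently; keep successes, record failures.

    With return_exceptions=True, a failed analysis is returned as an
    exception object instead of aborting the whole fan-out.
    """
    names = list(tasks)
    results = await asyncio.gather(*tasks.values(), return_exceptions=True)
    succeeded = {n: r for n, r in zip(names, results) if not isinstance(r, Exception)}
    failed = {n: repr(r) for n, r in zip(names, results) if isinstance(r, Exception)}
    return succeeded, failed
```

The synthesis prompt can then be built from `succeeded` alone, optionally noting which analyses were unavailable.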
Pattern 3: Conditional Branching
Different inputs require different processing paths. A routing step decides which branch to execute.
```python
import json

async def conditional_workflow(user_request: str) -> dict:
    """Route and process requests based on AI classification."""
    # Step 1: Classify the request
    classification = await async_client.messages.create(
        model="claude-haiku-4-20250514",
        max_tokens=256,
        messages=[{
            "role": "user",
            "content": f"""Classify this request into exactly one category.
Categories: billing, technical_support, account_change, general_inquiry
Request: {user_request}
Respond with JSON: {{"category": "...", "confidence": 0.0-1.0}}"""
        }]
    )
    # Assumes the model replies with bare JSON; harden this in production
    route = json.loads(classification.content[0].text)

    # Step 2: Branch based on classification
    # billing_tools, tech_support_tools, and account_tools are tool
    # definitions assumed to exist elsewhere in your codebase
    branch_configs = {
        "billing": {
            "model": "claude-sonnet-4-20250514",
            "system": "You are a billing specialist. Access account data via tools.",
            "tools": billing_tools,
        },
        "technical_support": {
            "model": "claude-sonnet-4-20250514",
            "system": "You are a technical support engineer. Diagnose and resolve issues.",
            "tools": tech_support_tools,
        },
        "account_change": {
            "model": "claude-sonnet-4-20250514",
            "system": "You are an account manager. Process account modifications.",
            "tools": account_tools,
        },
        "general_inquiry": {
            "model": "claude-haiku-4-20250514",
            "system": "You are a helpful assistant. Answer general questions.",
            "tools": [],
        },
    }
    config = branch_configs.get(route["category"], branch_configs["general_inquiry"])

    # Step 3: Execute the appropriate branch (tool-use loop omitted for brevity)
    response = await async_client.messages.create(
        model=config["model"],
        system=config["system"],
        max_tokens=4096,
        tools=config["tools"],
        messages=[{"role": "user", "content": user_request}]
    )
    return {
        "classification": route,
        "response": response.content[0].text,
        "branch_used": route["category"]
    }
```
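The bare `json.loads` on the classifier's reply is the fragile point in this pattern: models sometimes wrap JSON in prose or code fences, and a malformed reply would crash the workflow. A defensive parsing sketch; the `parse_classification` helper is illustrative, not from any library:

```python
import json
import re

def parse_classification(raw: str, default: str = "general_inquiry") -> dict:
    """Parse the classifier's reply, falling back to a safe default.

    Extracts the first JSON object from the text and validates the
    category before trusting it; anything unparseable routes to default.
    """
    valid = {"billing", "technical_support", "account_change", "general_inquiry"}
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    if match:
        try:
            parsed = json.loads(match.group(0))
            if parsed.get("category") in valid:
                return {
                    "category": parsed["category"],
                    "confidence": float(parsed.get("confidence", 0.0)),
                }
        except (json.JSONDecodeError, TypeError, ValueError):
            pass
    return {"category": default, "confidence": 0.0}
```

Routing to `general_inquiry` on parse failure is a deliberate choice: the cheapest, lowest-privilege branch is the safest place to land an ambiguous request.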
Pattern 4: Human-in-the-Loop Checkpoint
For high-stakes workflows, insert approval gates where a human reviews the AI's work before proceeding.
```python
from enum import Enum

class ApprovalStatus(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"
    MODIFIED = "modified"

async def workflow_with_checkpoints(task: str) -> dict:
    """Execute a workflow with human approval checkpoints."""
    # Step 1: AI generates a plan
    plan = await async_client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=4096,
        messages=[{
            "role": "user",
            "content": f"Create a detailed execution plan for: {task}\n"
                       f"List each step with expected outcomes and risks."
        }]
    )

    # Checkpoint: save plan and wait for human approval
    # (save_checkpoint and wait_for_approval belong to your persistence layer)
    checkpoint_id = await save_checkpoint(
        stage="plan_review",
        content=plan.content[0].text,
        requires_approval=True
    )

    # In production, this would be async (webhook, polling, queue)
    approval = await wait_for_approval(checkpoint_id)

    if approval.status == ApprovalStatus.REJECTED:
        return {"status": "rejected", "reason": approval.feedback}

    # Use the potentially modified plan
    approved_plan = approval.modified_content or plan.content[0].text

    # Step 2: Execute the approved plan
    execution = await async_client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=8192,
        messages=[{
            "role": "user",
            "content": f"Execute this approved plan:\n{approved_plan}"
        }]
    )
    return {"status": "completed", "result": execution.content[0].text}
```
Error Handling and Retry Strategies
Multi-step workflows need robust error handling because any step can fail.
```python
import asyncio

from anthropic import APIError, RateLimitError

async def resilient_step(
    messages: list,
    model: str = "claude-sonnet-4-20250514",
    max_retries: int = 3,
    fallback_model: str = "claude-haiku-4-20250514"
) -> str:
    """Execute a step with retries and model fallback."""
    for attempt in range(max_retries):
        try:
            response = await async_client.messages.create(
                model=model,
                max_tokens=4096,
                messages=messages
            )
            return response.content[0].text
        except RateLimitError:
            # Exponential backoff; asyncio.sleep keeps the event loop free
            await asyncio.sleep(2 ** attempt)
        except APIError:
            if attempt == max_retries - 1:
                if fallback_model:
                    # Last resort: try a different model
                    response = await async_client.messages.create(
                        model=fallback_model,
                        max_tokens=4096,
                        messages=messages
                    )
                    return response.content[0].text
                raise
            await asyncio.sleep(2 ** attempt)
    raise RuntimeError(f"Step failed after {max_retries} retries")
```
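The same backoff logic can be factored into a reusable decorator so every workflow step inherits it without repeating the loop. A sketch; the `with_retries` name and the full-jitter backoff choice are ours, not from any SDK:

```python
import asyncio
import functools
import random

def with_retries(max_retries: int = 3, base_delay: float = 1.0,
                 retry_on: tuple = (Exception,)):
    """Decorator: retry an async step with exponential backoff and jitter."""
    def decorator(fn):
        @functools.wraps(fn)
        async def wrapper(*args, **kwargs):
            for attempt in range(max_retries):
                try:
                    return await fn(*args, **kwargs)
                except retry_on:
                    if attempt == max_retries - 1:
                        raise
                    # Full jitter keeps concurrent workflows from
                    # retrying in lockstep against a rate-limited API
                    delay = random.uniform(0, base_delay * 2 ** attempt)
                    await asyncio.sleep(delay)
        return wrapper
    return decorator
```

Applied as `@with_retries(retry_on=(RateLimitError,))` on each step function, this centralizes the policy and keeps step bodies focused on their prompts.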
Cost Optimization: Model Routing Per Step
One of the biggest advantages of multi-step workflows is using the right model for each step. Not every step needs the most capable model.
| Step Type | Recommended Model | Why |
|---|---|---|
| Classification / routing | Haiku | Fast, cheap, highly accurate for simple decisions |
| Data extraction | Haiku or Sonnet | Structured extraction is well-handled by smaller models |
| Complex analysis | Sonnet | Good balance of capability and cost |
| Critical decisions | Opus | Highest accuracy for high-stakes reasoning |
| Synthesis / writing | Sonnet | Strong writing quality at reasonable cost |
A typical workflow using model routing can cost 40-60% less than running Sonnet for every step, often with no measurable quality degradation on the step types where the cheaper model is used.
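As a back-of-envelope illustration, here is a cost model you can plug your own numbers into. The per-million-token prices below are placeholders, not published rates; actual savings depend on the real price gap between tiers and on how many tokens the cheap-model steps absorb:

```python
# Placeholder output-token prices per million tokens -- substitute real rates
PRICE_PER_MTOK = {"haiku": 1.00, "sonnet": 5.00}

def workflow_cost(steps: list[tuple[str, int]]) -> float:
    """Total cost of a workflow given (model_tier, output_tokens) per step."""
    return sum(PRICE_PER_MTOK[tier] * tokens / 1_000_000 for tier, tokens in steps)

# Routed workflow: cheap model for classification and extraction steps
routed = [("haiku", 256), ("haiku", 2048), ("sonnet", 4096), ("sonnet", 2048)]
# Baseline: Sonnet for every step, same token counts
all_sonnet = [("sonnet", n) for _, n in routed]

savings = 1 - workflow_cost(routed) / workflow_cost(all_sonnet)
```

With these placeholder numbers the routed workflow comes out roughly a fifth cheaper; the savings climb toward the 40-60% range as more of the workflow's tokens shift onto the cheaper tier.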
Summary
Multi-step AI workflows transform Claude from a question-answering tool into a process automation engine. The four core patterns (sequential chains, parallel fan-out, conditional branching, and human-in-the-loop checkpoints) can be combined to model almost any business process. The keys to production success are robust error handling with fallbacks, model routing for cost optimization, and checkpoint-based human oversight for high-stakes decisions.