
Prompt Chaining: Breaking Complex Tasks into Sequential LLM Calls

Learn how to decompose complex AI tasks into sequential prompt chains — passing intermediate results between LLM calls, handling errors in pipelines, and building reliable multi-step workflows.

Why Single Prompts Are Not Enough

As tasks grow in complexity, single prompts become unreliable. Asking an LLM to simultaneously analyze data, generate a report, and format it as a structured document invites errors at every level. Prompt chaining solves this by decomposing complex tasks into a sequence of focused LLM calls, where each call handles one well-defined step and passes its output to the next.

This is analogous to Unix pipes — small, composable operations chained together to accomplish complex workflows.
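The analogy can be made literal: a chain is just function composition, where each step consumes the previous step's output. A pure-Python sketch (no LLM calls, illustrative only):

```python
from functools import reduce

def pipe(*steps):
    """Compose single-argument functions left to right, like a Unix pipe.
    In a real prompt chain, each step would be an LLM call."""
    return lambda value: reduce(lambda acc, step: step(acc), steps, value)

normalize = pipe(str.strip, str.lower)
print(normalize("  Hello World  "))  # -> "hello world"
```

Each LLM call in the sections below plays the role of one of these steps.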

Basic Chain Pattern

The simplest chain passes the output of one call as input to the next:

from openai import OpenAI

client = OpenAI()


def llm_call(system: str, user: str, model: str = "gpt-4o") -> str:
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ]
    )
    return response.choices[0].message.content


def analyze_and_report(raw_data: str) -> dict:
    # Step 1: Extract key metrics
    metrics = llm_call(
        system="Extract numerical metrics from the data. Return as a bullet list of metric: value pairs.",
        user=raw_data
    )

    # Step 2: Analyze trends
    analysis = llm_call(
        system="You are a data analyst. Analyze the metrics for trends, anomalies, and insights.",
        user=f"Metrics:\n{metrics}"
    )

    # Step 3: Generate executive summary
    summary = llm_call(
        system="Write a 3-sentence executive summary for a non-technical audience.",
        user=f"Analysis:\n{analysis}"
    )

    return {
        "metrics": metrics,
        "analysis": analysis,
        "summary": summary,
    }

Each step has a narrow, clearly defined task. The extraction step does not need to analyze. The analysis step does not need to format for executives. This separation produces better results at every stage.

Building a Chain Pipeline Class

For production systems, formalize chains with a pipeline abstraction:


from dataclasses import dataclass
from typing import Callable


@dataclass
class ChainStep:
    name: str
    system_prompt: str
    input_formatter: Callable[[dict], str]
    output_key: str
    model: str = "gpt-4o"


class PromptChain:
    def __init__(self, steps: list[ChainStep]):
        self.steps = steps
        self.client = OpenAI()

    def run(self, initial_input: str) -> dict:
        context = {"initial_input": initial_input}

        for step in self.steps:
            user_message = step.input_formatter(context)

            response = self.client.chat.completions.create(
                model=step.model,
                messages=[
                    {"role": "system", "content": step.system_prompt},
                    {"role": "user", "content": user_message},
                ]
            )

            result = response.choices[0].message.content
            context[step.output_key] = result
            print(f"[{step.name}] completed -> {len(result)} chars")

        return context


# Define a review pipeline
review_chain = PromptChain([
    ChainStep(
        name="extract_code",
        system_prompt="Extract all code blocks from the pull request description. Return only the code.",
        input_formatter=lambda ctx: ctx["initial_input"],
        output_key="code",
    ),
    ChainStep(
        name="find_issues",
        system_prompt="Review the code for bugs, security issues, and performance problems. List each issue.",
        input_formatter=lambda ctx: ctx["code"],
        output_key="issues",
    ),
    ChainStep(
        name="format_review",
        system_prompt="Format the code review issues as a GitHub review comment with severity labels.",
        input_formatter=lambda ctx: f"Issues found:\n{ctx['issues']}",
        output_key="review",
    ),
])

results = review_chain.run(pr_description)  # pr_description: the pull request text to review
print(results["review"])
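Downstream steps often need structured data rather than free text. A defensive parser for JSON-ish LLM output is a common helper between steps (a sketch; the function name and regex are illustrative, not part of the pipeline above):

```python
import json
import re

def parse_json_output(text: str) -> dict:
    """Pull the first JSON object out of an LLM response, tolerating
    markdown code fences and surrounding prose."""
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match is None:
        raise ValueError("No JSON object found in LLM output")
    return json.loads(match.group(0))

print(parse_json_output('```json\n{"severity": "high", "count": 3}\n```'))
# -> {'severity': 'high', 'count': 3}
```

Asking a step's system prompt for JSON and parsing it this way keeps the `input_formatter` of the next step simple.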

Error Handling in Chains

A chain is only as strong as its weakest link. Build error handling into the pipeline:

import logging

logger = logging.getLogger(__name__)


class ResilientChain:
    def __init__(self, steps: list[ChainStep], max_retries: int = 2):
        self.steps = steps
        self.max_retries = max_retries
        self.client = OpenAI()

    def _execute_step(self, step: ChainStep, user_message: str) -> str:
        for attempt in range(self.max_retries + 1):
            try:
                response = self.client.chat.completions.create(
                    model=step.model,
                    messages=[
                        {"role": "system", "content": step.system_prompt},
                        {"role": "user", "content": user_message},
                    ]
                )
                result = response.choices[0].message.content
                if not result or not result.strip():
                    raise ValueError("Empty response from LLM")
                return result
            except Exception as e:
                logger.warning(
                    f"Step '{step.name}' attempt {attempt + 1} failed: {e}"
                )
                if attempt == self.max_retries:
                    raise RuntimeError(
                        f"Step '{step.name}' failed after {self.max_retries + 1} attempts"
                    ) from e

    def run(self, initial_input: str) -> dict:
        context = {"initial_input": initial_input}

        for i, step in enumerate(self.steps):
            try:
                user_message = step.input_formatter(context)
                context[step.output_key] = self._execute_step(step, user_message)
            except RuntimeError as e:
                logger.error(f"Chain failed at step {i} ({step.name}): {e}")
                context["error"] = str(e)
                context["failed_step"] = step.name
                break

        return context
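ResilientChain retries immediately, which is fine for transient glitches but counterproductive against rate limits. A minimal backoff helper you could call (via `time.sleep`) before each retry — illustrative, not part of the class above:

```python
import random

def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Exponential backoff with full jitter: the ceiling doubles each
    attempt (1s, 2s, 4s, ...) up to `cap`, and the actual delay is
    randomized to avoid synchronized retry storms across workers."""
    return random.uniform(0, min(cap, base * (2 ** attempt)))

# e.g. time.sleep(backoff_delay(attempt)) inside the retry loop
```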

Conditional Branching

Not all chains are linear. Sometimes you need to branch based on intermediate results:

def classify_and_route(customer_message: str) -> str:
    # Step 1: Classify the intent
    intent = llm_call(
        system="Classify the customer message as: billing, technical, general, or urgent. Return only the category.",
        user=customer_message
    ).strip().lower()

    # Step 2: Route to specialized prompt based on classification
    specialized_prompts = {
        "billing": "You are a billing specialist. Help resolve payment and subscription issues.",
        "technical": "You are a senior support engineer. Diagnose and solve technical problems.",
        "urgent": "You are an escalation handler. Acknowledge the urgency, gather details, and create a priority ticket.",
        "general": "You are a friendly support agent. Answer general questions about our product.",
    }

    system = specialized_prompts.get(intent, specialized_prompts["general"])

    # Step 3: Generate the response with the specialized persona
    response = llm_call(system=system, user=customer_message)
    return response

This pattern — classify first, then route — is fundamental to building agentic systems. Each branch can use a different model, temperature, or even a different prompt chain.
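The routing step itself is deterministic once the classifier returns, so it is easy to unit-test with stubbed intents. A sketch of the lookup-with-fallback logic (prompt strings shortened for brevity):

```python
SPECIALIZED_PROMPTS = {
    "billing": "You are a billing specialist.",
    "technical": "You are a senior support engineer.",
    "urgent": "You are an escalation handler.",
    "general": "You are a friendly support agent.",
}

def route(intent: str) -> str:
    # Normalize, then fall back to "general" for any unexpected label --
    # classifiers occasionally return text outside the requested categories.
    return SPECIALIZED_PROMPTS.get(intent.strip().lower(), SPECIALIZED_PROMPTS["general"])

print(route(" Billing "))  # -> "You are a billing specialist."
print(route("refund??"))   # -> "You are a friendly support agent."
```

The `.get()` fallback is the important part: never let an off-script classification crash the chain.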

FAQ

How many steps should a prompt chain have?

Keep chains to 2-5 steps. Each step adds latency and the risk of error compounding. If your chain has more than 5 steps, consider whether some steps can be combined or whether a single well-crafted prompt could replace part of the chain.

How do I debug a failing chain?

Log the full input and output of every step. When a chain produces bad results, inspect each step's output to find where quality degrades. Often the issue is in the input formatting between steps — the output of step N does not match what step N+1 expects.
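One lightweight way to capture this is a wrapper that records every call's input and output. A sketch assuming an `llm_call(system, user)`-style function like the one defined earlier:

```python
def traced(llm_fn):
    """Wrap an LLM-call function so every invocation is recorded for inspection."""
    trace = []
    def wrapper(system: str, user: str, **kwargs) -> str:
        output = llm_fn(system=system, user=user, **kwargs)
        trace.append({"system": system, "user": user, "output": output})
        return output
    wrapper.trace = trace  # inspect after a run: wrapper.trace[n]["output"]
    return wrapper

# Demonstrated with a stub in place of a real LLM call:
fake_llm = traced(lambda system, user: f"[{system}] {user}".upper())
fake_llm("summarize", "hello")
print(fake_llm.trace[0]["output"])  # -> "[SUMMARIZE] HELLO"
```

After a failing run, walk `trace` step by step to find where quality degrades.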

Is prompt chaining the same as using agents with tools?

No. Prompt chaining is a predefined sequence of calls that you design. Agent tool use is dynamic — the model decides at runtime which tools to call and in what order. Chains are simpler, more predictable, and easier to debug. Use chains when the workflow is known; use agents when the workflow must be discovered.


#PromptChaining #PipelineDesign #LLMOrchestration #PromptEngineering #Python #AgenticAI #LearnAI #AIEngineering
