
The Chain of Responsibility Pattern: Cascading Agent Attempts Until Success

Implement the Chain of Responsibility pattern for AI agents with fallback chains, capability matching, and cost-optimized ordering to handle requests efficiently.

What Is the Chain of Responsibility?

The Chain of Responsibility pattern passes a request along a chain of handlers. Each handler examines the request and either processes it or passes it to the next handler in the chain. The request travels down the chain until a handler successfully processes it, or the chain is exhausted.

In AI agent systems, this pattern is invaluable for building fallback chains. You might try a fast, cheap model first, fall back to a more capable model if the first one fails, and escalate to a specialized agent or human as a last resort. Each link in the chain can also check whether it has the right capabilities before attempting to handle the request.

Core Implementation

from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any


@dataclass
class Request:
    content: str
    required_capabilities: set[str]
    metadata: dict


@dataclass
class Response:
    content: str
    handler_name: str
    success: bool
    cost: float  # estimated cost in USD


class AgentHandler(ABC):
    def __init__(self, name: str, capabilities: set[str],
                 cost_per_call: float):
        self.name = name
        self.capabilities = capabilities
        self.cost_per_call = cost_per_call
        self._next: AgentHandler | None = None

    def set_next(self, handler: "AgentHandler") -> "AgentHandler":
        self._next = handler
        return handler

    def can_handle(self, request: Request) -> bool:
        return request.required_capabilities.issubset(
            self.capabilities
        )

    def handle(self, request: Request) -> Response | None:
        if self.can_handle(request):
            try:
                result = self.process(request)
                if result.success:
                    return result
            except Exception as e:
                print(f"{self.name} failed: {e}")

        if self._next:
            print(f"{self.name} passing to {self._next.name}")
            return self._next.handle(request)

        return None

    @abstractmethod
    def process(self, request: Request) -> Response:
        pass

Building Concrete Handlers

import openai


class LightweightAgent(AgentHandler):
    def __init__(self):
        super().__init__(
            name="GPT-4o-mini",
            capabilities={"text_generation", "summarization",
                          "classification"},
            cost_per_call=0.001,
        )
        self.client = openai.OpenAI()

    def process(self, request: Request) -> Response:
        response = self.client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": request.content}],
        )
        content = response.choices[0].message.content or ""
        # Simple quality check: reject empty or suspiciously short replies
        if len(content) < 20:
            return Response(content, self.name, success=False,
                            cost=self.cost_per_call)
        return Response(content, self.name, success=True,
                        cost=self.cost_per_call)


class PowerfulAgent(AgentHandler):
    def __init__(self):
        super().__init__(
            name="GPT-4o",
            capabilities={"text_generation", "summarization",
                          "classification", "reasoning",
                          "code_generation"},
            cost_per_call=0.01,
        )
        self.client = openai.OpenAI()

    def process(self, request: Request) -> Response:
        response = self.client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": request.content}],
        )
        return Response(
            response.choices[0].message.content,
            self.name, success=True,
            cost=self.cost_per_call,
        )


class HumanEscalation(AgentHandler):
    def __init__(self):
        super().__init__(
            name="Human Reviewer",
            capabilities={"text_generation", "summarization",
                          "classification", "reasoning",
                          "code_generation", "human_judgment"},
            cost_per_call=5.0,
        )

    def process(self, request: Request) -> Response:
        # In production, this would create a ticket or send
        # a notification to a human review queue
        return Response(
            content="[Escalated to human review queue]",
            handler_name=self.name,
            success=True,
            cost=self.cost_per_call,
        )

Assembling the Chain

def build_cost_optimized_chain() -> AgentHandler:
    lightweight = LightweightAgent()
    powerful = PowerfulAgent()
    human = HumanEscalation()

    # Chain: cheap -> expensive -> human
    lightweight.set_next(powerful)
    powerful.set_next(human)

    return lightweight


chain = build_cost_optimized_chain()

# Simple request — handled by lightweight agent
simple = Request(
    content="Summarize this paragraph in one sentence.",
    required_capabilities={"summarization"},
    metadata={},
)
result = chain.handle(simple)
print(f"Handled by: {result.handler_name}, Cost: ${result.cost}")

# Complex request — needs reasoning, skips to powerful agent
complex_req = Request(
    content="Analyze the time complexity of this algorithm.",
    required_capabilities={"reasoning", "code_generation"},
    metadata={},
)
result = chain.handle(complex_req)
print(f"Handled by: {result.handler_name}, Cost: ${result.cost}")

The capability check in can_handle means the chain intelligently skips handlers that lack the required capabilities, so a request needing reasoning jumps straight to GPT-4o without wasting a call on GPT-4o-mini.
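This skipping behavior can be verified without any API calls or keys. The sketch below uses stripped-down stand-ins (MockHandler and MockRequest are illustrative names, not part of the implementation above) that keep only the chain traversal and capability check:

```python
from dataclasses import dataclass


@dataclass
class MockRequest:
    required_capabilities: set


class MockHandler:
    """Stripped-down stand-in for AgentHandler: no API calls, no cost."""

    def __init__(self, name: str, capabilities: set):
        self.name = name
        self.capabilities = capabilities
        self._next = None
        self.calls = 0  # counts how often this handler actually processed

    def set_next(self, handler):
        self._next = handler
        return handler

    def handle(self, request):
        if request.required_capabilities.issubset(self.capabilities):
            self.calls += 1
            return self.name
        if self._next:
            return self._next.handle(request)
        return None


cheap = MockHandler("cheap", {"summarization"})
smart = MockHandler("smart", {"summarization", "reasoning"})
cheap.set_next(smart)

# A reasoning request bypasses the cheap handler without invoking it
winner = cheap.handle(MockRequest({"reasoning"}))  # → "smart"
```

After the call, `cheap.calls` is still 0: the cheap handler was consulted but never processed the request, which is exactly the cost-saving behavior described above.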


FAQ

How do I order the handlers for cost efficiency?

Place the cheapest handler first and the most expensive last. This ensures simple requests are handled cheaply while complex requests still get resolved. Track the percentage of requests handled at each level to monitor whether your chain ordering is optimal.
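One way to track that percentage is a thin wrapper around the chain call. The names here (level_stats, tracked_handle, StubChain) are hypothetical; StubChain merely simulates a chain that routes simple requests to the cheap handler and everything else to the expensive one:

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class StubResult:
    handler_name: str


class StubChain:
    """Simulated chain: 'simple' requests stop at the cheap handler."""

    def handle(self, request):
        name = "GPT-4o-mini" if request == "simple" else "GPT-4o"
        return StubResult(name)


level_stats = Counter()


def tracked_handle(chain, request):
    # Record which handler ultimately absorbed the request
    result = chain.handle(request)
    if result is not None:
        level_stats[result.handler_name] += 1
    return result


chain = StubChain()
for req in ["simple", "simple", "hard"]:
    tracked_handle(chain, req)

# level_stats now shows how many requests each level absorbed:
# Counter({'GPT-4o-mini': 2, 'GPT-4o': 1})
```

If the counter shows most requests reaching the expensive handler, the cheap handler is earning its place in the chain poorly; consider reordering, or improving its prompt so it succeeds more often.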

What if I want to try all handlers and pick the best result?

That is a different pattern — closer to Map-Reduce or an ensemble. The Chain of Responsibility is specifically designed for "first success wins" semantics. If you need to compare outputs from multiple agents, use a fan-out approach and a separate evaluator to pick the best.
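A minimal sketch of that fan-out approach, with toy lambda agents and a placeholder length-based scorer standing in for real models and a real evaluator:

```python
def fan_out(agents, request, score):
    # Call every agent, then let the scorer pick the best candidate
    candidates = [agent(request) for agent in agents]
    return max(candidates, key=score)


# Toy agents; in practice these would be calls to different models
agents = [
    lambda req: "short answer",
    lambda req: "a much more detailed answer",
]

# Placeholder scorer: prefers longer output. A real evaluator might be
# an LLM judge, a reward model, or a task-specific metric.
best = fan_out(agents, "question", score=len)  # → "a much more detailed answer"
```

Note the cost trade-off: fan-out pays for every agent on every request, whereas the Chain of Responsibility pays only for the handlers it actually tries.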

How do I handle the case where no handler in the chain can process a request?

The handle method returns None when the chain is exhausted. Wrap the chain call in logic that detects this and returns a graceful error to the user, such as "We could not process your request. A support ticket has been created."
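A minimal wrapper sketch (handle_with_fallback is a hypothetical name; ExhaustedChain simply simulates a chain where every handler declined):

```python
class ExhaustedChain:
    """Simulates a chain in which no handler could process the request."""

    def handle(self, request):
        return None


def handle_with_fallback(chain, request):
    result = chain.handle(request)
    if result is None:
        # In production: open a support ticket or alert an operator here
        return ("We could not process your request. "
                "A support ticket has been created.")
    return result


msg = handle_with_fallback(ExhaustedChain(), "unsupported request")
```

The key point is that callers never see a bare None: the wrapper converts chain exhaustion into a user-facing message plus a side effect (ticket, alert) that a human can act on.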


#AgentDesignPatterns #ChainOfResponsibility #Python #AgenticAI #FaultTolerance #LearnAI #AIEngineering

CallSphere Team

Expert insights on AI voice agents and customer communication automation.
