AI Agent Reliability Patterns: Retries, Fallbacks, and Circuit Breakers for Production Agents
How to build reliable AI agents using battle-tested distributed systems patterns: retry strategies, fallback chains, circuit breakers, and graceful degradation.
Agents Fail. The Question Is How Gracefully.
AI agents in production face a constant stream of failures: API rate limits, tool execution errors, malformed LLM outputs, timeouts on external services, and model hallucinations that derail multi-step plans. The difference between a demo agent and a production agent is not capability -- it is reliability engineering.
The good news is that decades of distributed systems engineering have produced patterns that apply directly to agent systems.
Pattern 1: Structured Retries
Not all failures are equal. Your retry strategy should match the failure type:
```python
from tenacity import retry, stop_after_attempt, wait_exponential, retry_if_exception_type

@retry(
    retry=retry_if_exception_type((RateLimitError, TimeoutError)),
    wait=wait_exponential(multiplier=1, min=1, max=60),
    stop=stop_after_attempt(5),
    before_sleep=log_retry_attempt,
)
async def call_llm(messages, tools):
    return await client.messages.create(
        model="claude-sonnet-4-20250514",
        messages=messages,
        tools=tools,
    )
```
Key principles:
- Exponential backoff: Prevents thundering herd on rate limits
- Jitter: Add random jitter to prevent synchronized retries from multiple agents
- Selective retry: Only retry transient errors (rate limits, timeouts). Do not retry on invalid requests or authentication failures
- Maximum attempts: Always cap retries to prevent infinite loops
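The jitter principle above can be sketched as a standalone backoff function. This is the "full jitter" variant popularized by AWS: sleep a random amount between zero and the capped exponential delay, so many agents retrying at once do not synchronize.

```python
import random

def backoff_with_jitter(attempt, base=1.0, cap=60.0):
    """Full-jitter exponential backoff: return a random delay in
    [0, min(cap, base * 2**attempt)] seconds for the given attempt."""
    return random.uniform(0, min(cap, base * 2 ** attempt))
```

With tenacity, the equivalent is to swap `wait_exponential` for `wait_random_exponential` in the decorator shown above.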
Pattern 2: Model Fallback Chains
When your primary model is unavailable or degraded, fall back to alternatives:
```python
MODEL_CHAIN = [
    {"model": "claude-sonnet-4-20250514", "provider": "anthropic"},
    {"model": "gpt-4o", "provider": "openai"},
    {"model": "claude-haiku-4-20250514", "provider": "anthropic"},  # Cheaper, faster, less capable
]

async def resilient_llm_call(messages, tools):
    for model_config in MODEL_CHAIN:
        try:
            return await call_provider(
                model=model_config["model"],
                provider=model_config["provider"],
                messages=messages,
                tools=tools,
            )
        except (ServiceUnavailableError, RateLimitError) as e:
            logger.warning(f"Fallback from {model_config['model']}: {e}")
            continue
    raise AllModelsUnavailableError("Exhausted all model fallbacks")
```
Important considerations:
- Prompts may need adjustment for different models (tool schemas, system prompt format)
- Track which model actually served each request for quality monitoring
- Quality may degrade with fallback models -- alert when the primary model has been unavailable for extended periods
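Tracking which model served each request can be as simple as a counter plus the request's depth in the chain (the counter and function names here are illustrative, not part of any provider SDK):

```python
from collections import Counter

served_by_model = Counter()  # model name -> requests served

def fallback_depth(model_chain, served_model):
    """Record which model handled a request and return how far down
    the chain it sits: 0 means the primary served it, anything higher
    means a fallback did. Alert when the average depth stays above 0."""
    served_by_model[served_model] += 1
    for depth, cfg in enumerate(model_chain):
        if cfg["model"] == served_model:
            return depth
    raise ValueError(f"{served_model} is not in the chain")
```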
Pattern 3: Circuit Breakers
Prevent cascading failures by stopping calls to a failing service:
```python
import time

class CircuitBreaker:
    def __init__(self, failure_threshold=5, recovery_timeout=60):
        self.failure_count = 0
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.state = "CLOSED"  # CLOSED = normal, OPEN = blocking, HALF_OPEN = testing
        self.last_failure_time = None

    async def call(self, func, *args, **kwargs):
        if self.state == "OPEN":
            if time.time() - self.last_failure_time > self.recovery_timeout:
                self.state = "HALF_OPEN"  # let one probe request through
            else:
                raise CircuitOpenError("Circuit breaker is open")
        try:
            result = await func(*args, **kwargs)
            self.state = "CLOSED"
            self.failure_count = 0  # any success resets the count
            return result
        except Exception:
            self.failure_count += 1
            self.last_failure_time = time.time()
            # A failure during the HALF_OPEN probe reopens the circuit immediately
            if self.state == "HALF_OPEN" or self.failure_count >= self.failure_threshold:
                self.state = "OPEN"
            raise
```
Use separate circuit breakers for each external dependency (LLM provider, tool APIs, databases).
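One way to keep breakers separate per dependency is a small keyed registry (a sketch; the class and names are illustrative). A failing tool API then trips only its own breaker, never the one guarding the LLM provider:

```python
class BreakerRegistry:
    """Lazily create one breaker instance per named dependency, so
    failures in one service cannot open the circuit for another."""
    def __init__(self, factory):
        self._factory = factory  # callable that builds a fresh breaker
        self._breakers = {}

    def get(self, name):
        if name not in self._breakers:
            self._breakers[name] = self._factory()
        return self._breakers[name]
```

Usage: `breakers = BreakerRegistry(CircuitBreaker)`, then `await breakers.get("search_api").call(search, query)`.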
Pattern 4: Idempotent Tool Execution
Agent tools must be safe to retry. If a tool call times out, the agent (or retry logic) may call it again. Non-idempotent tools can cause double-charges, duplicate records, or other side effects.
Design principles:
- Use idempotency keys for operations that create or modify resources
- Make read operations naturally idempotent
- Log tool execution results and check for existing results before re-executing
- Use database transactions with unique constraints to prevent duplicates
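The idempotency-key principle can be sketched as a wrapper that caches results by key. An in-memory dict stands in for the durable store (database row with a unique constraint) a real system would need:

```python
_results = {}  # idempotency key -> stored result; use a durable store in production

def execute_once(key, operation, *args):
    """Run `operation` at most once per idempotency key. A retry with
    the same key returns the stored result instead of re-executing,
    so a timed-out-then-retried charge cannot bill the customer twice."""
    if key in _results:
        return _results[key]
    result = operation(*args)
    _results[key] = result
    return result
```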
Pattern 5: Graceful Degradation
When full functionality is unavailable, provide reduced but useful service:
- Tool failure: If a search tool fails, the agent can still answer from its parametric knowledge (with appropriate caveats)
- Context retrieval failure: If RAG retrieval fails, fall back to a general response with a disclaimer
- Timeout: If the agent cannot complete a complex task within the time budget, return partial results with an explanation
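The tool-failure case above can be sketched as follows (the function names and the caveat wording are illustrative):

```python
async def answer_with_search(question, search_tool, llm):
    """Try to ground the answer in search results; if the tool fails,
    fall back to the model's parametric knowledge with an explicit caveat."""
    try:
        results = await search_tool(question)
        return await llm(question, context=results)
    except Exception:
        answer = await llm(question, context=None)
        return f"(Search was unavailable; answering from model knowledge.) {answer}"
```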
Pattern 6: Checkpointing for Long-Running Agents
Agents that run for minutes or hours should checkpoint their state:
```python
class CheckpointedAgent:
    async def run(self, task):
        checkpoint = await self.load_checkpoint(task.id)
        for step in self.plan(task, resume_from=checkpoint):
            result = await self.execute_step(step)
            await self.save_checkpoint(task.id, step, result)
            if result.failed and not result.retryable:
                return self.partial_result(task.id)
        return self.final_result(task.id)
```
If the agent crashes or the process restarts, it resumes from the last checkpoint instead of starting over.
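A minimal file-backed checkpoint store might look like this (JSON on local disk; a real deployment would use a database or object store so checkpoints survive the host):

```python
import json
import os

class CheckpointStore:
    """Persist per-task progress as JSON so a restarted process
    can resume from the last completed step."""
    def __init__(self, directory):
        self.directory = directory
        os.makedirs(directory, exist_ok=True)

    def _path(self, task_id):
        return os.path.join(self.directory, f"{task_id}.json")

    def save(self, task_id, step, result):
        with open(self._path(task_id), "w") as f:
            json.dump({"step": step, "result": result}, f)

    def load(self, task_id):
        try:
            with open(self._path(task_id)) as f:
                return json.load(f)
        except FileNotFoundError:
            return None  # no checkpoint yet: start from the beginning
```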
Measuring Reliability
Track these metrics to quantify agent reliability:
- Task completion rate: Percentage of tasks completed successfully
- Mean time to completion: Average wall-clock time per task
- Retry rate: How often retries are needed (high rates indicate systemic issues)
- Fallback rate: How often the primary model/tool is unavailable
- Error categorization: Breakdown of failures by type (rate limit, timeout, parsing, tool error)
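Given a log of per-task events, these metrics reduce to a few ratios. The event schema here is an illustrative assumption, not a standard format:

```python
from collections import Counter

def reliability_metrics(events):
    """Summarize per-task event dicts like
    {"status": "completed", "retries": 0, "used_fallback": False, "error": None}
    into the reliability metrics above."""
    total = len(events)
    if total == 0:
        return {}
    completed = sum(1 for e in events if e["status"] == "completed")
    return {
        "task_completion_rate": completed / total,
        "retry_rate": sum(1 for e in events if e["retries"] > 0) / total,
        "fallback_rate": sum(1 for e in events if e["used_fallback"]) / total,
        "errors_by_type": Counter(e["error"] for e in events if e["error"]),
    }
```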
Sources: Release It! (Michael Nygard) | Anthropic Agent Reliability | AWS Well-Architected Framework