Error Handling and Retry Patterns for Playwright AI Agents
Build resilient Playwright AI agents with comprehensive error handling for timeouts, missing elements, navigation failures, and network errors, plus retry decorators and graceful degradation strategies.
Why Error Handling Is Critical for Browser Automation Agents
Browser automation is inherently unreliable. Networks fail, pages load slowly, elements appear and disappear unpredictably, and websites deploy updates that change their DOM structure without warning. An AI agent that does not handle these failures gracefully will crash on its first encounter with the real web.
Production-grade Playwright agents need layered error handling: catching specific exceptions, implementing intelligent retry logic, providing fallback strategies, and logging sufficient context for debugging. This post covers patterns that make your agents resilient.
Playwright Exception Types
Playwright raises specific exception types that tell you exactly what went wrong:
from playwright.sync_api import (
    sync_playwright,
    TimeoutError as PlaywrightTimeout,
    Error as PlaywrightError,
)

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    try:
        page.goto("https://example.com", timeout=5000)
    except PlaywrightTimeout:
        print("Page took too long to load")
    except PlaywrightError as e:
        if "net::ERR_NAME_NOT_RESOLVED" in str(e):
            print("DNS resolution failed — invalid domain")
        elif "net::ERR_CONNECTION_REFUSED" in str(e):
            print("Server refused the connection")
        elif "net::ERR_CONNECTION_TIMED_OUT" in str(e):
            print("Connection timed out at network level")
        else:
            print(f"Browser error: {e}")
    except Exception as e:
        print(f"Unexpected error: {e}")
    finally:
        browser.close()
The key exceptions to handle are:
- TimeoutError — an element did not appear, or a page did not load, within the timeout
- Error with network messages — DNS, connection, or SSL failures
- Error with element messages — element detached, not visible, or not clickable
Handling Element Not Found
The most common failure in browser automation is trying to interact with an element that does not exist or is not ready:
from playwright.sync_api import (
    TimeoutError as PlaywrightTimeout,
    Error as PlaywrightError,
)

def safe_click(page, selector: str, timeout: int = 5000) -> bool:
    """Click an element if it exists, return success status."""
    try:
        locator = page.locator(selector)
        locator.wait_for(state="visible", timeout=timeout)
        locator.click()
        return True
    except PlaywrightTimeout:
        print(f"Element not found: {selector}")
        return False
    except PlaywrightError as e:
        print(f"Cannot click {selector}: {e}")
        return False

def safe_fill(page, selector: str, value: str, timeout: int = 5000) -> bool:
    """Fill a form field if it exists, return success status."""
    try:
        locator = page.locator(selector)
        locator.wait_for(state="visible", timeout=timeout)
        locator.fill(value)
        return True
    except PlaywrightTimeout:
        print(f"Field not found: {selector}")
        return False
    except PlaywrightError as e:
        print(f"Cannot fill {selector}: {e}")
        return False

def safe_text(page, selector: str, default: str = "") -> str:
    """Extract text content safely."""
    try:
        locator = page.locator(selector)
        if locator.count() > 0:
            return locator.first.text_content() or default
        return default
    except Exception:
        return default
Building a Retry Decorator
A generic retry decorator that handles transient failures:
import time
import functools

from playwright.sync_api import (
    TimeoutError as PlaywrightTimeout,
    Error as PlaywrightError,
)

def retry(
    max_attempts: int = 3,
    delay: float = 1.0,
    backoff: float = 2.0,
    exceptions: tuple = (PlaywrightTimeout, PlaywrightError),
):
    """
    Retry decorator with exponential backoff.

    Args:
        max_attempts: Maximum number of attempts
        delay: Initial delay between retries in seconds
        backoff: Multiplier for delay after each retry
        exceptions: Tuple of exception types to catch and retry
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            current_delay = delay
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions as e:
                    if attempt == max_attempts:
                        print(
                            f"[{func.__name__}] Failed after "
                            f"{max_attempts} attempts: {e}"
                        )
                        raise
                    print(
                        f"[{func.__name__}] Attempt {attempt} failed: {e}. "
                        f"Retrying in {current_delay:.1f}s..."
                    )
                    time.sleep(current_delay)
                    current_delay *= backoff
        return wrapper
    return decorator

# Usage
@retry(max_attempts=3, delay=2.0, backoff=2.0)
def navigate_and_extract(page, url: str) -> dict:
    page.goto(url, wait_until="networkidle", timeout=10000)
    return {
        "title": page.title(),
        "content": page.locator("main").text_content(),
    }
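The decorator above uses deterministic exponential backoff, so many agents that fail at the same moment will also retry at the same moment. A common production refinement is to add random jitter to each delay. A minimal sketch (the helper name and parameters are illustrative, not part of Playwright):

```python
import random

def backoff_delay(
    attempt: int,
    base: float = 1.0,
    factor: float = 2.0,
    max_delay: float = 30.0,
) -> float:
    """Exponential backoff capped at max_delay, with full jitter.

    attempt is 1-based: attempt 1 waits up to `base` seconds,
    attempt 2 up to base * factor, and so on.
    """
    capped = min(max_delay, base * (factor ** (attempt - 1)))
    # Full jitter: sleep a random duration in [0, capped]
    return random.uniform(0, capped)
```

Swapping `time.sleep(current_delay)` for `time.sleep(backoff_delay(attempt))` in the decorator spreads retries out instead of synchronizing them.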
Page-Level Retry with Fresh Context
Sometimes the page itself gets into a bad state. Retry with a fresh browser context:
from playwright.sync_api import sync_playwright

def robust_scrape(url: str, max_attempts: int = 3) -> dict:
    """Scrape a URL with retry logic that creates fresh contexts."""
    with sync_playwright() as p:
        browser = p.chromium.launch()
        try:
            for attempt in range(1, max_attempts + 1):
                context = browser.new_context()
                page = context.new_page()
                try:
                    page.goto(url, wait_until="networkidle", timeout=15000)
                    # Wait for content to be present
                    page.wait_for_selector("body", timeout=5000)
                    return {
                        "url": url,
                        "title": page.title(),
                        "text": (page.locator("body").text_content() or "")[:5000],
                        "attempt": attempt,
                    }
                except Exception as e:
                    print(f"Attempt {attempt}/{max_attempts} failed: {e}")
                    if attempt == max_attempts:
                        return {"url": url, "error": str(e)}
                finally:
                    context.close()
        finally:
            browser.close()
Graceful Degradation Pattern
When an agent cannot complete its primary task, fall back to progressively simpler strategies:
import json

class ResilientAgent:
    def __init__(self, browser):
        self.browser = browser

    def extract_product_data(self, url: str) -> dict:
        """
        Try multiple strategies to extract product data,
        degrading gracefully if preferred methods fail.
        """
        context = self.browser.new_context()
        page = context.new_page()
        result = {"url": url, "strategy": None}
        try:
            page.goto(url, wait_until="networkidle", timeout=15000)

            # Strategy 1: Structured data (JSON-LD)
            try:
                json_ld = page.locator(
                    'script[type="application/ld+json"]'
                ).first.text_content()
                data = json.loads(json_ld)
                result.update({
                    "name": data.get("name"),
                    "price": data.get("offers", {}).get("price"),
                    "strategy": "json-ld",
                })
                return result
            except Exception:
                pass

            # Strategy 2: Open Graph meta tags
            try:
                result.update({
                    "name": page.locator(
                        'meta[property="og:title"]'
                    ).get_attribute("content"),
                    "price": None,
                    "strategy": "open-graph",
                })
                if result["name"]:
                    return result
            except Exception:
                pass

            # Strategy 3: DOM selectors (least reliable)
            try:
                result.update({
                    "name": (
                        safe_text(page, "h1")
                        or safe_text(page, ".product-title")
                    ),
                    "price": (
                        safe_text(page, ".price")
                        or safe_text(page, "[data-price]")
                    ),
                    "strategy": "dom-selectors",
                })
                return result
            except Exception:
                pass

            # Strategy 4: Take a screenshot for manual review
            page.screenshot(path=f"fallback_{hash(url)}.png")
            result.update({
                "name": page.title(),
                "price": None,
                "strategy": "screenshot-fallback",
            })
            return result
        except Exception as e:
            result["error"] = str(e)
            result["strategy"] = "failed"
            return result
        finally:
            context.close()
Timeout Configuration
Configure timeouts at different levels for fine-grained control:
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()

    # Context-level default timeout (applies to all actions)
    context = browser.new_context()
    context.set_default_timeout(10000)  # 10s for actions
    context.set_default_navigation_timeout(30000)  # 30s for navigation

    page = context.new_page()

    # Page-level timeout override
    page.set_default_timeout(5000)

    # Per-action timeout (highest priority)
    page.goto("https://example.com", timeout=60000)
    page.locator("#slow-widget").wait_for(state="visible", timeout=20000)

    context.close()
    browser.close()
Timeout priority from highest to lowest: per-action > page-level > context-level > default (30 seconds).
Comprehensive Error-Handling Agent
Putting it all together in a production-ready agent:
import logging
import time
from dataclasses import dataclass

from playwright.sync_api import (
    sync_playwright,
    TimeoutError as PlaywrightTimeout,
    Error as PlaywrightError,
)

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("browser_agent")

@dataclass
class AgentResult:
    url: str
    success: bool
    data: dict | None = None
    error: str | None = None
    attempts: int = 0

class RobustBrowserAgent:
    def __init__(self, max_retries: int = 3, timeout: int = 15000):
        self.max_retries = max_retries
        self.timeout = timeout

    def execute(self, url: str, task_fn) -> AgentResult:
        with sync_playwright() as p:
            browser = p.chromium.launch()
            try:
                for attempt in range(1, self.max_retries + 1):
                    context = browser.new_context()
                    context.set_default_timeout(self.timeout)
                    page = context.new_page()
                    try:
                        logger.info(
                            f"Attempt {attempt}/{self.max_retries}: {url}"
                        )
                        page.goto(url, wait_until="networkidle")
                        data = task_fn(page)
                        return AgentResult(
                            url=url, success=True,
                            data=data, attempts=attempt,
                        )
                    except PlaywrightTimeout as e:
                        logger.warning(f"Timeout on attempt {attempt}: {e}")
                        try:
                            page.screenshot(
                                path=f"timeout_attempt_{attempt}.png"
                            )
                        except PlaywrightError:
                            logger.warning("Could not capture screenshot")
                    except PlaywrightError as e:
                        error_msg = str(e)
                        if "net::ERR_" in error_msg:
                            logger.error(f"Network error: {error_msg}")
                        else:
                            logger.error(f"Browser error: {error_msg}")
                    except Exception as e:
                        logger.error(f"Unexpected error: {e}")
                    finally:
                        context.close()
                    if attempt < self.max_retries:
                        delay = 2 ** attempt
                        logger.info(f"Waiting {delay}s before retry...")
                        time.sleep(delay)
            finally:
                browser.close()
        return AgentResult(
            url=url, success=False,
            error="Max retries exceeded",
            attempts=self.max_retries,
        )

# Usage
agent = RobustBrowserAgent(max_retries=3, timeout=10000)

def scrape_task(page):
    return {
        "title": page.title(),
        "heading": page.locator("h1").text_content(),
    }

result = agent.execute("https://example.com", scrape_task)
if result.success:
    print(f"Success after {result.attempts} attempt(s): {result.data}")
else:
    print(f"Failed: {result.error}")
FAQ
How should I handle CAPTCHAs in my AI agent?
CAPTCHAs are specifically designed to block automation. Options include: using CAPTCHA-solving services (like 2Captcha or Anti-Captcha), switching to an official API if the site provides one, or escalating to a human operator. Some CAPTCHAs can be avoided by using residential proxies, maintaining realistic browsing patterns, and keeping session cookies. Never attempt to bypass CAPTCHAs on sites where you do not have permission to automate.
What is the right retry count for production agents?
Three retries with exponential backoff (2s, 4s, 8s) works well for most scenarios. For critical tasks, increase to 5 retries. For bulk scraping where individual failures are acceptable, use 2 retries to optimize throughput. Always set a circuit breaker — if more than 50 percent of requests fail in a window, pause the agent and alert an operator rather than continuing to hammer a broken or blocking site.
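The circuit breaker mentioned above can be sketched as a small failure-rate tracker over a sliding window. The class name, window size, and threshold here are illustrative assumptions, not a standard API:

```python
from collections import deque

class CircuitBreaker:
    """Trip when the failure rate over the last `window` requests
    exceeds `threshold`, signaling the agent to pause and alert."""

    def __init__(self, window: int = 20, threshold: float = 0.5):
        self.outcomes = deque(maxlen=window)  # True = success, False = failure
        self.threshold = threshold

    def record(self, success: bool) -> None:
        self.outcomes.append(success)

    @property
    def tripped(self) -> bool:
        # Don't judge until the window is full of observations
        if len(self.outcomes) < self.outcomes.maxlen:
            return False
        failures = self.outcomes.count(False)
        return failures / len(self.outcomes) > self.threshold
```

An agent would call `record()` after each request and check `tripped` before issuing the next one, sleeping or alerting an operator instead of continuing against a broken site.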
How do I distinguish between transient and permanent failures?
Network errors (net::ERR_CONNECTION_TIMED_OUT, net::ERR_CONNECTION_RESET) are typically transient and worth retrying. DNS failures (net::ERR_NAME_NOT_RESOLVED) are usually permanent. HTTP 404 and 410 responses are permanent. HTTP 429 (rate limited) and 503 (service unavailable) are transient. Element-not-found errors may be permanent if the page structure changed, or transient if the page had not finished loading. Log the specific error type and use it to decide whether to retry.
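One way to apply this advice is a small heuristic that inspects the error message before deciding whether to retry. The marker lists below are assumptions drawn from the categories above, not an exhaustive taxonomy:

```python
# Error-message substrings treated as retryable (assumed lists, extend as needed)
TRANSIENT_MARKERS = (
    "net::ERR_CONNECTION_TIMED_OUT",
    "net::ERR_CONNECTION_RESET",
)
PERMANENT_MARKERS = (
    "net::ERR_NAME_NOT_RESOLVED",
)

def should_retry(error: Exception) -> bool:
    """Heuristic: retry timeouts and transient network errors only."""
    msg = str(error)
    if any(marker in msg for marker in PERMANENT_MARKERS):
        return False
    if any(marker in msg for marker in TRANSIENT_MARKERS):
        return True
    # Timeouts (including Playwright's TimeoutError) are usually
    # worth another attempt; everything else is suspicious.
    return "Timeout" in type(error).__name__
```

Feeding this into the retry loop (retry only when `should_retry(e)` is true) avoids wasting attempts on dead domains while still recovering from flaky connections.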
#ErrorHandling #RetryPatterns #Playwright #Resilience #AIAgents #BrowserAutomation #FaultTolerance