Error Recovery in Claude Computer Use: Handling Unexpected Dialogs and Page Changes

Build resilient Claude Computer Use agents that detect and recover from unexpected dialogs, error popups, page navigation failures, stale states, and timeout conditions using structured recovery strategies.

Why Error Recovery Matters

A Claude Computer Use agent that only works when everything goes perfectly is useless in production. Real browser sessions are full of surprises: cookie consent banners blocking the target element, session timeout dialogs appearing mid-workflow, unexpected redirects to login pages, network error alerts, and browser-level permission prompts. Without error recovery, any of these will derail the agent.

Building robust error recovery transforms a fragile demo into a production-grade automation system. The key insight is that Claude's vision capability doubles as an error detector: if Claude can see that something went wrong, it can reason about how to fix it.

Detecting Error States

The first component is a screen state classifier that determines whether the current screen shows an expected state, an error state, or an interruption:

import anthropic
import json
from enum import Enum

client = anthropic.Anthropic()

class ScreenState(str, Enum):
    EXPECTED = "expected"
    ERROR_DIALOG = "error_dialog"
    COOKIE_BANNER = "cookie_banner"
    LOGIN_REDIRECT = "login_redirect"
    PERMISSION_PROMPT = "permission_prompt"
    LOADING = "loading"
    NETWORK_ERROR = "network_error"
    UNKNOWN_INTERRUPTION = "unknown_interruption"

def classify_screen(screenshot_b64: str, expected_state: str) -> dict:
    """Classify the current screen state relative to expected state."""
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": [
                {"type": "image", "source": {
                    "type": "base64",
                    "media_type": "image/png",
                    "data": screenshot_b64,
                }},
                {"type": "text", "text": f"""Classify the current screen state.

Expected state: {expected_state}

Determine if the screen shows:
- "expected": The expected content is visible
- "error_dialog": An error message or dialog is overlaying the content
- "cookie_banner": A cookie consent banner is blocking interaction
- "login_redirect": The page has redirected to a login/authentication page
- "permission_prompt": A browser permission prompt is showing
- "loading": The page is still loading (spinner, skeleton, progress bar)
- "network_error": A network/connection error message is displayed
- "unknown_interruption": Something unexpected is blocking the task

Return JSON:
{{"state": "<classification>", "description": "<what you see>", "blocking_element": "<what is blocking, if any>"}}"""},
            ],
        }],
    )
    return json.loads(response.content[0].text)
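
Calling `json.loads` directly on the reply assumes the model returns bare JSON. In practice a reply may be wrapped in a code fence or preceded by a sentence of prose. The following is a minimal sketch of a more forgiving parser; `parse_classification` and `VALID_STATES` are hypothetical names introduced here, and falling back to `unknown_interruption` is an assumption about what the safest default is, not something the classifier above does.

```python
import json
import re

# Mirrors the values of the ScreenState enum defined above.
VALID_STATES = {
    "expected", "error_dialog", "cookie_banner", "login_redirect",
    "permission_prompt", "loading", "network_error", "unknown_interruption",
}


def parse_classification(raw_text: str) -> dict:
    """Extract a JSON object from a model reply and validate its state field."""
    fallback = {
        "state": "unknown_interruption",
        "description": raw_text.strip(),
        "blocking_element": "",
    }
    # Take everything from the first "{" to the last "}" so that code fences
    # or surrounding prose are ignored.
    match = re.search(r"\{.*\}", raw_text, re.DOTALL)
    if match is None:
        return fallback
    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError:
        return fallback
    # Treat any label outside the known set as an unknown interruption.
    if data.get("state") not in VALID_STATES:
        data["state"] = "unknown_interruption"
    data.setdefault("description", "")
    data.setdefault("blocking_element", "")
    return data
```

With this helper, the last line of `classify_screen` could become `return parse_classification(response.content[0].text)`, so a malformed reply is routed to recovery instead of raising.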

Recovery Strategy Framework

Each error type needs a specific recovery strategy. Here is a structured approach:

class RecoveryManager:
    def __init__(self, browser, claude_client):
        self.browser = browser
        self.client = claude_client
        self.max_retries = 3

    async def recover(self, screen_state: dict) -> bool:
        """Attempt to recover from a detected error state."""
        state = screen_state["state"]

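        # Map each state to a handler method name. The handlers for login
        # redirects, permission prompts, and unknown interruptions are not
        # shown in this article; until they are defined on the class, those
        # states are reported as unrecovered instead of raising AttributeError.
        recovery_handlers = {
            ScreenState.COOKIE_BANNER: "_dismiss_cookie_banner",
            ScreenState.ERROR_DIALOG: "_handle_error_dialog",
            ScreenState.LOGIN_REDIRECT: "_handle_login_redirect",
            ScreenState.PERMISSION_PROMPT: "_handle_permission_prompt",
            ScreenState.LOADING: "_wait_for_load",
            ScreenState.NETWORK_ERROR: "_handle_network_error",
            ScreenState.UNKNOWN_INTERRUPTION: "_handle_unknown",
        }

        handler = getattr(self, recovery_handlers.get(state, ""), None)
        if handler:
            return await handler(screen_state)
        return False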

    async def _dismiss_cookie_banner(self, state: dict) -> bool:
        """Find and click the accept/dismiss button on cookie banners."""
        screenshot_b64 = await self.browser.screenshot()
        response = self.client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=512,
            messages=[{
                "role": "user",
                "content": [
                    {"type": "image", "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": screenshot_b64,
                    }},
                    {"type": "text", "text": """There is a cookie consent banner on screen.
Find the "Accept" or "Accept All" or dismiss button.
Return its coordinates as JSON: {"x": number, "y": number}
If there is a "Reject" or "Necessary Only" option and no accept, use that."""},
                ],
            }],
        )
        coords = json.loads(response.content[0].text)
        await self.browser.click(coords["x"], coords["y"])
        import asyncio
        await asyncio.sleep(1)
        return True

    async def _handle_error_dialog(self, state: dict) -> bool:
        """Dismiss error dialogs and determine next action."""
        screenshot_b64 = await self.browser.screenshot()
        response = self.client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=1024,
            messages=[{
                "role": "user",
                "content": [
                    {"type": "image", "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": screenshot_b64,
                    }},
                    {"type": "text", "text": """An error dialog is showing on screen.
Read the error message and determine:
1. What the error says
2. Whether clicking OK/Close/Dismiss will allow continuing
3. The coordinates of the dismiss button

Return JSON: {"error_message": str, "recoverable": bool, "dismiss_coords": {"x": int, "y": int}}"""},
                ],
            }],
        )
        result = json.loads(response.content[0].text)

        if result["recoverable"] and result.get("dismiss_coords"):
            await self.browser.click(
                result["dismiss_coords"]["x"],
                result["dismiss_coords"]["y"]
            )
            return True
        return False

    async def _wait_for_load(self, state: dict) -> bool:
        """Wait for page to finish loading."""
        import asyncio
        for _ in range(10):
            await asyncio.sleep(2)
            screenshot_b64 = await self.browser.screenshot()
            check = classify_screen(screenshot_b64, "page fully loaded")
            if check["state"] != ScreenState.LOADING:
                return True
        return False

    async def _handle_network_error(self, state: dict) -> bool:
        """Refresh the page on network errors."""
        await self.browser.press_key("F5")
        import asyncio
        await asyncio.sleep(3)
        return True
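
The handler table in `recover` refers to `_handle_login_redirect`, `_handle_permission_prompt`, and `_handle_unknown`, which are not shown here. One possible shape for them is sketched below as a mixin. Everything in it is an assumption: the class name `ExtraRecoveryHandlers` is hypothetical, pressing Escape is only a guess at a low-risk way to dismiss a prompt or overlay, and treating a login redirect as unrecoverable reflects a choice not to let the agent handle credentials on its own.

```python
import asyncio


class ExtraRecoveryHandlers:
    """Hypothetical mixin; expects self.browser with an async press_key(key)."""

    settle_seconds = 1.0  # Assumed pause to let the UI update after a key press

    async def _handle_login_redirect(self, state: dict) -> bool:
        # Re-authenticating needs credentials the agent should not guess,
        # so report the state as unrecovered and let the caller escalate.
        return False

    async def _handle_permission_prompt(self, state: dict) -> bool:
        # Assumption: Escape dismisses the prompt without granting anything.
        await self.browser.press_key("Escape")
        await asyncio.sleep(self.settle_seconds)
        return True

    async def _handle_unknown(self, state: dict) -> bool:
        # Assumption: one Escape press is a cheap first attempt at clearing
        # an unidentified overlay; the next classification shows if it worked.
        await self.browser.press_key("Escape")
        await asyncio.sleep(self.settle_seconds)
        return True
```

Combining it with the class above would look like `class FullRecoveryManager(ExtraRecoveryHandlers, RecoveryManager): pass`. Because the agent re-classifies the screen after every recovery, a handler that returns `True` without actually clearing the obstruction is caught on the next step and counted against the recovery limit.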

Integrating Recovery into the Agent Loop

The agent loop classifies the screen at the start of every step and hands any unexpected state to the recovery manager before it can derail the workflow:

class ResilientBrowserAgent:
    def __init__(self, browser, max_steps=30):
        self.browser = browser
        self.client = anthropic.Anthropic()
        self.recovery = RecoveryManager(browser, self.client)
        self.max_steps = max_steps

    async def run(self, task: str, expected_states: list[str] | None = None):
        messages = [{"role": "user", "content": task}]
        recovery_attempts = 0
        max_recovery = 5
        # Advances only when a real action runs, so a recovery retry is
        # checked against the same expected state as the step it interrupted.
        task_step = 0

        for _ in range(self.max_steps):
            screenshot_b64 = await self.browser.screenshot()

            # Check for error states before proceeding
            expected = (expected_states[min(task_step, len(expected_states) - 1)]
                        if expected_states
                        else "normal application state")

            screen = classify_screen(screenshot_b64, expected)

            if screen["state"] != ScreenState.EXPECTED:
                if recovery_attempts >= max_recovery:
                    return {"status": "failed", "reason": "max recovery attempts"}

                recovered = await self.recovery.recover(screen)
                recovery_attempts += 1

                if recovered:
                    continue  # Retry the step with a fresh screenshot
                else:
                    return {
                        "status": "failed",
                        "reason": f"Unrecoverable: {screen['description']}"
                    }

            # Normal agent loop continues here
            # ... send screenshot to Claude, execute actions ...
            task_step += 1

        # The step budget ran out before the task finished
        return {"status": "failed", "reason": "max steps reached"}
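
One thing to watch in this loop: `classify_screen` uses the synchronous Anthropic client, so calling it directly inside `async def run` blocks the event loop for the duration of the API call. Below is a minimal sketch of one workaround, assuming Python 3.9+ for `asyncio.to_thread`. The names `run_blocking` and `slow_classifier` are hypothetical and exist only for illustration; switching to the SDK's `AsyncAnthropic` client is another option.

```python
import asyncio
import time


async def run_blocking(fn, *args):
    """Run a synchronous function in a worker thread so the event loop stays free."""
    return await asyncio.to_thread(fn, *args)


def slow_classifier(screenshot_b64: str, expected: str) -> dict:
    # Stand-in for classify_screen; sleeps instead of calling the API.
    time.sleep(0.05)
    return {"state": "expected", "description": expected, "blocking_element": ""}


async def demo() -> dict:
    # Inside the agent loop the call would look like:
    #   screen = await run_blocking(classify_screen, screenshot_b64, expected)
    return await run_blocking(slow_classifier, "<base64>", "dashboard visible")
```

This matters most when the same process drives several browser sessions at once, since a blocked event loop stalls all of them.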

Retry with Context

When an action fails, retrying with context about what went wrong leads to better outcomes than a blind retry. Include the error state in the next prompt:

retry_message = f"""The previous action resulted in an error: {screen['description']}
I have dismissed the error dialog. Please try a different approach
to accomplish the same goal. The current screen state is shown above."""

This gives Claude the context to choose an alternative strategy rather than repeating the same action that caused the failure.
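
As a rough sketch, the retry prompt can be packaged as a full user message with the fresh screenshot placed before the text, so that "shown above" actually refers to something. The helper name `build_retry_message` and the `recovery_action` parameter are assumptions introduced here; the message layout follows the image-plus-text format already used in `classify_screen`.

```python
def build_retry_message(error_description: str,
                        recovery_action: str,
                        screenshot_b64: str) -> dict:
    """Build a user message that explains the failure and shows the new screen."""
    text = (
        f"The previous action resulted in an error: {error_description}\n"
        f"Recovery taken: {recovery_action}\n"
        "Please try a different approach to accomplish the same goal. "
        "The current screen state is shown above."
    )
    return {
        "role": "user",
        "content": [
            # Screenshot first, so the text can refer back to it.
            {"type": "image", "source": {
                "type": "base64",
                "media_type": "image/png",
                "data": screenshot_b64,
            }},
            {"type": "text", "text": text},
        ],
    }
```

Stating the recovery action explicitly, rather than always saying the dialog was dismissed, keeps the message accurate when the recovery was a page refresh or a cookie banner click.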

FAQ

How many recovery attempts should I allow before giving up?

Set a global recovery limit (5-10 total recoveries per task) and a per-error limit (2-3 retries for the same error). If the same error occurs three times in a row, the issue is likely systemic, such as expired credentials or a service outage, and should be escalated rather than retried.
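
The two limits can be tracked together in a small counter object. This is a minimal sketch under stated assumptions: `RecoveryBudget` is a hypothetical class, and the defaults (5 total, escalate on the third consecutive occurrence) are simply the numbers suggested above.

```python
class RecoveryBudget:
    """Tracks a global recovery limit and a same-error-in-a-row limit."""

    def __init__(self, max_total: int = 5, same_error_limit: int = 3):
        self.max_total = max_total
        self.same_error_limit = same_error_limit
        self.total = 0
        self.last_state = None
        self.consecutive = 0

    def allow(self, state: str) -> bool:
        """Record one occurrence of `state`; False means escalate, do not retry."""
        if state == self.last_state:
            self.consecutive += 1
        else:
            self.last_state = state
            self.consecutive = 1
        self.total += 1
        if self.consecutive >= self.same_error_limit:
            return False  # Same error repeated: likely systemic
        return self.total <= self.max_total

    def reset_streak(self) -> None:
        """Call after a normal step succeeds so old errors do not count as a streak."""
        self.last_state = None
        self.consecutive = 0
```

In the agent loop, `budget.allow(screen["state"])` would replace the `recovery_attempts >= max_recovery` check, and `budget.reset_streak()` would run after each successful action.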

Should I log the error screenshots for debugging?

Absolutely. Save every screenshot that triggers a recovery attempt along with the classification result and the recovery action taken. This creates an audit trail that is invaluable for improving your recovery strategies over time. Store them with timestamps and step numbers so you can reconstruct the full session.
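
A sketch of such an audit trail, using only the standard library, might look like the following. The function name `log_recovery_event`, the file naming scheme, and the metadata fields are all assumptions; adapt them to whatever storage you already use.

```python
import base64
import json
import time
from pathlib import Path


def log_recovery_event(log_dir, step: int, screenshot_b64: str,
                       classification: dict, action: str) -> Path:
    """Save the screenshot and a JSON record for one recovery attempt."""
    log_dir = Path(log_dir)
    log_dir.mkdir(parents=True, exist_ok=True)

    # The step number plus a UTC timestamp makes files sort in session order.
    stamp = time.strftime("%Y%m%dT%H%M%S", time.gmtime())
    stem = f"step{step:03d}_{stamp}"

    (log_dir / f"{stem}.png").write_bytes(base64.b64decode(screenshot_b64))

    meta_path = log_dir / f"{stem}.json"
    meta_path.write_text(json.dumps({
        "step": step,
        "timestamp_utc": stamp,
        "classification": classification,
        "recovery_action": action,
    }, indent=2))
    return meta_path
```

Note that two events for the same step within the same second would overwrite each other under this naming scheme; adding a counter or a finer timestamp avoids that if it matters for your workload.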

How do I handle two-factor authentication prompts?

Two-factor authentication requires human-in-the-loop handling. When the agent detects a 2FA prompt, pause execution, notify the user, and wait for them to complete the authentication step. Resume the automation after the user confirms the 2FA is complete.
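
The pause-notify-resume pattern can be sketched with an `asyncio.Event`. The class `HumanGate` and its `notify` callback are hypothetical; in a real system the callback might post to a chat channel or send an email, and `confirm()` would be wired to whatever the user clicks when they are done. Detecting the 2FA prompt itself would need an extra classifier label, which the `ScreenState` enum above does not include.

```python
import asyncio


class HumanGate:
    """Pauses the agent until a human signals that a manual step is done."""

    def __init__(self, notify):
        self._notify = notify          # Hypothetical callback taking a message string
        self._done = asyncio.Event()

    async def wait_for_human(self, reason: str, timeout: float = 300.0) -> bool:
        """Notify the user, then wait; returns False if nobody confirms in time."""
        self._done.clear()
        self._notify(reason)
        try:
            await asyncio.wait_for(self._done.wait(), timeout)
            return True
        except asyncio.TimeoutError:
            return False

    def confirm(self) -> None:
        """Called on the user's behalf once the 2FA step is complete."""
        self._done.set()
```

Create the gate inside the running event loop (for example at the start of `run`). A `False` result is a natural point to return a failed status rather than keep the browser session waiting indefinitely.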


CallSphere Team
