Claude Computer Use for Form Automation: Auto-Filling Complex Multi-Step Forms

Build a Claude-powered form automation agent that detects fields, maps data intelligently, handles validation errors, and navigates multi-step form wizards, all through visual understanding instead of DOM selectors.

The Form Automation Challenge

Automating form filling sounds simple until you encounter real-world forms. Government applications with 50+ fields across 10 pages. Insurance claim forms with conditional sections that appear based on previous answers. Healthcare intake forms with dropdown menus that load dynamically. CRM data entry screens with custom field types.

Traditional automation with Playwright or Selenium handles forms by targeting specific selectors, for example page.fill("#firstName", "John"). This works until the form changes its field IDs, switches from a text input to a dropdown, or adds a new required field. Claude Computer Use takes a fundamentally different approach: it looks at a screenshot of the form, reads the labels, and fills in the appropriate values.

Form Field Detection and Mapping

The first step is to have Claude analyze the form and create a mapping between your data and the visible fields:

import anthropic
import json

client = anthropic.Anthropic()

def analyze_form(screenshot_b64: str) -> list[dict]:
    """Detect all form fields visible in the screenshot."""
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=4096,
        messages=[{
            "role": "user",
            "content": [
                {"type": "image", "source": {
                    "type": "base64",
                    "media_type": "image/png",
                    "data": screenshot_b64,
                }},
                {"type": "text", "text": """Analyze this form and list every input field visible.

For each field, return:
- label: the field's label text
- field_type: text, dropdown, checkbox, radio, date, textarea, file_upload
- required: true if marked as required (asterisk or "required" label)
- current_value: any pre-filled value, or null
- options: for dropdowns/radios, list the visible options if any
- approximate_position: {x, y} coordinates of the center of the input

Return as a JSON array."""},
            ],
        }],
    )
    return json.loads(response.content[0].text)
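
The json.loads call above assumes the reply is bare JSON. A model may instead wrap the array in a Markdown code fence or add a sentence before it, which would raise a JSONDecodeError. The following is a minimal sketch of a more forgiving parser; extract_json is a hypothetical helper name, not part of the Anthropic SDK.

```python
import json


def extract_json(reply_text: str):
    """Parse JSON from a model reply that may contain fences or extra prose.

    Hypothetical helper: tries the whole string first, then falls back to the
    first position from which a complete JSON array or object can be decoded.
    """
    text = reply_text.strip()
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass

    decoder = json.JSONDecoder()
    for start, char in enumerate(text):
        if char in "[{":
            try:
                # raw_decode parses one JSON value starting at `start`
                # and ignores whatever follows it
                value, _end = decoder.raw_decode(text, start)
                return value
            except json.JSONDecodeError:
                continue
    raise ValueError("no JSON array or object found in reply")
```

With this in place, the last line of analyze_form could read return extract_json(response.content[0].text), and the same swap would apply to the other json.loads calls in this article.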

The Form-Filling Agent

With field detection in place, we build an agent that maps your data to detected fields and fills them in sequence:

import asyncio


class FormFillingAgent:
    def __init__(self, browser_manager):
        self.browser = browser_manager
        self.client = anthropic.Anthropic()

    async def fill_form(self, form_data: dict, context: str = ""):
        """Fill a form using Claude vision to identify and interact with fields."""
        screenshot_b64 = await self.browser.screenshot()

        # Step 1: Create a filling plan
        plan = self._create_plan(screenshot_b64, form_data, context)

        # Step 2: Execute each field fill
        for field in plan:
            await self._fill_field(field)
            # Brief pause for UI updates
            await asyncio.sleep(0.5)

        # Step 3: Verify filled values
        # (_verify_form is not defined in these snippets; it is assumed to take
        # a fresh screenshot and compare the visible values against form_data)
        verification = await self._verify_form(form_data)
        return verification

    def _create_plan(self, screenshot_b64: str, form_data: dict, context: str) -> list:
        response = self.client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=4096,
            messages=[{
                "role": "user",
                "content": [
                    {"type": "image", "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": screenshot_b64,
                    }},
                    {"type": "text", "text": f"""I need to fill this form with the following data:

{json.dumps(form_data, indent=2)}

Context: {context}

Create a step-by-step plan to fill each field. For each step:
- field_label: which field to fill
- data_key: which key from my data maps to this field
- action: click, type, select_dropdown, check_checkbox, select_radio
- coordinate: approximate {{x, y}} of the input element
- value: the value to enter

Order the steps top-to-bottom, left-to-right as fields appear on screen.
Return as a JSON array."""},
                ],
            }],
        )
        return json.loads(response.content[0].text)

    async def _fill_field(self, field: dict):
        """Fill a single field based on the plan."""
        import asyncio

        x = field["coordinate"]["x"]
        y = field["coordinate"]["y"]
        action = field["action"]
        # Click-only steps may come back without a value
        value = str(field.get("value", ""))

        if action == "type":
            await self.browser.click(x, y)
            await asyncio.sleep(0.3)
            # Select any existing content so the typed text replaces it
            # (Control+a assumes Windows/Linux; macOS browsers use Meta+a)
            await self.browser.press_key("Control+a")
            await self.browser.type_text(value)

        elif action == "select_dropdown":
            await self.browser.click(x, y)
            await asyncio.sleep(0.5)
            # Use Claude to find and click the right option
            await self._select_option_visually(value)

        elif action in ("click", "check_checkbox", "select_radio"):
            # A single click covers buttons, checkboxes, and radio buttons
            await self.browser.click(x, y)

        else:
            # Surface unexpected actions instead of silently skipping the field
            raise ValueError(f"Unsupported action in plan: {action!r}")
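
Because the plan is generated by a model, it can help to sanity-check it before any click is executed. The sketch below is one possible check, not part of the agent above: validate_plan is a hypothetical helper, and the 1280x800 viewport default is an assumption that should match whatever size the browser actually uses.

```python
ALLOWED_ACTIONS = {"click", "type", "select_dropdown", "check_checkbox", "select_radio"}


def validate_plan(plan: list[dict], form_data: dict,
                  viewport: tuple[int, int] = (1280, 800)) -> list[str]:
    """Return human-readable problems; an empty list means the plan looks sane."""
    problems = []
    width, height = viewport
    for index, step in enumerate(plan):
        label = step.get("field_label", f"step {index}")

        # The action must be one the agent knows how to execute
        if step.get("action") not in ALLOWED_ACTIONS:
            problems.append(f"{label}: unknown action {step.get('action')!r}")

        # The coordinate must be numeric and inside the assumed viewport
        coordinate = step.get("coordinate")
        if not isinstance(coordinate, dict):
            coordinate = {}
        x, y = coordinate.get("x"), coordinate.get("y")
        if not (isinstance(x, (int, float)) and isinstance(y, (int, float))):
            problems.append(f"{label}: missing or non-numeric coordinate")
        elif not (0 <= x < width and 0 <= y < height):
            problems.append(f"{label}: coordinate ({x}, {y}) is outside the viewport")

        # A data_key, when present, must refer to a real key in the input data
        data_key = step.get("data_key")
        if data_key is not None and data_key not in form_data:
            problems.append(f"{label}: data_key {data_key!r} is not in form_data")
    return problems
```

If the returned list is non-empty, the agent could re-request the plan with the problems appended to the prompt rather than clicking at questionable coordinates.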

Handling Dropdown Menus

Dropdowns are notoriously difficult for visual automation because clicking them reveals a new set of options that must be located and clicked. One approach is to take a second screenshot once the menu is open and ask Claude for the coordinates of the matching option:

    async def _select_option_visually(self, target_value: str, max_scrolls: int = 5):
        """After opening a dropdown, find and click the target option."""
        import asyncio
        await asyncio.sleep(0.5)  # Wait for dropdown to open
        screenshot_b64 = await self.browser.screenshot()

        response = self.client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=1024,
            messages=[{
                "role": "user",
                "content": [
                    {"type": "image", "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": screenshot_b64,
                    }},
                    {"type": "text", "text": f"""A dropdown menu is open on screen.
Find the option that best matches: "{target_value}"
Return the exact coordinate to click as JSON: {{"x": number, "y": number}}
If the option is not visible, return {{"scroll": "down"}} to indicate
I need to scroll within the dropdown."""},
                ],
            }],
        )

        result = json.loads(response.content[0].text)
        if "scroll" in result:
            # Cap the retries so a missing option cannot recurse forever
            if max_scrolls <= 0:
                raise RuntimeError(f"Option {target_value!r} not found in dropdown")
            # A scroll reply carries no coordinate, so scroll at a default
            # point (640, 400 assumes a viewport of roughly 1280x800)
            await self.browser.scroll(640, 400, "down")
            await self._select_option_visually(target_value, max_scrolls - 1)
        else:
            await self.browser.click(result["x"], result["y"])
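
The prompt above asks Claude for the option that "best matches" the target value. When analyze_form has already returned the dropdown's options list, the match can be resolved locally first so that the prompt carries the exact visible text. This is a rough sketch using difflib from the standard library; best_option is a hypothetical helper, and the 0.6 cutoff is simply difflib's default rather than a tuned value.

```python
import difflib
from typing import Optional


def best_option(target: str, options: list[str], cutoff: float = 0.6) -> Optional[str]:
    """Return the option text closest to target, or None if nothing is close enough."""
    # Compare case-insensitively but return the option exactly as displayed
    normalized = {option.strip().lower(): option for option in options}
    key = target.strip().lower()
    if key in normalized:
        return normalized[key]
    matches = difflib.get_close_matches(key, list(normalized), n=1, cutoff=cutoff)
    return normalized[matches[0]] if matches else None
```

A None result is a useful signal in its own right: the data value probably does not correspond to any option, and scrolling the dropdown is unlikely to help.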

Multi-Step Form Navigation

Many forms span multiple pages. The agent needs to handle "Next" buttons, progress indicators, and conditional sections:

    async def fill_multi_step_form(self, all_data: dict, max_pages: int = 10):
        """Fill a multi-page form wizard."""
        for page_num in range(max_pages):
            screenshot_b64 = await self.browser.screenshot()

            # Analyze current page
            page_info = self._analyze_page(screenshot_b64)

            if page_info.get("is_confirmation_page"):
                return {"status": "complete", "page": page_num + 1}

            # Fill visible fields on this page
            await self.fill_form(all_data, context=f"Page {page_num + 1} of the form")

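            # Check for validation errors before proceeding. Take a fresh
            # screenshot here: the one captured at the top of the loop
            # predates the fills and cannot show errors they triggered.
            validation = await self._check_validation()
            if validation.get("has_errors"):
                await self._fix_validation_errors(validation["errors"])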

            # Click Next/Continue button
            await self._click_next_button()
            import asyncio
            await asyncio.sleep(1)

        return {"status": "max_pages_reached"}
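
The loop above assumes that clicking "Next" always advances the wizard. If the click is blocked, the next iteration would refill the same page until max_pages runs out. One way to notice this, sketched below under the assumption that each page is first passed through analyze_form, is to compare the set of field labels between iterations. StuckDetector and page_signature are hypothetical names, and two genuinely different pages with identical labels would trigger a false positive.

```python
def page_signature(fields: list[dict]) -> frozenset:
    """Summarize a page by the set of field labels detected on it."""
    return frozenset(str(field.get("label", "")).strip().lower() for field in fields)


class StuckDetector:
    """Counts consecutive visits to a page that shows the same field labels."""

    def __init__(self, max_repeats: int = 2):
        self.max_repeats = max_repeats
        self._last = None
        self._repeats = 0

    def is_stuck(self, fields: list[dict]) -> bool:
        signature = page_signature(fields)
        if signature == self._last:
            self._repeats += 1
        else:
            # A new set of labels means the wizard moved to another page
            self._last = signature
            self._repeats = 0
        return self._repeats >= self.max_repeats
```

Inside fill_multi_step_form, a True result could end the loop early with a status such as "stuck" instead of spending further API calls on a page that will not advance.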

Validation Error Handling

After filling fields and before clicking "Next," the agent should check for validation errors:

    async def _check_validation(self, screenshot_b64: str = None) -> dict:
        if not screenshot_b64:
            screenshot_b64 = await self.browser.screenshot()

        response = self.client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=1024,
            messages=[{
                "role": "user",
                "content": [
                    {"type": "image", "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": screenshot_b64,
                    }},
                    {"type": "text", "text": """Check this form for validation errors.
Look for: red borders, error messages, warning icons, tooltips.
Return JSON: {"has_errors": bool, "errors": [{"field": str, "message": str}]}"""},
                ],
            }],
        )
        return json.loads(response.content[0].text)
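
The multi-step loop calls _fix_validation_errors, which is not shown here. One plausible building block for it is a function that maps the reported errors back to the plan steps that need to be redone. The sketch below assumes the errors follow the JSON shape requested in the prompt above and that the plan steps carry a field_label key; steps_to_retry is a hypothetical name.

```python
def steps_to_retry(errors: list[dict], plan: list[dict]) -> list[dict]:
    """Pick the plan steps whose field_label matches a field named in the errors."""

    def normalize(label) -> str:
        # Ignore case, surrounding whitespace, and a trailing asterisk or colon
        return str(label).strip().rstrip("*:").strip().lower()

    failed = {normalize(error.get("field", "")) for error in errors}
    return [step for step in plan if normalize(step.get("field_label", "")) in failed]
```

The returned steps could be passed through _fill_field again, ideally after the error message has been used to correct the value, for example by reformatting a phone number.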

FAQ

How does Claude handle date picker widgets?

Claude can interact with date pickers visually by clicking the calendar icon, navigating months, and selecting dates. For complex date pickers, it often works better to click the text input first, clear it, and type the date in the format the field expects (MM/DD/YYYY, for example) rather than navigating the calendar widget.
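
Typing the date only works if the string matches the format the field expects. As a small illustration, the sketch below converts a Python date using the placeholder text a field displays. The lookup table and the format_for_field name are assumptions for this example; a real form may use placeholders that are not listed.

```python
from datetime import date

# Hypothetical lookup from a field's placeholder text to a strftime pattern
PLACEHOLDER_TO_STRFTIME = {
    "MM/DD/YYYY": "%m/%d/%Y",
    "DD/MM/YYYY": "%d/%m/%Y",
    "YYYY-MM-DD": "%Y-%m-%d",
}


def format_for_field(value: date, placeholder: str) -> str:
    """Format a date to match the placeholder shown in the input."""
    pattern = PLACEHOLDER_TO_STRFTIME.get(placeholder.strip().upper())
    if pattern is None:
        raise ValueError(f"unsupported date placeholder: {placeholder!r}")
    return value.strftime(pattern)
```

The placeholder itself can come from the field analysis step, since Claude can read it from the screenshot along with the label.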

Can Claude handle file upload fields?

Claude can identify file upload fields and click the "Choose File" button, but it cannot interact with the operating system's file dialog. For file uploads, use a hybrid approach: let Claude identify the upload field, then use Playwright's set_input_files() method to attach the file programmatically.
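
A minimal sketch of the programmatic half of that hybrid approach follows. It assumes page is a Playwright async Page object and that a CSS selector for the file input is known; attach_file is a hypothetical wrapper, and the only Playwright call it relies on is set_input_files.

```python
from pathlib import Path


async def attach_file(page, selector: str, file_path: str) -> None:
    """Attach a local file to a file input without opening the OS dialog.

    `page` is assumed to be a Playwright async Page; `selector` is a
    hypothetical CSS selector such as 'input[type="file"]'.
    """
    path = Path(file_path)
    # Fail early with a clear message instead of a browser-side error
    if not path.is_file():
        raise FileNotFoundError(f"upload source not found: {path}")
    await page.set_input_files(selector, str(path))
```

Because this step needs a selector, it is the one place where the agent falls back from visual interaction to the DOM.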

What about CAPTCHA or anti-automation fields on forms?

Claude can visually interpret some CAPTCHA types, but bypassing them is restricted by most websites' terms of service and Anthropic's usage policies. For legitimate automation of your own forms, disable CAPTCHA in development/staging environments or use authenticated sessions that skip the challenge.


CallSphere Team