Claude Computer Use for Form Automation: Auto-Filling Complex Multi-Step Forms
Build a Claude-powered form automation agent that detects fields, maps data intelligently, handles validation errors, and navigates multi-step form wizards — all through visual understanding instead of DOM selectors.
The Form Automation Challenge
Automating form filling sounds simple until you encounter real-world forms. Government applications with 50+ fields across 10 pages. Insurance claim forms with conditional sections that appear based on previous answers. Healthcare intake forms with dropdown menus that load dynamically. CRM data entry screens with custom field types.
Traditional automation with Playwright or Selenium handles forms by targeting specific selectors — page.fill("#firstName", "John"). This works until the form changes its field IDs, switches from a text input to a dropdown, or adds a new required field. Claude Computer Use takes a fundamentally different approach: it looks at the form, reads the labels, and fills in the appropriate values.
Form Field Detection and Mapping
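The first step is to have Claude analyze the form and build a mapping between your data and the visible fields. The examples here send screenshots to the standard Messages API as image input and ask for JSON back, including approximate click coordinates: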
```python
import json

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment


def analyze_form(screenshot_b64: str) -> list[dict]:
    """Detect all form fields visible in the screenshot."""
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=4096,
        messages=[{
            "role": "user",
            "content": [
                {"type": "image", "source": {
                    "type": "base64",
                    "media_type": "image/png",
                    "data": screenshot_b64,
                }},
                {"type": "text", "text": """Analyze this form and list every input field visible.
For each field, return:
- label: the field's label text
- field_type: text, dropdown, checkbox, radio, date, textarea, file_upload
- required: true if marked as required (asterisk or "required" label)
- current_value: any pre-filled value, or null
- options: for dropdowns/radios, list the visible options if any
- approximate_position: {x, y} coordinates of the center of the input
Return only a JSON array, with no surrounding prose or markdown."""},
            ],
        }],
    )
    # The reply is plain text; parse it as JSON
    return json.loads(response.content[0].text)
```
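Even when a prompt asks for JSON only, the reply can come back wrapped in a markdown code fence or with a sentence around it, and a bare `json.loads` then raises. A small defensive parser helps. The sketch below is one assumed approach (`extract_json` is a hypothetical helper, not part of the Anthropic SDK): it strips a code fence if present, then falls back to decoding from the first `[` or `{`.

```python
import json
import re

_FENCE = "`" * 3  # a markdown code fence
# Matches an optional json-tagged fence around the payload
_FENCE_RE = re.compile(_FENCE + r"(?:json)?\s*(.*?)\s*" + _FENCE, re.DOTALL)


def extract_json(text: str):
    """Parse the first JSON array or object in a model reply (hypothetical helper)."""
    fenced = _FENCE_RE.search(text)
    if fenced:
        text = fenced.group(1)
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass
    # Fall back to decoding from each opening bracket; raw_decode ignores trailing text
    decoder = json.JSONDecoder()
    for i, ch in enumerate(text):
        if ch in "[{":
            try:
                value, _ = decoder.raw_decode(text, i)
                return value
            except json.JSONDecodeError:
                continue
    raise ValueError("No JSON found in model response")
```

You could then replace `json.loads(response.content[0].text)` with `extract_json(response.content[0].text)` in the functions in this article.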
The Form-Filling Agent
With field detection in place, we build an agent that maps your data to detected fields and fills them in sequence:
```python
import asyncio
import json

import anthropic


class FormFillingAgent:
    def __init__(self, browser_manager):
        # browser_manager wraps your browser driver and exposes async
        # screenshot(), click(x, y), press_key(), type_text(), and scroll()
        self.browser = browser_manager
        self.client = anthropic.Anthropic()

    async def fill_form(self, form_data: dict, context: str = ""):
        """Fill a form using Claude vision to identify and interact with fields."""
        screenshot_b64 = await self.browser.screenshot()

        # Step 1: Ask Claude for a filling plan
        plan = self._create_plan(screenshot_b64, form_data, context)

        # Step 2: Execute each step of the plan
        for field in plan:
            await self._fill_field(field)
            await asyncio.sleep(0.5)  # brief pause for UI updates

        # Step 3: Verify filled values (_verify_form is not shown in this article)
        return await self._verify_form(form_data)

    def _create_plan(self, screenshot_b64: str, form_data: dict, context: str) -> list:
        response = self.client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=4096,
            messages=[{
                "role": "user",
                "content": [
                    {"type": "image", "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": screenshot_b64,
                    }},
                    {"type": "text", "text": f"""I need to fill this form with the following data:
{json.dumps(form_data, indent=2)}
Context: {context}
Create a step-by-step plan to fill each field. For each step:
- field_label: which field to fill
- data_key: which key from my data maps to this field
- action: click, type, select_dropdown, check_checkbox, select_radio
- coordinate: approximate {{x, y}} of the input element
- value: the value to enter
Order the steps top-to-bottom, left-to-right as fields appear on screen.
Skip fields that have no matching key in my data.
Return only a JSON array, with no surrounding prose or markdown."""},
                ],
            }],
        )
        return json.loads(response.content[0].text)

    async def _fill_field(self, field: dict):
        """Fill a single field based on the plan."""
        x = field["coordinate"]["x"]
        y = field["coordinate"]["y"]
        action = field["action"]
        value = "" if field.get("value") is None else str(field["value"])

        if action == "type":
            await self.browser.click(x, y)
            await asyncio.sleep(0.3)
            # Select existing content so typing replaces it ("Meta+a" on macOS)
            await self.browser.press_key("Control+a")
            await self.browser.type_text(value)
        elif action == "select_dropdown":
            await self.browser.click(x, y)
            await asyncio.sleep(0.5)
            # Use Claude to find and click the right option
            await self._select_option_visually(value)
        elif action in ("click", "check_checkbox", "select_radio"):
            # Note: clicking an already-checked checkbox unchecks it
            await self.browser.click(x, y)
        else:
            raise ValueError(f"Unknown action in plan: {action!r}")
```
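The coordinates in the plan are in the pixel space of the image Claude saw, which is not necessarily the browser viewport's. If you downscale screenshots before sending them (to save tokens, or because the API may scale down very large images anyway), map each point back before clicking. A minimal sketch under that assumption; `to_screen_coords` and `fit_within` are hypothetical helpers, and you must track both the sent image size and the viewport size yourself:

```python
def to_screen_coords(
    x: float,
    y: float,
    image_size: tuple[int, int],
    screen_size: tuple[int, int],
) -> tuple[int, int]:
    """Map a point from screenshot-image pixels to viewport pixels.

    image_size: (width, height) of the image actually sent to Claude.
    screen_size: (width, height) of the viewport you click in.
    """
    img_w, img_h = image_size
    scr_w, scr_h = screen_size
    sx = round(x * scr_w / img_w)
    sy = round(y * scr_h / img_h)
    # Clamp so a slightly-off estimate never lands outside the viewport
    return (min(max(sx, 0), scr_w - 1), min(max(sy, 0), scr_h - 1))


def fit_within(size: tuple[int, int], max_w: int, max_h: int) -> tuple[int, int]:
    """Largest size with the same aspect ratio that fits in max_w x max_h."""
    w, h = size
    scale = min(max_w / w, max_h / h, 1.0)  # never upscale
    return (round(w * scale), round(h * scale))
```

In `_fill_field`, for instance, you would pass `field["coordinate"]["x"]` and `["y"]` through `to_screen_coords` before calling `self.browser.click`.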
Handling Dropdown Menus
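Dropdowns are notoriously difficult for visual automation because clicking one reveals a new set of options that must be located and clicked in a second step. One approach is to take a fresh screenshot once the menu is open, ask Claude where the matching option is, and scroll (with a bounded number of retries) if it is not visible: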
```python
    async def _select_option_visually(self, target_value: str, attempts_left: int = 5):
        """After opening a dropdown, find and click the target option."""
        await asyncio.sleep(0.5)  # Wait for the dropdown to open (or finish scrolling)
        screenshot_b64 = await self.browser.screenshot()
        response = self.client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=1024,
            messages=[{
                "role": "user",
                "content": [
                    {"type": "image", "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": screenshot_b64,
                    }},
                    {"type": "text", "text": f"""A dropdown menu is open on screen.
Find the option that best matches: "{target_value}"
Return only the coordinate to click as JSON: {{"x": number, "y": number}}
If the option is not visible, return {{"scroll": "down", "x": number, "y": number}},
where x and y are a point inside the open option list, so I can scroll within the dropdown."""},
                ],
            }],
        )
        result = json.loads(response.content[0].text)
        if "scroll" in result:
            # Bound the retries so a missing option cannot recurse forever
            if attempts_left <= 0:
                raise RuntimeError(f"Option {target_value!r} not found in dropdown")
            # Scroll inside the option list; fall back to a default point if none was given
            await self.browser.scroll(result.get("x", 640), result.get("y", 400), "down")
            await self._select_option_visually(target_value, attempts_left - 1)
        else:
            await self.browser.click(result["x"], result["y"])
```
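An alternative worth considering is to split the work: ask Claude only to transcribe the visible options with their coordinates, then choose the match deterministically in code, so the choice is reproducible and an exact match always wins. A minimal sketch using Python's standard `difflib`; the option format, the `cutoff` value, and the `pick_option` name are assumptions:

```python
import difflib
from typing import Optional


def pick_option(target: str, options: list[dict], cutoff: float = 0.6) -> Optional[dict]:
    """Choose the visible option that best matches `target`.

    options: [{"text": "United States", "x": 400, "y": 310}, ...] as
    transcribed by Claude from the open dropdown (assumed format).
    Returns None if nothing is close enough, e.g. so the caller can scroll.
    """
    def norm(s: str) -> str:
        # Case- and whitespace-insensitive comparison key
        return " ".join(s.casefold().split())

    by_text = {norm(o["text"]): o for o in options}
    key = norm(target)
    if key in by_text:  # exact match wins outright
        return by_text[key]
    close = difflib.get_close_matches(key, list(by_text), n=1, cutoff=cutoff)
    return by_text[close[0]] if close else None
```

A `None` result maps naturally onto the scroll-and-retry branch above; the dropdown prompt would need to ask for the option list instead of a single coordinate.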
Multi-Step Form Navigation
Many forms span multiple pages. The agent needs to handle "Next" buttons, progress indicators, and conditional sections:
```python
    async def fill_multi_step_form(self, all_data: dict, max_pages: int = 10):
        """Fill a multi-page form wizard."""
        for page_num in range(max_pages):
            screenshot_b64 = await self.browser.screenshot()

            # Analyze the current page (_analyze_page is not shown in this article)
            page_info = self._analyze_page(screenshot_b64)
            if page_info.get("is_confirmation_page"):
                return {"status": "complete", "page": page_num + 1}

            # Fill the visible fields on this page
            await self.fill_form(all_data, context=f"Page {page_num + 1} of the form")

            # Check for validation errors before proceeding. Pass no screenshot so a
            # fresh one is taken: the one above predates the fill and cannot show
            # errors the fill triggered.
            validation = await self._check_validation()
            if validation.get("has_errors"):
                await self._fix_validation_errors(validation["errors"])

            # Click the Next/Continue button, then give the next page time to load
            await self._click_next_button()
            await asyncio.sleep(1)

        return {"status": "max_pages_reached"}
```
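One failure mode the loop above does not catch is a Next click that does nothing, for example because a validation error only appears after submission. Comparing a fingerprint of each page's screenshot is a cheap way to notice that the wizard did not advance. A sketch, assuming screenshots arrive as base64 PNG strings as elsewhere in this article; `page_fingerprint` and `NavigationTracker` are hypothetical names. An exact hash is deliberately simple: a blinking caret or animation changes the bytes, so it can miss a stall, but it will not report one falsely.

```python
import base64
import hashlib


def page_fingerprint(screenshot_b64: str) -> str:
    """Hash the decoded screenshot bytes.

    Identical bytes mean nothing visibly changed; different bytes do not prove
    the page advanced (a blinking caret also changes them).
    """
    return hashlib.sha256(base64.b64decode(screenshot_b64)).hexdigest()


class NavigationTracker:
    """Counts consecutive pages that look identical to the previous one."""

    def __init__(self, max_stalls: int = 2):
        self.max_stalls = max_stalls
        self.stalls = 0
        self._last = None

    def record(self, screenshot_b64: str) -> bool:
        """Record the screenshot taken at the start of a page iteration.

        Returns True once the page has failed to change `max_stalls` times in a row.
        """
        fp = page_fingerprint(screenshot_b64)
        self.stalls = self.stalls + 1 if fp == self._last else 0
        self._last = fp
        return self.stalls >= self.max_stalls
```

Inside `fill_multi_step_form`, you could call `tracker.record(screenshot_b64)` right after each new screenshot and stop, or hand off to a human, when it returns `True`.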
Validation Error Handling
After filling fields and before clicking "Next," the agent should check for validation errors:
```python
    async def _check_validation(self, screenshot_b64: str | None = None) -> dict:
        """Ask Claude whether the form currently shows validation errors."""
        # Take a fresh screenshot unless the caller supplied one
        if not screenshot_b64:
            screenshot_b64 = await self.browser.screenshot()
        response = self.client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=1024,
            messages=[{
                "role": "user",
                "content": [
                    {"type": "image", "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": screenshot_b64,
                    }},
                    {"type": "text", "text": """Check this form for validation errors.
Look for: red borders, error messages, warning icons, tooltips.
Return only JSON: {"has_errors": bool, "errors": [{"field": str, "message": str}]}"""},
                ],
            }],
        )
        return json.loads(response.content[0].text)
```
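The `_fix_validation_errors` method called in the multi-step loop is not shown. One way to start it is to map each reported error back to the `form_data` key whose name best matches the field label, then re-fill just those fields. The helper below is a hypothetical sketch of that mapping step using `difflib`; it assumes data keys such as `first_name` resemble labels such as "First Name", which will not hold for every form.

```python
import difflib
import re
from typing import Optional


def _normalize(text: str) -> str:
    # "First Name *" -> "first name"; "first_name" -> "first name"
    return " ".join(re.sub(r"[^a-z0-9]+", " ", text.lower()).split())


def match_errors_to_keys(
    errors: list[dict], form_data: dict, cutoff: float = 0.6
) -> dict[str, Optional[str]]:
    """Map each error's field label to the closest form_data key (or None)."""
    keys_by_norm = {_normalize(k): k for k in form_data}
    mapping: dict[str, Optional[str]] = {}
    for err in errors:
        label = err["field"]
        match = difflib.get_close_matches(
            _normalize(label), list(keys_by_norm), n=1, cutoff=cutoff
        )
        mapping[label] = keys_by_norm[match[0]] if match else None
    return mapping
```

Labels that map to `None` are a good signal to surface the error for human review rather than guess a value.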
FAQ
How does Claude handle date picker widgets?
Claude can interact with date pickers visually — clicking the calendar icon, navigating months, and selecting dates. For complex date pickers, it often works better to click the text input first, clear it, and type the date in the expected format (MM/DD/YYYY, etc.) rather than navigating the calendar widget.
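When typing into the date input, the value has to match the format the field expects, which you can often read from its placeholder. A minimal sketch that converts an ISO date (`YYYY-MM-DD`) into a placeholder-style pattern such as `MM/DD/YYYY` or `DD.MM.YYYY`; the supported tokens are an assumption, and `format_for_placeholder` is a hypothetical helper:

```python
import re
from datetime import date

# Placeholder tokens mapped to strftime directives (longest tokens first)
_TOKENS = {"YYYY": "%Y", "YY": "%y", "MM": "%m", "DD": "%d"}
_TOKEN_RE = re.compile("|".join(_TOKENS))


def format_for_placeholder(iso_date: str, placeholder: str) -> str:
    """Render an ISO date in the pattern shown by a field placeholder.

    format_for_placeholder("2024-03-07", "MM/DD/YYYY") -> "03/07/2024"
    """
    d = date.fromisoformat(iso_date)
    pattern = _TOKEN_RE.sub(lambda m: _TOKENS[m.group(0)], placeholder.upper())
    return d.strftime(pattern)
```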
Can Claude handle file upload fields?
Claude can identify file upload fields and click the "Choose File" button, but it cannot interact with the operating system's file dialog. For file uploads, use a hybrid approach: let Claude identify the upload field, then use Playwright's set_input_files() method to attach the file programmatically.
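A related pattern, if you only have the coordinates Claude reported rather than a selector for `set_input_files()`, is to let Playwright intercept the file chooser that the click opens. This sketch assumes `page` is a Playwright async `Page` and uses its `expect_file_chooser()` and `FileChooser.set_files()` APIs in place of `set_input_files()`; `upload_at` is a hypothetical name.

```python
async def upload_at(page, x: float, y: float, file_path: str) -> None:
    """Click the upload control at (x, y) and attach file_path.

    `page` is assumed to be a Playwright async Page. Waiting for the
    file chooser lets Playwright handle it instead of the OS dialog.
    """
    async with page.expect_file_chooser() as fc_info:
        # The click that would normally open the OS file dialog
        await page.mouse.click(x, y)
    file_chooser = await fc_info.value
    await file_chooser.set_files(file_path)
```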
What about CAPTCHA or anti-automation fields on forms?
Claude can visually interpret some CAPTCHA types, but bypassing them is restricted by most websites' terms of service and Anthropic's usage policies. For legitimate automation of your own forms, disable CAPTCHA in development/staging environments or use authenticated sessions that skip the challenge.
CallSphere Team
Expert insights on AI voice agents and customer communication automation.