UFO Action Types: Click, Type, Scroll, and Application-Specific Controls
A comprehensive guide to every action type UFO can perform, from basic clicks and keyboard input to scroll operations, UIA element interactions, and application-specific control manipulation.
The Action Space
Every step UFO takes involves selecting and executing an action from a defined set. Understanding these actions is essential for debugging UFO behavior, extending its capabilities, and knowing what tasks it can and cannot handle.
UFO's action space is divided into universal actions that work across all applications and application-specific actions that leverage unique control types in particular apps.
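One way to picture this split is a dispatch table keyed by the action type, so that universal and application-specific handlers plug into the same mechanism. The sketch below is illustrative only: the registry, the `register_action` decorator, and the stub handlers are assumptions made for this example, not UFO's actual internals.

```python
from typing import Any, Callable, Dict

# Hypothetical registry mapping an action_type string to its handler.
ACTION_HANDLERS: Dict[str, Callable[[Dict[str, Any]], str]] = {}


def register_action(action_type: str):
    """Decorator that registers a handler under the given action type."""
    def decorator(func):
        ACTION_HANDLERS[action_type] = func
        return func
    return decorator


@register_action("click")
def handle_click(action):
    # Stub: a real handler would call into pywinauto here.
    return f"click on label {action['control_label']}"


@register_action("keyboard")
def handle_keyboard(action):
    # Stub: a real handler would send the keys to the active window.
    return f"send keys {action['parameters']['keys']}"


def dispatch(action):
    """Look up and run the handler for an action; reject unknown types."""
    action_type = action.get("action_type")
    handler = ACTION_HANDLERS.get(action_type)
    if handler is None:
        raise ValueError(f"Unsupported action type: {action_type!r}")
    return handler(action)
```

Rejecting unknown action types up front makes it obvious when the model proposes something outside the defined action space, which helps when debugging.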
Universal Actions
Click Actions
The most fundamental action. UFO identifies a numbered UI element from its annotated screenshot and clicks it:
# UFO action representation for click
action = {
    "action_type": "click",
    "control_label": 7,          # The numbered label on the annotated screenshot
    "control_text": "Save",      # Human-readable description
    "parameters": {
        "button": "left",        # left, right, or middle
        "double_click": False,   # True for double-click
    },
}

# Under the hood, UFO translates this to pywinauto calls
def execute_click(control, params):
    """Execute a click action on a UIA control."""
    # find_control_by_label resolves a screenshot label to a pywinauto wrapper
    element = find_control_by_label(control["control_label"])
    button = params.get("button", "left")  # honors left, right, and middle
    if params.get("double_click"):
        element.double_click_input(button=button)
    else:
        element.click_input(button=button)
UFO supports left-click, right-click, and double-click. Right-click is used for context menus, and double-click for opening files or editing cells.
Type / Input Text
After clicking on a text field or editor, UFO types text into it:
action = {
    "action_type": "set_text",
    "control_label": 12,
    "parameters": {
        "text": "Quarterly Sales Report - Q1 2026",
        "clear_first": True,  # Clear existing text before typing
    },
}

def execute_set_text(control, params):
    """Type text into a control."""
    element = find_control_by_label(control["control_label"])
    if params.get("clear_first"):
        element.set_edit_text("")
    # Note: type_keys treats characters such as + ^ % ~ as special keys,
    # so text containing them must be escaped with braces first.
    element.type_keys(params["text"], with_spaces=True)
The set_text action uses the UIA ValuePattern when available (faster, more reliable) and falls back to keyboard simulation when the control does not support direct value setting.
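The "direct value first, keyboard second" strategy can be sketched as a small helper. This is a minimal illustration, not UFO's code: `set_text_with_fallback` is a hypothetical name, and `element` is assumed to be a pywinauto-style wrapper that exposes `set_edit_text()` and `type_keys()`.

```python
def set_text_with_fallback(element, text):
    """Replace a control's text, preferring direct value setting.

    Returns "value" if the direct path worked, "keyboard" if the
    simulated-typing fallback was used.
    """
    try:
        # Direct path: sets the whole value in one call.
        element.set_edit_text(text)
        return "value"
    except Exception:
        # Fallback path: select everything, delete it, then type.
        # A broad except keeps the sketch short; real code would catch
        # the specific errors raised by the automation backend.
        element.type_keys("^a{DELETE}")
        element.type_keys(text, with_spaces=True)
        return "keyboard"
```

Returning the strategy that was used makes it easy to log how often the slower keyboard path is taken for a given application.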
Keyboard Shortcuts
Many Windows tasks are faster with keyboard shortcuts than mouse clicks:
action = {
    "action_type": "keyboard",
    "parameters": {
        "keys": "^s",  # pywinauto key format: ^ = Ctrl, % = Alt, + = Shift
        "description": "Save the current document",
    },
}

# Common keyboard patterns UFO uses
COMMON_SHORTCUTS = {
    "save": "^s",
    "copy": "^c",
    "paste": "^v",
    "undo": "^z",
    "select_all": "^a",
    "find": "^f",
    "new": "^n",
    "close_tab": "^w",
    "switch_app": "%{TAB}",
}

from pywinauto.keyboard import send_keys

def execute_keyboard(params):
    """Send keyboard shortcuts to the active window."""
    send_keys(params["keys"])
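pywinauto's `send_keys` syntax writes modifiers as prefix symbols (`^` for Ctrl, `%` for Alt, `+` for Shift) and named keys in braces. If shortcuts arrive in the more familiar "Ctrl+S" form, a small translator can convert them. The helper below is a hypothetical sketch covering only a handful of named keys; `to_send_keys`, `MODIFIERS`, and `NAMED_KEYS` are names invented for this example.

```python
# Modifier symbols used by pywinauto's send_keys syntax.
MODIFIERS = {"ctrl": "^", "alt": "%", "shift": "+"}

# A few named keys; pywinauto expects these wrapped in braces.
NAMED_KEYS = {
    "tab": "{TAB}",
    "enter": "{ENTER}",
    "esc": "{ESC}",
    "delete": "{DELETE}",
    "f4": "{F4}",
}


def to_send_keys(shortcut: str) -> str:
    """Convert a shortcut such as 'Ctrl+Shift+S' into send_keys syntax."""
    parts = [part.strip().lower() for part in shortcut.split("+")]
    *modifiers, key = parts
    prefix = ""
    for modifier in modifiers:
        if modifier not in MODIFIERS:
            raise ValueError(f"Unknown modifier: {modifier!r}")
        prefix += MODIFIERS[modifier]
    if key in NAMED_KEYS:
        return prefix + NAMED_KEYS[key]
    if len(key) == 1 and key.isalnum():
        return prefix + key
    raise ValueError(f"Unknown key: {key!r}")
```

For example, `to_send_keys("Alt+Tab")` yields `%{TAB}`, which can be passed straight to `send_keys`.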
Scroll Actions
For content that extends beyond the visible area:
action = {
    "action_type": "scroll",
    "control_label": 3,
    "parameters": {
        "direction": "down",  # up, down, left, right
        "amount": 5,          # Number of scroll units (pages in the handler below)
    },
}

def execute_scroll(control, params):
    """Scroll within a control."""
    element = find_control_by_label(control["control_label"])
    direction = params["direction"]
    if direction not in ("up", "down", "left", "right"):
        raise ValueError(f"Unsupported scroll direction: {direction!r}")
    # pywinauto's scroll(direction, amount, count): amount is "line", "page", or "end"
    element.scroll(direction, "page", params["amount"])
Application-Specific Control Types
Windows applications expose different control types through the UI Automation (UIA) framework. UFO recognizes the standard UIA control types and associates each with a default interaction. The most common ones are:
# UIA Control Types that UFO can interact with
UIA_CONTROL_TYPES = {
    "Button": "click",           # Standard buttons
    "CheckBox": "toggle",        # Check/uncheck
    "ComboBox": "select",        # Dropdown selection
    "DataGrid": "cell_select",   # Table/grid navigation
    "Edit": "set_text",          # Text input fields
    "Hyperlink": "click",        # Clickable links
    "ListItem": "click",         # Items in a list
    "Menu": "click",             # Menu items
    "MenuItem": "click",         # Sub-menu items
    "RadioButton": "select",     # Radio button selection
    "Slider": "set_value",       # Slider controls
    "Spinner": "set_value",      # Numeric up/down
    "Tab": "click",              # Tab switching
    "Text": "read",              # Static text (read-only)
    "Tree": "expand_collapse",   # Tree view navigation
    "TreeItem": "click",         # Tree node selection
}
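A mapping like the one above can also act as a sanity check on what the model proposes, for instance by rejecting `set_text` on a button. The following is a hypothetical sketch: `is_action_allowed` and the "unknown types allow only click" rule are assumptions for illustration, and the dictionary repeats a subset of the mapping so the example runs on its own.

```python
# Subset of the control-type mapping, repeated so this example is self-contained.
DEFAULT_ACTIONS = {
    "Button": "click",
    "CheckBox": "toggle",
    "Edit": "set_text",
    "Text": "read",
}


def is_action_allowed(control_type: str, action_type: str) -> bool:
    """Allow only the default action for a control type.

    Unknown control types fall back to allowing a plain click.
    """
    return DEFAULT_ACTIONS.get(control_type, "click") == action_type
```

A stricter or looser policy is equally plausible; the point is that the check happens before any input is sent to the application.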
Excel-Specific Actions
Excel cells support unique patterns like range selection and formula entry:
# Excel cell interaction
excel_actions = {
    "action_type": "excel_cell",
    "parameters": {
        "cell": "B5",
        "value": "=SUM(B2:B4)",
        "action": "set_formula",
    },
}

# When UFO detects Excel, it can use COM automation.
# The function below shows the UI-driven route instead.
def excel_set_cell(cell_ref: str, value: str) -> list:
    """Build the UI steps that set an Excel cell through the Name Box."""
    # UFO navigates to the Name Box, types the cell reference,
    # presses Enter to navigate, then types the value
    steps = [
        {"action": "click", "target": "Name Box"},
        {"action": "set_text", "text": cell_ref},
        {"action": "keyboard", "keys": "{Enter}"},
        {"action": "set_text", "text": value},
        {"action": "keyboard", "keys": "{Enter}"},
    ]
    return steps
Outlook-Specific Actions
Email composition involves interacting with rich text editors and address fields:
# Composing an email through UFO actions
outlook_compose_steps = [
    {"action": "click", "target": "New Email"},
    {"action": "click", "target": "To field"},
    {"action": "set_text", "text": "finance@company.com"},
    {"action": "keyboard", "keys": "{Tab}"},  # Move to CC
    {"action": "keyboard", "keys": "{Tab}"},  # Move to Subject
    {"action": "set_text", "text": "Q1 Sales Report"},
    {"action": "keyboard", "keys": "{Tab}"},  # Move to body
    {"action": "set_text", "text": "Please find the Q1 numbers attached."},
    {"action": "click", "target": "Send"},
]
The Action Selection Prompt
UFO sends the vision model a structured prompt that includes the available actions. The model must choose from this constrained set:
ACTION_PROMPT = """You are a Windows UI automation agent. Based on the
annotated screenshot, select the next action.

Available actions:
- click(label): Click on the UI element with the given label number
- set_text(label, text): Type text into the labeled control
- keyboard(keys): Send keyboard shortcut
- scroll(label, direction, amount): Scroll within a control
- finish(status): Mark task as complete or failed

Respond in JSON format:
{
    "thought": "What I observe and why I chose this action",
    "action_type": "click|set_text|keyboard|scroll|finish",
    "control_label": 5,
    "parameters": {}
}"""
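Because the model is asked to choose from a constrained set, its reply should be validated before anything is executed. The sketch below parses a reply in the JSON shape shown in the prompt and rejects anything outside the five listed actions. The function name `parse_action` and the rule that `click`, `set_text`, and `scroll` require an integer `control_label` are assumptions for this example.

```python
import json

ALLOWED_ACTIONS = {"click", "set_text", "keyboard", "scroll", "finish"}

# Actions that must reference a labeled control (an assumption of this sketch).
NEEDS_LABEL = {"click", "set_text", "scroll"}


def parse_action(raw: str) -> dict:
    """Parse the model's JSON reply and enforce the constrained action set."""
    # json.loads raises json.JSONDecodeError (a ValueError) on malformed input.
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("Reply must be a JSON object")
    action_type = data.get("action_type")
    if action_type not in ALLOWED_ACTIONS:
        raise ValueError(f"Action not in allowed set: {action_type!r}")
    if action_type in NEEDS_LABEL and not isinstance(data.get("control_label"), int):
        raise ValueError(f"{action_type} requires an integer control_label")
    # Normalize so downstream handlers can always read parameters.
    data.setdefault("parameters", {})
    return data
```

Since every failure surfaces as a `ValueError`, a caller can catch one exception type and re-prompt the model with the error message.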
FAQ
Can UFO interact with custom-drawn controls that are not standard UIA elements?
Custom-drawn controls without UIA support are UFO's biggest challenge. In these cases, UFO falls back to coordinate-based clicking using the vision model's understanding of the screenshot. This is less reliable but often works for simple buttons and text areas rendered without standard controls.
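A coordinate-based fallback can be sketched as follows, assuming the vision model returns a bounding box as `(left, top, right, bottom)` in screen pixels. That format, and the helper names `bbox_center` and `click_at_bbox`, are assumptions for this example rather than UFO's documented interface. The click function is injected so the code runs without a desktop session; on Windows, `pywinauto.mouse.click` accepts the same `button` and `coords` arguments.

```python
def bbox_center(bbox):
    """Return the integer center (x, y) of a (left, top, right, bottom) box."""
    left, top, right, bottom = bbox
    if right <= left or bottom <= top:
        raise ValueError("Bounding box must have positive width and height")
    return ((left + right) // 2, (top + bottom) // 2)


def click_at_bbox(bbox, click=None):
    """Click the center of a bounding box and return the point used.

    `click` is any callable taking button= and coords= keyword arguments,
    for example pywinauto.mouse.click. When omitted, nothing is clicked.
    """
    x, y = bbox_center(bbox)
    if click is not None:
        click(button="left", coords=(x, y))
    return (x, y)
```

Clicking the center rather than a corner gives the most margin for error when the model's box is slightly off.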
How does UFO handle pop-up dialogs and confirmation boxes?
UFO's observation-action loop naturally handles unexpected dialogs. When a dialog appears, the next screenshot capture will show it, and the vision model will recognize it as a dialog requiring interaction (clicking OK, Cancel, or filling in fields) before continuing with the main task.
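The reason dialogs need no special handling is visible in the shape of the loop itself: every step starts from a fresh observation. The sketch below is a generic version of such a loop, not UFO's implementation. `run_task` and its three injected callables (`observe`, `decide`, `execute`) are hypothetical names.

```python
def run_task(observe, decide, execute, max_steps=20):
    """Generic observe-decide-act loop.

    observe() returns the current observation (e.g. an annotated screenshot),
    decide(observation) returns an action dict, and execute(action) performs it.
    Returns the list of actions chosen, ending with the finish action.
    """
    history = []
    for _ in range(max_steps):
        # A fresh observation every step means a newly opened dialog
        # is simply part of what the model sees next.
        observation = observe()
        action = decide(observation)
        history.append(action)
        if action["action_type"] == "finish":
            return history
        execute(action)
    raise RuntimeError(f"Task did not finish within {max_steps} steps")
```

The `max_steps` cap guards against a dialog that keeps reappearing and would otherwise trap the loop.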
CallSphere Team