Multi-Monitor and Resolution Handling in Claude Computer Use
Master the technical details of resolution management, coordinate scaling, multi-monitor awareness, and viewport configuration for reliable Claude Computer Use agents across different display setups.
Why Resolution Matters
The coordinate system in Claude Computer Use is absolute pixel-based. When Claude says "click at (640, 400)," your execution layer moves the mouse to pixel position (640, 400). If there is any mismatch between the resolution Claude thinks it is working with and the actual screen resolution, clicks will land in the wrong place.
This becomes complex in real-world environments: high-DPI displays with scaling, multi-monitor setups with different resolutions, virtual displays with configurable dimensions, and browser viewports that differ from screen resolution. Getting this right is the difference between an agent that works reliably and one that clicks random spots on screen.
Resolution Detection and Configuration
The first step is accurately detecting the display configuration and configuring the Claude tool accordingly:
import subprocess
import re
def get_display_info(display: str = ":0") -> dict:
"""Detect display resolution and layout using xrandr."""
result = subprocess.run(
["xrandr", "--query"],
capture_output=True,
text=True,
env={"DISPLAY": display},
)
monitors = []
current_monitor = None
for line in result.stdout.split("\n"):
# Match connected monitor lines
connected_match = re.match(
r"(\S+) connected (?:primary )?(\d+)x(\d+)\+(\d+)\+(\d+)", line
)
if connected_match:
current_monitor = {
"name": connected_match.group(1),
"width": int(connected_match.group(2)),
"height": int(connected_match.group(3)),
"offset_x": int(connected_match.group(4)),
"offset_y": int(connected_match.group(5)),
}
monitors.append(current_monitor)
# Calculate total virtual screen dimensions
total_width = max(m["offset_x"] + m["width"] for m in monitors)
total_height = max(m["offset_y"] + m["height"] for m in monitors)
return {
"monitors": monitors,
"total_width": total_width,
"total_height": total_height,
}
Configuring Claude for a Specific Monitor
When automating on a multi-monitor setup, you typically want Claude to focus on a single monitor. Capture screenshots from only that monitor and configure the tool dimensions accordingly:
import anthropic
from PIL import Image
import io
import base64
class MonitorAwareController:
def __init__(self, target_monitor: int = 0):
self.display_info = get_display_info()
self.monitor = self.display_info["monitors"][target_monitor]
self.client = anthropic.Anthropic()
def screenshot(self) -> str:
"""Capture only the target monitor's region."""
# Capture full screen
subprocess.run(["scrot", "/tmp/full_screen.png"], check=True)
# Crop to target monitor
img = Image.open("/tmp/full_screen.png")
region = img.crop((
self.monitor["offset_x"],
self.monitor["offset_y"],
self.monitor["offset_x"] + self.monitor["width"],
self.monitor["offset_y"] + self.monitor["height"],
))
buffer = io.BytesIO()
region.save(buffer, format="PNG")
return base64.standard_b64encode(buffer.getvalue()).decode()
def get_tool_config(self) -> dict:
"""Return the computer use tool configured for the target monitor."""
return {
"type": "computer_20241022",
"name": "computer",
"display_width_px": self.monitor["width"],
"display_height_px": self.monitor["height"],
"display_number": 0,
}
def translate_coordinates(self, x: int, y: int) -> tuple[int, int]:
"""Convert monitor-relative coordinates to absolute screen coordinates."""
abs_x = x + self.monitor["offset_x"]
abs_y = y + self.monitor["offset_y"]
return abs_x, abs_y
def click(self, x: int, y: int):
"""Click using monitor-relative coordinates from Claude."""
abs_x, abs_y = self.translate_coordinates(x, y)
subprocess.run(["xdotool", "mousemove", str(abs_x), str(abs_y)])
subprocess.run(["xdotool", "click", "1"])
Coordinate Scaling for High-DPI Displays
High-DPI (Retina) displays add another layer of complexity. A 2x DPI display with 2560x1600 physical pixels might report as 1280x800 logical pixels. If your screenshot captures at physical resolution but Claude's tool is configured at logical resolution, coordinates will be off by 2x:
class DPIAwareController:
def __init__(self, logical_width: int, logical_height: int, scale_factor: float = 1.0):
self.logical_width = logical_width
self.logical_height = logical_height
self.scale_factor = scale_factor
self.physical_width = int(logical_width * scale_factor)
self.physical_height = int(logical_height * scale_factor)
def screenshot(self) -> str:
"""Capture and resize screenshot to match logical dimensions."""
subprocess.run(["scrot", "/tmp/screen.png"], check=True)
img = Image.open("/tmp/screen.png")
# Resize to logical dimensions so Claude's coordinates match
if img.size != (self.logical_width, self.logical_height):
img = img.resize(
(self.logical_width, self.logical_height),
Image.LANCZOS,
)
buffer = io.BytesIO()
img.save(buffer, format="PNG")
return base64.standard_b64encode(buffer.getvalue()).decode()
def click(self, x: int, y: int):
"""Convert Claude's logical coordinates to physical coordinates."""
physical_x = int(x * self.scale_factor)
physical_y = int(y * self.scale_factor)
subprocess.run(["xdotool", "mousemove", str(physical_x), str(physical_y)])
subprocess.run(["xdotool", "click", "1"])
def get_tool_config(self) -> dict:
return {
"type": "computer_20241022",
"name": "computer",
"display_width_px": self.logical_width,
"display_height_px": self.logical_height,
"display_number": 0,
}
Optimal Resolution for Claude
Anthropic recommends specific resolutions for best performance. The supported resolutions and their token costs differ:
See AI Voice Agents Handle Real Calls
Book a free demo or calculate how much you can save with AI voice automation.
RECOMMENDED_RESOLUTIONS = {
"small": {"width": 1024, "height": 768, "tokens_per_screenshot": 1170},
"medium": {"width": 1280, "height": 800, "tokens_per_screenshot": 1462},
"large": {"width": 1920, "height": 1080, "tokens_per_screenshot": 2765},
}
def select_optimal_resolution(task_type: str) -> dict:
"""Select the best resolution based on task requirements."""
if task_type in ("data_extraction", "reading"):
# Higher resolution for text clarity
return RECOMMENDED_RESOLUTIONS["large"]
elif task_type in ("navigation", "clicking"):
# Medium resolution balances clarity and cost
return RECOMMENDED_RESOLUTIONS["medium"]
elif task_type in ("simple_interaction",):
# Small resolution for cost efficiency
return RECOMMENDED_RESOLUTIONS["small"]
return RECOMMENDED_RESOLUTIONS["medium"]
Virtual Display Setup
For server-side automation, use Xvfb to create a virtual display with controlled dimensions:
def setup_virtual_display(width: int = 1280, height: int = 800, display_num: int = 99):
"""Start an Xvfb virtual display."""
subprocess.Popen([
"Xvfb",
f":{display_num}",
"-screen", "0", f"{width}x{height}x24",
"-ac",
])
import time
time.sleep(1)
# Set the DISPLAY environment variable
import os
os.environ["DISPLAY"] = f":{display_num}"
return {
"display": f":{display_num}",
"width": width,
"height": height,
}
Using a virtual display gives you full control over the resolution and eliminates variables like DPI scaling and multi-monitor configurations. This is the recommended approach for production deployments where consistency matters.
Viewport vs Screen Resolution
When automating browsers specifically, there is an important distinction between screen resolution and browser viewport size. The viewport is the area inside the browser chrome (excluding the title bar, address bar, and scrollbar). Configure Claude with the viewport dimensions, not the full screen dimensions, when capturing browser-only screenshots via Playwright:
async def configure_browser_viewport(page, target_width: int = 1280, target_height: int = 800):
"""Ensure browser viewport matches Claude's expected dimensions."""
await page.set_viewport_size({"width": target_width, "height": target_height})
# Verify actual viewport size
actual = await page.evaluate(
"() => ({width: window.innerWidth, height: window.innerHeight})"
)
assert actual["width"] == target_width, f"Viewport width mismatch: {actual['width']}"
assert actual["height"] == target_height, f"Viewport height mismatch: {actual['height']}"
FAQ
What happens if I send a screenshot at the wrong resolution?
Claude will still analyze the image and return coordinates, but those coordinates will be relative to the image dimensions, not the declared display dimensions. If you declare a 1280x800 display but send a 1920x1080 screenshot, Claude's coordinates will be based on the 1920x1080 image space, causing misalignment when you try to click on the 1280x800 display.
Should I use the highest resolution possible for better accuracy?
Not necessarily. Higher resolution increases token consumption (and cost) per screenshot. For most tasks, 1280x800 provides sufficient visual clarity. Use 1920x1080 only when you need to read small text or interact with densely packed interfaces.
How do I test my coordinate translation logic?
Create a simple test that clicks on known UI elements (buttons, links) at various positions across the screen. Verify each click lands correctly by checking the screen state after clicking. Automated visual regression tests can compare before/after screenshots to confirm correct interaction.
#MultiMonitor #ResolutionHandling #ClaudeComputerUse #DisplayConfig #ViewportManagement #CoordinateScaling #DesktopAutomation
CallSphere Team
Expert insights on AI voice agents and customer communication automation.
Try CallSphere AI Voice Agents
See how AI voice agents work for your industry. Live demo available -- no signup required.