Multi-Monitor and Resolution Handling in Claude Computer Use

Why Resolution Matters

The coordinate system in Claude Computer Use is absolute pixel-based. When Claude says "click at (640, 400)," your execution layer moves the mouse to pixel position (640, 400). If there is any mismatch between the resolution Claude thinks it is working with and the actual screen resolution, clicks will land in the wrong place.

This becomes complex in real-world environments: high-DPI displays with scaling, multi-monitor setups with different resolutions, virtual displays with configurable dimensions, and browser viewports that differ from screen resolution. Getting this right is the difference between an agent that works reliably and one that clicks random spots on screen.
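
To make the failure mode concrete, the sketch below rescales a point from the resolution Claude was told about to the resolution of the real screen. The helper name `rescale_point` and the example resolutions are illustrative assumptions, not part of any Claude API.

```python
def rescale_point(x, y, declared, actual):
    """Map (x, y) from the declared resolution to the actual resolution."""
    declared_w, declared_h = declared
    actual_w, actual_h = actual
    return round(x * actual_w / declared_w), round(y * actual_h / declared_h)


# Claude was told the screen is 1280x800, but the real screen is 1920x1200.
target = rescale_point(640, 400, declared=(1280, 800), actual=(1920, 1200))
print(target)  # (960, 600)
```

Clicking the raw (640, 400) on the larger screen would land well up and to the left of the intended target, which is the kind of drift the rest of this article tries to prevent.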

Resolution Detection and Configuration

The first step is accurately detecting the display configuration and configuring the Claude tool accordingly:

import os
import re
import subprocess

def get_display_info(display: str = ":0") -> dict:
    """Detect display resolution and layout using xrandr."""
    result = subprocess.run(
        ["xrandr", "--query"],
        capture_output=True,
        text=True,
        check=True,
        # Extend the current environment; replacing it outright would drop
        # variables such as PATH and XAUTHORITY
        env={**os.environ, "DISPLAY": display},
    )

    monitors = []

    for line in result.stdout.splitlines():
        # Match connected monitors that have an active mode, e.g.
        # "HDMI-1 connected primary 1920x1080+0+0 ..."
        connected_match = re.match(
            r"(\S+) connected (?:primary )?(\d+)x(\d+)\+(\d+)\+(\d+)", line
        )
        if connected_match:
            monitors.append({
                "name": connected_match.group(1),
                "width": int(connected_match.group(2)),
                "height": int(connected_match.group(3)),
                "offset_x": int(connected_match.group(4)),
                "offset_y": int(connected_match.group(5)),
            })

    if not monitors:
        raise RuntimeError(f"No active monitors found on display {display}")

    # Calculate total virtual screen dimensions
    total_width = max(m["offset_x"] + m["width"] for m in monitors)
    total_height = max(m["offset_y"] + m["height"] for m in monitors)

    return {
        "monitors": monitors,
        "total_width": total_width,
        "total_height": total_height,
    }
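
Once the layout is known, a common follow-up question is which monitor a given absolute point belongs to. The sketch below assumes monitor dictionaries shaped like the ones returned above; `monitor_at` and the sample layout are hypothetical, introduced only for illustration.

```python
def monitor_at(monitors, x, y):
    """Return the first monitor whose rectangle contains the absolute point, else None."""
    for m in monitors:
        inside_x = m["offset_x"] <= x < m["offset_x"] + m["width"]
        inside_y = m["offset_y"] <= y < m["offset_y"] + m["height"]
        if inside_x and inside_y:
            return m
    return None


# Hypothetical two-monitor layout, side by side.
layout = [
    {"name": "eDP-1", "width": 1920, "height": 1080, "offset_x": 0, "offset_y": 0},
    {"name": "HDMI-1", "width": 1280, "height": 1024, "offset_x": 1920, "offset_y": 0},
]
print(monitor_at(layout, 2000, 500)["name"])  # HDMI-1
```

Note that a point can fall outside every monitor when the monitors differ in height, because the virtual screen's bounding box then contains dead space.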

Configuring Claude for a Specific Monitor

When automating on a multi-monitor setup, you typically want Claude to focus on a single monitor. Capture screenshots from only that monitor and configure the tool dimensions accordingly:

import base64
import io
import os
import subprocess
import tempfile

import anthropic
from PIL import Image

class MonitorAwareController:
    def __init__(self, target_monitor: int = 0):
        self.display_info = get_display_info()
        self.monitor = self.display_info["monitors"][target_monitor]
        self.client = anthropic.Anthropic()

    def screenshot(self) -> str:
        """Capture only the target monitor's region."""
        # Capture the full screen into a fresh temporary directory, because
        # some scrot versions will not overwrite an existing file
        with tempfile.TemporaryDirectory() as tmp_dir:
            path = os.path.join(tmp_dir, "full_screen.png")
            subprocess.run(["scrot", path], check=True)

            # Crop to target monitor
            with Image.open(path) as img:
                region = img.crop((
                    self.monitor["offset_x"],
                    self.monitor["offset_y"],
                    self.monitor["offset_x"] + self.monitor["width"],
                    self.monitor["offset_y"] + self.monitor["height"],
                ))
                buffer = io.BytesIO()
                region.save(buffer, format="PNG")

        return base64.standard_b64encode(buffer.getvalue()).decode()

    def get_tool_config(self) -> dict:
        """Return the computer use tool configured for the target monitor."""
        return {
            "type": "computer_20241022",
            "name": "computer",
            "display_width_px": self.monitor["width"],
            "display_height_px": self.monitor["height"],
            "display_number": 0,
        }

    def translate_coordinates(self, x: int, y: int) -> tuple[int, int]:
        """Convert monitor-relative coordinates to absolute screen coordinates."""
        abs_x = x + self.monitor["offset_x"]
        abs_y = y + self.monitor["offset_y"]
        return abs_x, abs_y

    def click(self, x: int, y: int):
        """Click using monitor-relative coordinates from Claude."""
        abs_x, abs_y = self.translate_coordinates(x, y)
        # check=True surfaces xdotool failures instead of silently ignoring them
        subprocess.run(["xdotool", "mousemove", str(abs_x), str(abs_y)], check=True)
        subprocess.run(["xdotool", "click", "1"], check=True)
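
Because the offset is added blindly, a coordinate outside the target monitor would land on a neighbouring monitor. One possible guard, sketched below under the assumption that monitors use the same dictionary shape as above, is to reject out-of-bounds points before translating. The function `to_absolute` is a hypothetical helper, not part of the class above.

```python
def to_absolute(x, y, monitor):
    """Translate monitor-relative coordinates, rejecting out-of-bounds points."""
    if not (0 <= x < monitor["width"] and 0 <= y < monitor["height"]):
        raise ValueError(
            f"({x}, {y}) is outside the {monitor['width']}x{monitor['height']} monitor"
        )
    return x + monitor["offset_x"], y + monitor["offset_y"]


# Hypothetical second monitor placed to the right of a 1920-pixel-wide primary.
second = {"name": "HDMI-1", "width": 1280, "height": 1024, "offset_x": 1920, "offset_y": 0}
print(to_absolute(100, 200, second))  # (2020, 200)
```

Raising an error is only one option; clamping to the nearest edge or asking Claude for a fresh screenshot may suit some agents better.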

Coordinate Scaling for High-DPI Displays

High-DPI (Retina) displays add another layer of complexity. A 2x DPI display with 2560x1600 physical pixels might report as 1280x800 logical pixels. If your screenshot captures at physical resolution but Claude's tool is configured at logical resolution, coordinates will be off by 2x:

import base64
import io
import os
import subprocess
import tempfile

from PIL import Image

class DPIAwareController:
    def __init__(self, logical_width: int, logical_height: int, scale_factor: float = 1.0):
        self.logical_width = logical_width
        self.logical_height = logical_height
        self.scale_factor = scale_factor
        self.physical_width = int(logical_width * scale_factor)
        self.physical_height = int(logical_height * scale_factor)

    def screenshot(self) -> str:
        """Capture and resize screenshot to match logical dimensions."""
        target = (self.logical_width, self.logical_height)
        # Use a fresh temporary directory, because some scrot versions
        # will not overwrite an existing file
        with tempfile.TemporaryDirectory() as tmp_dir:
            path = os.path.join(tmp_dir, "screen.png")
            subprocess.run(["scrot", path], check=True)

            with Image.open(path) as img:
                # Resize to logical dimensions so Claude's coordinates match
                frame = img if img.size == target else img.resize(target, Image.LANCZOS)
                buffer = io.BytesIO()
                frame.save(buffer, format="PNG")

        return base64.standard_b64encode(buffer.getvalue()).decode()

    def click(self, x: int, y: int):
        """Convert Claude's logical coordinates to physical coordinates."""
        # round() avoids the systematic bias of truncation with fractional scale factors
        physical_x = round(x * self.scale_factor)
        physical_y = round(y * self.scale_factor)
        subprocess.run(["xdotool", "mousemove", str(physical_x), str(physical_y)], check=True)
        subprocess.run(["xdotool", "click", "1"], check=True)

    def get_tool_config(self) -> dict:
        return {
            "type": "computer_20241022",
            "name": "computer",
            "display_width_px": self.logical_width,
            "display_height_px": self.logical_height,
            "display_number": 0,
        }
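
The class above expects the caller to know the scale factor. When it is not known in advance, one way to estimate it is to compare the size of a captured screenshot with the logical size the desktop reports. The sketch below is a minimal version of that idea; `infer_scale_factor` and its `tolerance` parameter are assumptions made for illustration.

```python
def infer_scale_factor(physical_size, logical_size, tolerance=0.01):
    """Estimate the DPI scale factor from physical and logical dimensions."""
    physical_w, physical_h = physical_size
    logical_w, logical_h = logical_size
    scale_x = physical_w / logical_w
    scale_y = physical_h / logical_h
    # A different scale per axis usually means the two sizes describe different screens.
    if abs(scale_x - scale_y) > tolerance:
        raise ValueError(f"Inconsistent scale factors: {scale_x:.3f} vs {scale_y:.3f}")
    return scale_x


print(infer_scale_factor((2560, 1600), (1280, 800)))  # 2.0
```

The result could then be passed as `scale_factor` when constructing the controller.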

Optimal Resolution for Claude

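Resolution affects both accuracy and cost. Anthropic's guidance for computer use favors modest resolutions (around 1024x768 or 1280x800), since very large screenshots are downscaled by the API before the model sees them. The presets below trade text clarity against token cost; treat the per-screenshot token counts as rough estimates rather than exact figures: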

RECOMMENDED_RESOLUTIONS = {
    "small": {"width": 1024, "height": 768, "tokens_per_screenshot": 1170},
    "medium": {"width": 1280, "height": 800, "tokens_per_screenshot": 1462},
    "large": {"width": 1920, "height": 1080, "tokens_per_screenshot": 2765},
}

def select_optimal_resolution(task_type: str) -> dict:
    """Select the best resolution based on task requirements."""
    if task_type in ("data_extraction", "reading"):
        # Higher resolution for text clarity
        return RECOMMENDED_RESOLUTIONS["large"]
    elif task_type in ("navigation", "clicking"):
        # Medium resolution balances clarity and cost
        return RECOMMENDED_RESOLUTIONS["medium"]
    elif task_type in ("simple_interaction",):
        # Small resolution for cost efficiency
        return RECOMMENDED_RESOLUTIONS["small"]
    return RECOMMENDED_RESOLUTIONS["medium"]
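
If the real screen is larger than the chosen preset, the screenshot has to be scaled down and Claude's coordinates scaled back up. The sketch below shows one way to compute the scaled size while preserving aspect ratio. `fit_within` is a hypothetical helper and is not taken from Anthropic's tooling.

```python
def fit_within(width, height, max_width, max_height):
    """Return (new_width, new_height, scale) that fits inside the box without upscaling."""
    scale = min(max_width / width, max_height / height, 1.0)
    return round(width * scale), round(height * scale), scale


# Fit a 1920x1080 screen inside the 1280x800 preset.
new_w, new_h, scale = fit_within(1920, 1080, 1280, 800)
print(new_w, new_h)  # 1280 720

# Map a coordinate Claude returned in the scaled image back to the real screen.
print(round(640 / scale), round(360 / scale))  # 960 540
```

With this approach, the tool's `display_width_px` and `display_height_px` would be set to the scaled size (1280x720 here), not the preset itself, so the declared dimensions match the image Claude actually sees.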

Virtual Display Setup

For server-side automation, use Xvfb to create a virtual display with controlled dimensions:

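import os
import subprocess
import time

def setup_virtual_display(width: int = 1280, height: int = 800, display_num: int = 99):
    """Start an Xvfb virtual display."""
    process = subprocess.Popen([
        "Xvfb",
        f":{display_num}",
        "-screen", "0", f"{width}x{height}x24",
        "-ac",  # disables access control; only use on a trusted host
    ])
    # Give the server a moment to start, then make sure it did not exit early
    time.sleep(1)
    if process.poll() is not None:
        raise RuntimeError(f"Xvfb exited with code {process.returncode}")

    # Set the DISPLAY environment variable
    os.environ["DISPLAY"] = f":{display_num}"

    return {
        "display": f":{display_num}",
        "width": width,
        "height": height,
        "process": process,  # keep a handle so the caller can terminate Xvfb
    }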

Using a virtual display gives you full control over the resolution and eliminates variables like DPI scaling and multi-monitor configurations. This is the recommended approach for production deployments where consistency matters.
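
A fixed one-second sleep can be too short on a loaded machine and needlessly long on a fast one. A possible alternative, sketched below, is to poll for the X server's Unix socket. This assumes the conventional socket location `/tmp/.X11-unix/X<display_num>`, which is typical on Linux but not guaranteed; `wait_for_x_socket` is a hypothetical helper.

```python
import os
import time


def wait_for_x_socket(display_num, timeout=5.0, socket_dir="/tmp/.X11-unix"):
    """Poll until the X server's Unix socket appears; return True if it did."""
    socket_path = os.path.join(socket_dir, f"X{display_num}")
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if os.path.exists(socket_path):
            return True
        time.sleep(0.05)
    # One final check in case the socket appeared during the last sleep.
    return os.path.exists(socket_path)
```

A caller might invoke `wait_for_x_socket(99)` right after starting Xvfb and raise an error if it returns `False`.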

Viewport vs Screen Resolution

When automating browsers specifically, there is an important distinction between screen resolution and browser viewport size. The viewport is the page area inside the browser chrome (excluding the title bar, tab strip, and address bar). Configure Claude with the viewport dimensions, not the full screen dimensions, when capturing browser-only screenshots via Playwright:

async def configure_browser_viewport(page, target_width: int = 1280, target_height: int = 800):
    """Ensure browser viewport matches Claude's expected dimensions."""
    await page.set_viewport_size({"width": target_width, "height": target_height})

    # Verify actual viewport size
    actual = await page.evaluate(
        "() => ({width: window.innerWidth, height: window.innerHeight})"
    )
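    # Raise explicitly: assert statements are skipped when Python runs with -O
    if (actual["width"], actual["height"]) != (target_width, target_height):
        raise RuntimeError(
            f"Viewport mismatch: got {actual['width']}x{actual['height']}, "
            f"expected {target_width}x{target_height}"
        )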
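
With the viewport pinned, coordinates from Claude can be passed to Playwright's `page.mouse.click`, which takes viewport-relative coordinates. The sketch below adds a bounds check first; `click_in_viewport` is a hypothetical wrapper, and the default dimensions simply mirror the example above.

```python
import asyncio


async def click_in_viewport(page, x, y, width=1280, height=800):
    """Click at viewport coordinates after checking they fall inside the viewport."""
    if not (0 <= x < width and 0 <= y < height):
        raise ValueError(f"({x}, {y}) is outside the {width}x{height} viewport")
    # page.mouse.click expects coordinates relative to the viewport's top-left corner.
    await page.mouse.click(x, y)
```

Inside an async Playwright session this would be called as `await click_in_viewport(page, 640, 400)`.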

FAQ

What happens if I send a screenshot at the wrong resolution?

Claude will still analyze the image and return coordinates, but those coordinates will be relative to the image dimensions, not the declared display dimensions. If you declare a 1280x800 display but send a 1920x1080 screenshot, Claude's coordinates will be based on the 1920x1080 image space, causing misalignment when you try to click on the 1280x800 display.
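
A cheap safeguard is to check each screenshot's dimensions against the declared display size before sending it. The sketch below reads the size straight from the PNG header using only the standard library; `png_dimensions` and `check_screenshot` are hypothetical helpers. They operate on raw PNG bytes, so a base64 string would need `base64.b64decode` first.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(data: bytes):
    """Read (width, height) from the IHDR chunk at the start of a PNG file."""
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("Not a PNG image")
    # Width and height are big-endian 32-bit integers at byte offsets 16 and 20.
    return struct.unpack(">II", data[16:24])


def check_screenshot(data: bytes, declared_width: int, declared_height: int):
    """Raise if the screenshot size differs from the declared display size."""
    actual = png_dimensions(data)
    if actual != (declared_width, declared_height):
        raise ValueError(
            f"Screenshot is {actual[0]}x{actual[1]}, "
            f"but the tool declares {declared_width}x{declared_height}"
        )
```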

Should I use the highest resolution possible for better accuracy?

Not necessarily. Higher resolution increases token consumption (and cost) per screenshot. For most tasks, 1280x800 provides sufficient visual clarity. Use 1920x1080 only when you need to read small text or interact with densely packed interfaces.

How do I test my coordinate translation logic?

Create a simple test that clicks on known UI elements (buttons, links) at various positions across the screen. Verify each click lands correctly by checking the screen state after clicking. Automated visual regression tests can compare before/after screenshots to confirm correct interaction.
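
To cover "various positions across the screen" systematically, the test points can be generated as a grid. The sketch below is one way to do that; `grid_points` is a hypothetical helper, and what you place at each point (a button, a link) is up to your test page.

```python
def grid_points(width, height, rows=3, cols=3):
    """Return the centre of each cell in a rows x cols grid covering the screen."""
    return [
        (int((col + 0.5) * width / cols), int((row + 0.5) * height / rows))
        for row in range(rows)
        for col in range(cols)
    ]


points = grid_points(1280, 800)
print(len(points))  # 9
print(points[4])    # (640, 400), the centre of the screen
```

Clicking each point and checking the resulting screen state exercises corners and edges, where offset and scaling errors tend to be largest.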
