AI News · 2 min read

Claude's Computer Use Hits 72.5% on OSWorld — Approaching Human-Level Desktop Operation

Claude Sonnet 4.6 scores 72.5% on OSWorld, a benchmark of real desktop computer tasks. That is up from under 15% in late 2024 and roughly on par with human performance.

From Under 15% to 72.5% in About 15 Months

Claude's ability to operate a computer the way a person does has improved sharply: Sonnet 4.6 scores 72.5% on OSWorld, up from under 15% in late 2024. The benchmark measures how often an AI agent completes real tasks in a live desktop environment, using the same applications and interfaces a person would.

What OSWorld Tests

OSWorld evaluates whether an AI can:

  • Navigate complex spreadsheets
  • Complete web forms
  • Switch between applications
  • Follow multi-step instructions
  • Handle unexpected dialog boxes and errors

A score of 72.5% means Claude completed nearly three-quarters of the benchmark's real-world desktop tasks, roughly matching the human success rate of about 72% reported by the OSWorld authors.
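OSWorld grades each task by running a checker against the final state of the machine once the agent stops, and the headline score is the share of tasks that pass. The Python sketch below illustrates that arithmetic under stated assumptions: the `Task` type, the `success_rate` helper, the example tasks, and the dictionary standing in for machine state are all hypothetical and are not OSWorld's actual harness.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical stand-in for the state of a desktop after an agent run.
# Real OSWorld checkers inspect files and application state; this dict is an assumption.
State = Dict[str, str]


@dataclass
class Task:
    """A hypothetical OSWorld-style task: an instruction plus a checker."""
    instruction: str
    check: Callable[[State], bool]  # True if the final state satisfies the task


def success_rate(tasks: List[Task], final_states: List[State]) -> float:
    """Fraction of tasks whose checker accepts the agent's final state."""
    if len(tasks) != len(final_states):
        raise ValueError("need exactly one final state per task")
    passed = sum(task.check(state) for task, state in zip(tasks, final_states))
    return passed / len(tasks)


# Illustrative tasks (invented for this sketch, not taken from the benchmark).
tasks = [
    Task("Sum column B into cell B10", lambda s: s.get("B10") == "42"),
    Task("Submit the contact form", lambda s: s.get("form") == "submitted"),
    Task("Dismiss the update dialog", lambda s: s.get("dialog") == "closed"),
    Task("Export the report as PDF", lambda s: s.get("export") == "report.pdf"),
]

# Final states a hypothetical agent left behind; the last export is wrong.
final_states = [
    {"B10": "42"},
    {"form": "submitted"},
    {"dialog": "closed"},
    {"export": "report.docx"},
]

rate = success_rate(tasks, final_states)
print(f"success rate: {rate:.1%}")  # prints "success rate: 75.0%"
```

On the real benchmark, the same kind of ratio computed over the full task suite is what yields a figure like 72.5%.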

How They Got Here

Two key factors drove the improvement:

  1. Model training improvements in the 4.6 generation focused on spatial understanding and interaction patterns
  2. Vercept acquisition: the team and technology of Vercept, a desktop AI startup, now contribute directly to Claude's computer use capabilities

Comparison Across Models

Model                  OSWorld Score
Claude Sonnet 4.6      72.5%
Claude Opus 4.6        72.7%
Previous generation    ~50%
Late 2024              <15%
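The jumps in the table can be expressed as percentage-point gains and as multiples. The Python sketch below uses only the table's numbers. Because the late-2024 entry is reported as "<15%", treating it as 15 makes both derived figures lower bounds, and the "~50%" entry makes that comparison approximate. The helper names `point_gain` and `relative_factor` are illustrative.

```python
# Scores from the comparison table, in percent.
# The late-2024 entry is reported only as "<15%", so 15.0 is an upper bound
# on that score and any gain computed against it is a lower bound.
LATE_2024_UPPER = 15.0
PREVIOUS_GEN_APPROX = 50.0  # reported as "~50%"
SONNET_4_6 = 72.5
OPUS_4_6 = 72.7


def point_gain(new: float, old: float) -> float:
    """Absolute improvement in percentage points."""
    return new - old


def relative_factor(new: float, old: float) -> float:
    """How many times higher the new score is than the old one."""
    return new / old


print(f"vs late 2024: at least {point_gain(SONNET_4_6, LATE_2024_UPPER):.1f} points, "
      f"at least {relative_factor(SONNET_4_6, LATE_2024_UPPER):.1f}x")
# prints "vs late 2024: at least 57.5 points, at least 4.8x"
print(f"vs previous generation: about {point_gain(SONNET_4_6, PREVIOUS_GEN_APPROX):.1f} points")
# prints "vs previous generation: about 22.5 points"
print(f"Opus 4.6 over Sonnet 4.6: {point_gain(OPUS_4_6, SONNET_4_6):.1f} points")
# prints "Opus 4.6 over Sonnet 4.6: 0.2 points"
```

The 0.2-point gap between the two 4.6 models is small next to the gains over earlier generations.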

Practical Implications

At this performance level, Claude can realistically automate routine desktop work such as data entry, form filling, report generation, and application navigation. The gap between "demo impressive" and "production useful" has largely closed, though a 72.5% score still means roughly one benchmark task in four fails.
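Automating a task like form filling typically runs as a loop: the model receives a screenshot, chooses an action such as a click or a keystroke, and the surrounding harness carries it out, repeating until the model reports it is done. The Python sketch below shows that loop shape under loud assumptions: `scripted_policy` is a hard-coded stand-in for the model, `FakeDesktop` is a toy form rather than a real screen, and `Action`, `run_agent`, and `verify` are hypothetical names, not Anthropic's computer use API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Action:
    """Hypothetical UI action; real computer-use tools define their own schema."""
    kind: str         # "click", "type", or "done"
    target: str = ""  # field name, or "submit" for the submit button
    text: str = ""    # text to type into the focused field


@dataclass
class FakeDesktop:
    """Toy stand-in for a desktop showing one web form (an assumption)."""
    fields: dict = field(default_factory=lambda: {"name": "", "email": ""})
    focused: Optional[str] = None
    submitted: bool = False

    def screenshot(self) -> dict:
        # A real agent receives pixels; here the "screenshot" is the form state.
        return {"fields": dict(self.fields), "focused": self.focused,
                "submitted": self.submitted}

    def execute(self, action: Action) -> None:
        if action.kind == "click":
            if action.target == "submit":
                self.submitted = True
            else:
                self.focused = action.target
        elif action.kind == "type" and self.focused is not None:
            self.fields[self.focused] += action.text


def scripted_policy(obs: dict, record: dict) -> Action:
    """Hypothetical stand-in for the model: picks the next action from what it sees."""
    for name, value in obs["fields"].items():
        if value == "":
            if obs["focused"] != name:
                return Action("click", target=name)  # focus the empty field first
            return Action("type", target=name, text=record[name])
    if not obs["submitted"]:
        return Action("click", target="submit")
    return Action("done")


def run_agent(desktop: FakeDesktop, record: dict, max_steps: int = 20) -> bool:
    """Observe-act loop with a step budget; True if the policy reported done."""
    for _ in range(max_steps):
        action = scripted_policy(desktop.screenshot(), record)
        if action.kind == "done":
            return True
        desktop.execute(action)
    return False  # step budget exhausted


def verify(desktop: FakeDesktop, record: dict) -> bool:
    """Check the outcome independently of the agent's own claim of success."""
    return desktop.submitted and desktop.fields == record


record = {"name": "Ada Lovelace", "email": "ada@example.com"}
desk = FakeDesktop()
finished = run_agent(desk, record)
print(finished, verify(desk, record))  # prints "True True"
```

The step budget and the separate `verify` check reflect the design point above: with roughly one task in four still failing on the benchmark, a production harness should confirm the result rather than trust the agent's "done".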

Sources: Anthropic | NxCode | DataCamp | Natural 20


NYC News

Expert insights on AI voice agents and customer communication automation.
