Claude Opus 4.6 Outperforms GPT-5.2 by 144 Elo Points on Knowledge Work Benchmark

On GDPval-AA, which measures performance on economically valuable tasks in finance, legal, and other domains, Claude Opus 4.6 beats GPT-5.2 by roughly 144 Elo points.

Winning Where It Matters Most

Claude Opus 4.6 outperforms OpenAI's GPT-5.2 by approximately 144 Elo points on GDPval-AA, a benchmark that measures performance on economically valuable knowledge work tasks.

What GDPval-AA Measures

Unlike synthetic coding benchmarks, GDPval-AA evaluates AI performance on real-world professional tasks:

  • Financial analysis — Building models, interpreting reports
  • Legal reasoning — Contract review, case analysis
  • Business strategy — Market analysis, competitive assessment
  • Technical writing — Documentation, proposals
  • Data analysis — Statistical interpretation, trend identification

Why It Matters

For enterprises evaluating AI models, synthetic benchmarks tell only part of the story. GDPval-AA represents the kind of work that knowledge workers actually do, which is also where AI creates real economic value.

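A 144-point Elo difference is significant. Under the standard Elo formula, a gap of that size means the higher-rated side is expected to score about 70% in head-to-head comparisons. In chess terms, this is roughly the gap between a strong amateur and a tournament player: both are good, but one wins far more often than not.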
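
To put a number on a 144-point gap, here is a minimal sketch of the standard Elo expected-score formula. It assumes GDPval-AA ratings use the conventional 400-point logistic scale from chess, which the article does not confirm; the function name `expected_score` is purely illustrative.

```python
def expected_score(rating_gap: float) -> float:
    """Expected score of the higher-rated side under the standard Elo model.

    Assumes the conventional 400-point logistic scale (an assumption here,
    not something the benchmark's publishers are quoted as confirming).
    A tie counts as half a win in this score.
    """
    return 1.0 / (1.0 + 10.0 ** (-rating_gap / 400.0))


gap = 144  # Elo gap reported in the article
print(f"Expected score at a {gap}-point gap: {expected_score(gap):.1%}")
# prints "Expected score at a 144-point gap: 69.6%"
```

Under that assumption, the higher-rated model would be preferred in roughly seven of every ten pairwise comparisons, with ties counted as half a win.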

The Enterprise Implication

Anthropic already generates 80% of its revenue from enterprise customers. Outperforming GPT-5.2 on the benchmark that most closely mirrors enterprise knowledge work reinforces Claude's value proposition for exactly its target market.

Context

This result sits alongside Claude's strong showing on:

  • SWE-bench Verified: 80.9% (first model to exceed 80%)
  • OSWorld: 72.5% (approaching human-level computer use)
  • ARC-AGI-2: 58.3% (4.3x improvement over previous generation)

Source: Anthropic | VentureBeat | ClaudeWorld
