Claude Opus 4.6 Outperforms GPT-5.2 by 144 Elo Points on Knowledge Work Benchmark
On GDPval-AA, a benchmark measuring performance on economically valuable tasks in finance, legal, and other professional domains, Claude Opus 4.6 beats GPT-5.2 by a significant margin.
Winning Where It Matters Most
Claude Opus 4.6 outperforms OpenAI's GPT-5.2 by approximately 144 Elo points on GDPval-AA — a benchmark that measures performance on economically valuable knowledge work tasks.
What GDPval-AA Measures
Unlike synthetic coding benchmarks, GDPval-AA evaluates AI performance on real-world professional tasks:
- Financial analysis — Building models, interpreting reports
- Legal reasoning — Contract review, case analysis
- Business strategy — Market analysis, competitive assessment
- Technical writing — Documentation, proposals
- Data analysis — Statistical interpretation, trend identification
Why It Matters
For enterprises evaluating AI models, synthetic benchmarks only tell part of the story. GDPval-AA represents the kind of work that knowledge workers actually do — and where AI creates real economic value.
A 144 Elo point difference is significant: under the standard Elo formula, it implies the higher-rated side would be preferred in roughly 70% of head-to-head comparisons. In chess terms, this is roughly the gap between a strong amateur and a tournament player — both are good, but one consistently wins.
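Elo gaps translate into expected head-to-head win rates via the standard logistic formula. As a sketch — assuming GDPval-AA's ratings follow the conventional 400-point Elo scale, which the source does not confirm:

```python
def elo_expected_score(rating_gap: float) -> float:
    """Expected head-to-head win probability for the higher-rated
    side, given its Elo rating advantage (standard logistic formula)."""
    return 1.0 / (1.0 + 10.0 ** (-rating_gap / 400.0))

# A 144-point advantage implies roughly a 70% expected win rate.
print(f"{elo_expected_score(144):.2f}")  # → 0.70
```

The same formula gives 0.50 for a zero gap, which is the sanity check that the scale is symmetric.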
The Enterprise Implication
Anthropic already generates 80% of its revenue from enterprise customers. Outperforming GPT-5.2 on the benchmark that most closely mirrors enterprise knowledge work reinforces Claude's value proposition for exactly its target market.
Context
This result sits alongside Claude's strong showing on:
- SWE-bench Verified: 80.9% (first model to exceed 80%)
- OSWorld: 72.5% (approaching human-level computer use)
- ARC-AGI-2: 58.3% (4.3x improvement over previous generation)
Source: Anthropic | VentureBeat | ClaudeWorld