Claude Opus 4.6 Outperforms GPT-5.2 by 144 Elo Points on Knowledge Work Benchmark
On GDPval-AA, a benchmark measuring performance on economically valuable tasks in finance, legal, and other professional domains, Claude Opus 4.6 beats GPT-5.2 by a significant margin.
Winning Where It Matters Most
Claude Opus 4.6 outperforms OpenAI's GPT-5.2 by approximately 144 Elo points on GDPval-AA — a benchmark that measures performance on economically valuable knowledge work tasks.
What GDPval-AA Measures
Unlike synthetic coding benchmarks, GDPval-AA evaluates AI performance on real-world professional tasks:
- Financial analysis — Building models, interpreting reports
- Legal reasoning — Contract review, case analysis
- Business strategy — Market analysis, competitive assessment
- Technical writing — Documentation, proposals
- Data analysis — Statistical interpretation, trend identification
Why It Matters
For enterprises evaluating AI models, synthetic benchmarks only tell part of the story. GDPval-AA represents the kind of work that knowledge workers actually do — and where AI creates real economic value.
A 144 Elo point difference is significant: under the standard Elo formula, it implies the higher-rated side would be preferred in roughly 70% of head-to-head comparisons. In chess terms, this is roughly the gap between a strong amateur and a tournament player — both are good, but one consistently wins.
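Elo gaps translate into expected head-to-head win rates via the standard logistic formula. As a sketch — assuming GDPval-AA's ratings follow the conventional 400-point Elo scale, which the source does not confirm:

```python
def elo_expected_score(rating_gap: float) -> float:
    """Expected head-to-head win probability for the higher-rated
    side, given its Elo rating advantage (standard logistic formula)."""
    return 1.0 / (1.0 + 10.0 ** (-rating_gap / 400.0))

# A 144-point advantage implies roughly a 70% expected win rate.
print(f"{elo_expected_score(144):.2f}")  # → 0.70
```

The same formula gives 0.50 for a zero gap, which is the sanity check that the scale is symmetric.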
The Enterprise Implication
Anthropic already generates 80% of its revenue from enterprise customers. Outperforming GPT-5.2 on the benchmark that most closely mirrors enterprise knowledge work reinforces Claude's value proposition for exactly its target market.
Context
This result sits alongside Claude's strong showing on:
- SWE-bench Verified: 80.9% (first model to exceed 80%)
- OSWorld: 72.5% (approaching human-level computer use)
- ARC-AGI-2: 58.3% (4.3x improvement over previous generation)
Source: Anthropic | VentureBeat | ClaudeWorld