About Anthropic
Anthropic is an AI safety company founded in 2021 by siblings Dario and Daniela Amodei together with other former members of OpenAI. The company builds the Claude family of AI assistants, with a mission centered on AI safety research and developing reliable, interpretable AI systems. Anthropic pioneered Constitutional AI (CAI), a training technique in which an AI system is guided by an explicit set of written principles rather than purely by human feedback.
The Claude model family has evolved rapidly from Claude 3 Opus (2024) through Claude Opus 4.6 (2026), consistently ranking among the top models in reasoning, knowledge, and balanced multi-domain performance. Anthropic is known for its cautious, safety-first approach to capability advancement.
Claude Model Timeline
The evolution of Anthropic’s Claude family from 2024 to 2026. All scores from official Anthropic announcements.
Claude Opus 4.6
The latest and most capable Claude model. Claude Opus 4.6 achieves 91.3% on GPQA Diamond and 91.1% on MMMLU, making it one of the highest-scoring models in both reasoning and knowledge. It also achieves 80.8% on SWE-bench Verified for real-world software engineering tasks. Source: Anthropic announcement.
Claude Opus 4.5
A major step forward in reasoning capability. Claude Opus 4.5 achieves 91.3% on GPQA Diamond—a score Opus 4.6 would later match—and 80.9% on SWE-bench Verified, the highest coding score in the Claude family at time of release.
Claude Sonnet 4.5
The balanced workhorse of the 4.5 generation. Claude Sonnet 4.5 delivers strong performance at faster response times and lower cost than Opus. It achieves 89.1% on MMMLU and 83.4% on GPQA Diamond, with 77.2% on SWE-bench Verified for real-world coding tasks.
Claude Opus 4.1
An incremental Opus update building on the Claude 4 architecture with improved reliability and enhanced agentic capabilities.
Claude Opus 4
A major generational leap that introduced advanced agentic capabilities—the ability to use tools, browse documents, and execute multi-step workflows autonomously. Claude Opus 4 achieves 87.4% on MMMLU, 76.9% on GPQA Diamond (with extended thinking), 72.5% on SWE-bench Verified, and 33.9% on AIME 2024. Source: Anthropic announcement.
Claude Sonnet 4
The first model in the Claude 4 generation. Claude Sonnet 4 delivered substantial improvements in reasoning and agentic tasks while maintaining the fast response times Sonnet users expected. Achieves 85.4% on MMMLU, 72.3% on GPQA Diamond (with extended thinking), and 72.7% on SWE-bench Verified. Source: Anthropic announcement.
Claude Sonnet 3.7
A significant update to the 3.5 architecture. Claude Sonnet 3.7 introduced extended thinking capabilities to the Sonnet tier for the first time, allowing the model to reason through complex problems before responding.
Claude 3.5 Haiku
The speed-optimized member of the 3.5 family. Claude 3.5 Haiku was designed for high-throughput, low-latency applications where cost efficiency matters most—chatbots, classification tasks, content moderation, and real-time data extraction. Source: Anthropic model card (PDF).
Claude 3.5 Sonnet
A breakout hit that reshaped industry expectations. Claude 3.5 Sonnet demonstrated that a mid-tier model could match or exceed competitors’ flagship offerings. At 90.4% on MMLU and 92.0% on HumanEval, it outperformed GPT-4 Turbo on several key metrics while running faster and costing less. Source: Anthropic model card (PDF).
Claude 3 Opus
Anthropic’s first true frontier model and the release that established the Claude family as a serious competitor to GPT-4. Claude 3 Opus launched with a 200K context window—the largest in the industry at the time. It scored 88.2% on MMLU (5-shot CoT), 50.4% on GPQA Diamond, and 84.9% on HumanEval. Source: Anthropic model card (PDF).
Benchmark Performance
Claude Opus 4.6 scores across verified benchmark categories.
Key Strengths
Claude Opus 4.6 achieves 91.3% on GPQA Diamond, one of the highest reasoning scores among all AI models. Extended thinking mode enables deep, multi-step scientific reasoning.
With 80.8% on SWE-bench Verified, Claude Opus 4.6 demonstrates strong ability to solve real-world software engineering problems from GitHub repositories.
Built with Constitutional AI principles, Claude models are designed to be helpful, harmless, and honest. Anthropic prioritizes responsible development alongside capability advances.
About This Data
All benchmark scores are sourced from Anthropic’s official announcements and model cards. MMMLU (Multilingual MMLU) is used for Claude 4+ models where standard MMLU is not separately reported. GPQA Diamond scores for Claude 4+ include extended thinking. Scores represent performance at time of release.