About DeepSeek
DeepSeek is a Chinese AI research lab that has made waves with its open-source, high-performance language models. Founded in 2023, DeepSeek has rapidly produced models that compete with, and sometimes exceed, those from Western frontier labs on key benchmarks, while using novel training-efficiency techniques.
DeepSeek-R1 (2025) is a reasoning model that rivals OpenAI’s o1, achieving 90.8% on MMLU and 79.8% on AIME 2024. The company’s V3 base model also demonstrates strong performance, scoring 88.5% on MMLU.
DeepSeek Model Timeline
Verified scores from the DeepSeek-R1 paper and the DeepSeek-V3 technical report.
DeepSeek-R1
DeepSeek’s reasoning model, rivaling OpenAI o1. Achieves 90.8% on MMLU, 84.0% on MMLU-Pro, 71.5% on GPQA Diamond, and 79.8% on AIME 2024. Also scores 97.3% on MATH-500.
DeepSeek-V3
A strong general-purpose model. Achieves 88.5% on MMLU, 75.9% on MMLU-Pro, and 59.1% on GPQA Diamond. Trained with novel efficiency techniques that significantly reduce compute costs.
DeepSeek-V2.5
An incremental update to V2 with improved instruction following and coding capabilities.
DeepSeek-V2
DeepSeek’s second-generation model, introducing mixture-of-experts architecture for efficient scaling.
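For context, a mixture-of-experts layer activates only a small subset of “expert” sub-networks for each token, which is what makes the scaling efficient. The sketch below is a minimal, framework-free illustration of top-k expert routing; the expert count, top-k value, and tiny linear “experts” are assumptions chosen for readability and do not reflect DeepSeek’s actual implementation.

# Illustrative sketch of top-k expert routing, the core idea behind
# mixture-of-experts (MoE) layers. Not DeepSeek's implementation:
# expert count, top-k value, and the linear "experts" are assumptions.
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 8, 4, 2

# Router: projects each token to one logit per expert.
router_w = rng.normal(size=(d_model, n_experts))

# Each "expert" here is just a small linear map, for illustration only.
expert_ws = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]

def moe_layer(tokens: np.ndarray) -> np.ndarray:
    """Route each token to its top-k experts and mix their outputs.

    tokens: array of shape (n_tokens, d_model)
    """
    logits = tokens @ router_w                      # (n_tokens, n_experts)
    probs = np.exp(logits) / np.exp(logits).sum(-1, keepdims=True)

    out = np.zeros_like(tokens)
    for i, (tok, p) in enumerate(zip(tokens, probs)):
        chosen = np.argsort(p)[-top_k:]             # indices of the top-k experts
        weights = p[chosen] / p[chosen].sum()       # renormalize gate weights
        for e, w in zip(chosen, weights):
            out[i] += w * (tok @ expert_ws[e])      # only k experts run per token
    return out

print(moe_layer(rng.normal(size=(3, d_model))).shape)  # (3, 8)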
Benchmark Performance
DeepSeek-R1 scores across verified benchmark categories.
About This Data
Scores are sourced from DeepSeek’s technical reports on arXiv. The R1 paper also reports comparison scores for GPT-4o, Claude 3.5 Sonnet, and o1, which are used to cross-check those providers’ data.