Meta AI: Llama Model Benchmarks

Models Tracked: 6

Llama Launch: 2023

Best Reasoning: 69.8 (GPQA Diamond, Llama 4 Maverick)

Source Model: Open

About Meta AI

Meta AI develops the Llama family of open-weight large language models. Unlike most frontier AI labs, Meta publishes the model weights, so researchers and developers can download, modify, and deploy the models themselves, subject to Meta's license terms.

The Llama family has evolved rapidly from Llama 2 (2023) through Llama 4 (2025). Llama 4 introduced the Scout and Maverick variants, which compete directly with proprietary models. Meta reports MMLU-Pro and GPQA Diamond as its primary benchmarks.

Llama Model Timeline

Verified scores from official Meta pages.

April 2025

Llama 4 Maverick

Meta’s most capable open model. Achieves 80.5% on MMLU-Pro and 69.8% on GPQA Diamond, competing with proprietary frontier models while remaining open-weight.

MMLU-Pro: 80.5 | GPQA Diamond: 69.8

April 2025

Llama 4 Scout

The efficient variant of Llama 4. Achieves 74.3% on MMLU-Pro and 57.2% on GPQA Diamond.

MMLU-Pro: 74.3 | GPQA Diamond: 57.2

September 2024

Llama 3.2 90B Vision

Meta’s first multimodal Llama model, adding image understanding capabilities.

July 2024

Llama 3.1 405B

The largest open-weight model at release. At 405 billion parameters, it proved open models could compete with proprietary systems.

April 2024

Llama 3 70B

A significant quality leap, with a new tokenizer and a larger training dataset.

July 2023

Llama 2 70B

Meta’s first widely released open model. It established the open-weight paradigm.

Benchmark Performance

Llama 4 Maverick scores across verified benchmark categories.

About This Data

Scores are sourced from official Meta AI pages. Meta reports MMLU-Pro (not standard MMLU), which yields lower absolute scores because its questions are harder. Llama 4 Behemoth (82.2% MMLU-Pro, 73.7% GPQA Diamond) has been announced but not yet released.
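For readers who want to compare the figures quoted on this page, the scores can be collected into a small table. The following is a minimal Python sketch, not part of Meta's or this site's tooling: the `ModelScore` class and the `best_released` and `gap` helpers are hypothetical names introduced here for illustration, and the only data used are the numbers stated above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelScore:
    """One row of scores as quoted on this page (hypothetical structure)."""
    name: str
    mmlu_pro: float
    gpqa_diamond: float
    released: bool


# Only the Llama 4 models have scores listed on this page.
SCORES = [
    ModelScore("Llama 4 Maverick", 80.5, 69.8, released=True),
    ModelScore("Llama 4 Scout", 74.3, 57.2, released=True),
    ModelScore("Llama 4 Behemoth", 82.2, 73.7, released=False),
]


def best_released(scores, metric):
    """Return the released model with the highest value for `metric`."""
    candidates = [s for s in scores if s.released]
    return max(candidates, key=lambda s: getattr(s, metric))


def gap(a, b, metric):
    """Difference in percentage points between two models on `metric`."""
    return round(getattr(a, metric) - getattr(b, metric), 1)


if __name__ == "__main__":
    maverick, scout, _ = SCORES
    top = best_released(SCORES, "gpqa_diamond")
    print(f"Best released GPQA Diamond: {top.name} ({top.gpqa_diamond})")
    print(f"Maverick vs Scout, MMLU-Pro gap: {gap(maverick, scout, 'mmlu_pro')}")
    print(f"Maverick vs Scout, GPQA gap: {gap(maverick, scout, 'gpqa_diamond')}")
```

Filtering on `released` keeps the announced-only Behemoth figures out of the "best" calculation, which matches the 69.8 headline figure shown for Llama 4 Maverick.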
