Verified Performance Data

Mistral AI

The Mistral model family, from Mixtral 8x7B to Mistral Large 3. European AI innovation at the frontier. Benchmark performance across knowledge, reasoning, coding, math, multimodal understanding, and instruction following.

Models Tracked: 4
Founded: 2023
Best Knowledge (MMLU): 85.5
Best Coding (HumanEval): 92.0

About Mistral AI

Mistral AI is a French AI company founded in 2023 by former researchers from Google DeepMind and Meta. Based in Paris, Mistral has quickly become one of Europe’s leading AI companies, known for its open-weight models and efficient Mixture-of-Experts (MoE) architectures.

Mistral pioneered the open-weight MoE approach with Mixtral 8x7B, demonstrating that smaller expert networks activated selectively can match or exceed much larger dense models. Their latest model, Mistral Large 3, continues this tradition with competitive performance at lower computational cost.

Mistral Model Timeline

The evolution of Mistral’s model family. Verified scores from official Mistral announcements.

December 2025

Mistral Large 3

Mistral’s most capable model to date. Mistral Large 3 achieves 85.5% on MMLU and approximately 43.9% on GPQA Diamond (the GPQA figure comes from third-party evaluations; see the data note below). Built on a Mixture-of-Experts architecture, it offers strong performance with efficient inference.

MMLU: 85.5 · GPQA Diamond: 43.9
July 2024

Mistral Large 2

A 123B-parameter dense model that marked Mistral’s push into frontier-class performance. It achieved 84.0% on MMLU and 92.0% on HumanEval, with strong multilingual capabilities across dozens of languages.

MMLU: 84.0 · HumanEval: 92.0
February 2024

Mistral Large

Mistral’s first proprietary large model. Positioned as a premium offering with strong reasoning and instruction-following capabilities, and built-in support for function calling.
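As a concrete illustration of the function-calling interface, the sketch below sends a tool definition to Mistral’s chat completions endpoint. The URL and OpenAI-style request shape follow Mistral’s public API; the get_weather tool, its schema, and the model alias are illustrative assumptions, not details taken from the announcement above.

```python
# Hedged sketch: a function-calling request against Mistral's chat
# completions endpoint. The URL and OpenAI-style schema follow Mistral's
# public API; the get_weather tool and the model alias are illustrative.
import os
import requests

API_URL = "https://api.mistral.ai/v1/chat/completions"
API_KEY = os.environ["MISTRAL_API_KEY"]  # assumes a key in the environment

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool, not a Mistral built-in
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "mistral-large-latest",  # alias; pin a dated version in production
        "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
        "tools": tools,
    },
    timeout=30,
)
resp.raise_for_status()
message = resp.json()["choices"][0]["message"]

# If the model decided to call the tool, the arguments arrive as a JSON
# string that the caller parses and executes before replying to the model.
for call in message.get("tool_calls") or []:
    print(call["function"]["name"], call["function"]["arguments"])
```

When the model elects to call the tool rather than answer directly, the caller runs the function and sends the result back in a follow-up message.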

December 2023

Mixtral 8x7B

The model that put Mistral on the map. Mixtral 8x7B used a Sparse Mixture-of-Experts architecture with eight expert feed-forward networks per layer, routing each token to just two of them, so only about 13B of its roughly 47B total parameters are active per token. This design matched GPT-3.5-level performance at a fraction of the inference cost of a comparable dense model, and the weights were released openly.
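To make the routing mechanism concrete, here is a minimal PyTorch sketch of top-2 sparse MoE routing in the spirit of Mixtral. All dimensions are toy values, and the SiLU MLP experts stand in for Mixtral’s actual SwiGLU feed-forward blocks; only the select-two-of-eight-and-weight mechanism mirrors the real design.

```python
# Toy sketch of top-2 sparse Mixture-of-Experts routing (Mixtral-style).
# Dimensions are illustrative, not Mixtral's real config, and the SiLU MLP
# experts stand in for Mixtral's SwiGLU feed-forward blocks.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    def __init__(self, d_model=64, d_ff=256, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(d_model, n_experts, bias=False)  # the router
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):  # x: (n_tokens, d_model)
        scores = self.gate(x)                                 # (n_tokens, n_experts)
        weights, chosen = torch.topk(scores, self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)                  # renormalize over the top-k
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = chosen[:, slot] == e                   # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

moe = SparseMoE()
tokens = torch.randn(10, 64)
print(moe(tokens).shape)  # torch.Size([10, 64]); each token used only 2 of 8 experts
```

The efficiency win comes from each token’s forward pass touching only two expert networks, while all eight contribute capacity to the trained model.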

Benchmark Performance

Mistral Large 3 scores across verified benchmark categories.

Key Strengths

MoE Architecture Pioneer

Mistral popularized Mixture-of-Experts in production LLMs with Mixtral 8x7B, demonstrating that selective expert activation can deliver competitive performance at substantially lower inference cost.

Strong Coding Performance

Mistral Large 2 achieved 92.0% on HumanEval, placing it among the top models for code generation across multiple programming languages.

European AI Leadership

As one of Europe’s leading AI companies, Mistral offers strong multilingual capabilities and has been a vocal advocate for open-weight model releases.

About This Data

Benchmark scores are sourced from official Mistral AI announcements and blog posts. Mistral Large 2 and Mistral Large 3 have verified scores; older models are listed for historical context. Some scores (e.g., GPQA Diamond for Mistral Large 3) are approximate, based on third-party evaluations.

Explore More Providers

Compare Mistral’s models against other frontier AI systems.