Gemini 2.5 Pro is a sparse Mixture-of-Experts (MoE) transformer trained on Google’s TPUv5p clusters, with native multimodal input support spanning text, vision, audio, video, and code. It achieves state-of-the-art results across core benchmarks: 74.2% on LiveCodeBench, 82.2% on Aider Polyglot, 86.4% on GPQA Diamond, 88.0% on AIME 2025, and 82.0% on MMMU. Long-context retrieval reaches 99.8% accuracy at 1 million tokens. The model supports tool use, configurable thinking budgets, and audio-visual dialog generation. Its knowledge cutoff is January 2025, and post-training uses reinforcement learning and human feedback to improve helpfulness and safety.
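As a rough illustration of the configurable thinking budget mentioned above, the minimal sketch below uses the google-genai Python SDK; the model string, budget value, API key placeholder, and prompt are illustrative assumptions, not values taken from this section.

```python
# Minimal sketch (illustrative, not from the report): request a response from
# Gemini 2.5 Pro with an explicit cap on its internal "thinking" token budget.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder API key

response = client.models.generate_content(
    model="gemini-2.5-pro",  # assumed model identifier
    contents="Explain the trade-offs of sparse Mixture-of-Experts routing.",
    config=types.GenerateContentConfig(
        # Limit how many tokens the model may spend on internal reasoning.
        thinking_config=types.ThinkingConfig(thinking_budget=1024),
    ),
)
print(response.text)
```

Raising or lowering `thinking_budget` trades answer quality on hard reasoning tasks against latency and cost.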
Gemini 2.5 Pro achieves state-of-the-art results in reasoning, coding, and multimodal tasks, debuting at #1 overall on the LMArena leaderboard. It records 74.2% on LiveCodeBench, 82.2% on Aider Polyglot, 87.8% on FACTS Grounding, and 88.0% on AIME 2025. In video understanding it scores 83.6% on Video-MMMU and 86.9% on VideoMME. Together, these results make Gemini 2.5 Pro Google’s most capable model to date.
Benchmark | Gemini 2.5 Pro | OpenAI o3-high | OpenAI o4-mini | Claude 4 Sonnet | Claude 4 Opus | Grok 3 Beta | DeepSeek R1 0528 |
---|---|---|---|---|---|---|---|
LiveCodeBench | 74.2 | 72.0 | 75.8 | 48.9 | 51.1 | 70.5 | |
Aider Polyglot | 82.2 | 79.6 | 72.0 | 61.3 | 72.0 | 53.3 | 71.6 |
SWE-bench (single attempt) | 59.6 | 69.1 | 68.1 | 72.7 | 72.5 | | |
SWE-bench (multiple attempts) | 67.2 | | | 80.2 | 79.4 | | 57.6 |
GPQA (Diamond) | 86.4 | 83.3 | 81.4 | 75.4 | 79.6 | 80.2 | 81.0 |
Humanity’s Last Exam (no tools) | 21.6 | 20.3 | 18.1 | 7.8 | 10.7 | 14.0 | |
SimpleQA | 54.0 | 48.6 | 19.3 | 27.8 | | | |
FACTS Grounding | 87.8 | 69.9 | 62.1 | 79.1 | 77.7 | 74.8 | 82.4 |
AIME 2025 | 88.0 | 88.9 | 92.7 | 70.5 | 75.5 | 77.3 | 87.5 |
LOFT (≤128K context) | 87.0 | 77.0 | 60.5 | 81.6 | 73.1 | | |
LOFT (1M context) | 69.8 | | | | | | |
MRCR-V2 (≤128K) | 58.0 | 57.1 | 36.3 | 39.1 | 16.1 | 34.0 | |
MRCR-V2 (1M) | 16.4 | | | | | | |
MMMU (Multimodal Reasoning) | 82.0 | 82.9 | 81.6 | 74.4 | 76.5 | 76.0 | |
Gemini 2.5 Pro was developed by the Google DeepMind Gemini Team (2025), integrating advances in multimodal architecture, Mixture-of-Experts training, and long-context optimization. Development involved over 200 researchers across DeepMind, Brain, and Google Cloud, with dedicated sub-groups for code, multimodality, safety, and alignment. The team relied on TPUv5p infrastructure and new safety frameworks to enable scalable training.
Gemini 2.5 Pro has a rapidly growing developer community built around Google AI Studio and Vertex AI. It is widely integrated into Google Search, NotebookLM, and Project Astra. Community feedback is active on developer forums and AI Studio channels, with research collaborations spanning academia and enterprise developers.
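For teams accessing the model through Vertex AI rather than an AI Studio API key, a minimal sketch with the same google-genai SDK is shown below; the project ID and region are hypothetical placeholders.

```python
# Minimal sketch (illustrative): route the request through Vertex AI instead
# of the AI Studio API-key endpoint.
from google import genai

client = genai.Client(
    vertexai=True,             # use Vertex AI as the backend
    project="my-gcp-project",  # hypothetical GCP project ID
    location="us-central1",    # hypothetical region
)

response = client.models.generate_content(
    model="gemini-2.5-pro",  # assumed model identifier
    contents="Draft three test cases for a URL parser.",
)
print(response.text)
```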