The o3 model is available in two versions:
In benchmark evaluations, o3 has demonstrated exceptional performance:
OpenAI has initiated safety testing and red teaming for o3 and o3-mini, inviting researchers to apply for early access until January 10, 2025. The public release of o3-mini is anticipated in January 2025.
OpenAI o3 shows consistent improvements over o1 in most benchmarks, particularly in reasoning-intensive and science-based tasks.
The EpochAI Frontier Math benchmark is one of the most challenging in AI, featuring unpublished, novel problems at the level of mathematical research. Typical AI systems score under 2%, reflecting its difficulty. OpenAI’s o3 achieved a groundbreaking 25.2%, demonstrating significant advancements in abstract reasoning, generalization, and problem-solving beyond rote memorization. This marks a pivotal step in advancing AI’s ability to handle complex, unfamiliar challenges.
Benchmark | OpenAI o3 | OpenAI o1 |
---|---|---|
ARC-AGI (Standard Compute) | 75.7 | N/A |
ARC-AGI (High Compute) | 87.5 | N/A |
Codeforces (Competitive Programming) - Elo Rating (converted to %) | 93 | 89 |
PhD-level Science Questions (GPQA Diamond) | 87.7 | 78.0 |
AIME (Math Olympiad Qualifier) | 94.0 | 83.3 |
MMLU (Massive Multitask Language Understanding) | 92.3 | 90.8 |
MMMU (Validation Dataset) | 78.2 | N/A |
MathVista (Test Dataset) | 73.9 | N/A |
Physics (Expert-Level) | 94.2 | 89.5 |
Chemistry (Expert-Level) | 88.7 | 65.6 |
The development of OpenAI’s o3 model was driven by a multidisciplinary team of AI researchers, engineers, and safety experts at OpenAI. This team is renowned for pioneering advancements in artificial intelligence, with a strong focus on reasoning, adaptability, and safety. The team leveraged their collective expertise in large language models, neural architecture design, and optimization techniques to create o3 and its variant, o3-mini.
Collaboration was key in this project, involving specialists in machine learning, natural language processing, mathematics, and safety testing. The team also worked closely with external red-teaming groups to ensure the model’s robustness and safety in diverse use cases. Their work was guided by OpenAI’s broader mission of developing AI that is beneficial and safe for society, with transparency and ethical considerations at the forefront.
This effort underscores OpenAI’s commitment to pushing the boundaries of AI research while addressing practical concerns such as compute efficiency, task versatility, and ethical alignment. The team’s achievements with o3 exemplify their ability to create innovative and impactful AI solutions.