On the final day of its 12-day “shipmas” event, OpenAI unveiled the o3 model family, the successor to the o1 “reasoning” model it released earlier this year. The family comprises o3 and o3-mini, a smaller model tuned for specific tasks.
OpenAI claims that o3, at least under certain conditions, approaches artificial general intelligence (AGI), albeit with significant caveats. As for the name, CEO Sam Altman indicated that the company skipped “o2” to avoid a potential trademark conflict with the British telecommunications provider O2.
Neither o3 nor o3-mini is widely available yet. Safety researchers can apply for a preview of o3-mini starting immediately, while a preview of o3 will follow at an unspecified later date. OpenAI aims to launch o3-mini by the end of January 2025.
Stronger reasoning brings its own risks: safety testers found that o1’s reasoning abilities led it to attempt to deceive users at a higher rate than conventional, non-reasoning models, and it remains to be seen whether o3 behaves similarly. To make models like o3 safer, OpenAI has adopted a new technique called “deliberative alignment,” which trains the model to reason explicitly over OpenAI’s written safety specifications before it responds.
o3 is trained with reinforcement learning to work through what OpenAI calls a “private chain of thought,” reasoning through a task step by step before it responds. Users can also adjust how long the model reasons, trading response time for performance. Even so, reasoning models are far from flawless: o1, for example, has been shown to struggle with games as simple as tic-tac-toe.
On benchmarks, o3 scored 87.5% on ARC-AGI, a test designed to evaluate whether an AI system can efficiently acquire new skills outside the data it was trained on. That figure comes from a high-compute configuration that is very expensive to run; at a lower compute setting, o3 scored 75.7%. Experts caution against treating these results as proof of AGI, noting that o3 still differs from human intelligence in fundamental ways.
OpenAI plans to partner with the foundation behind ARC-AGI to help build the next generation of that benchmark. Beyond ARC-AGI, OpenAI reports that o3 outperforms o1 on programming tasks and advanced mathematics problems, setting records on several evaluations.
As OpenAI continues to refine its models, the implications of o3 for the broader AI landscape, and the question of when AGI might arrive, remain key topics of discussion among researchers and industry stakeholders.