OpenAI’s o1 model, introduced in September 2024, is designed to enhance reasoning capabilities, particularly in complex tasks such as mathematics, coding, and scientific problem-solving. It employs reinforcement learning to generate internal chains of thought before responding, enabling it to handle intricate multi-step tasks with improved accuracy. The o1 model series includes o1-preview and o1-mini, a smaller, faster variant.

Developers have used o1 to build applications that streamline customer support, optimize supply-chain decisions, and forecast complex financial trends. Key features of o1 include function calling, which connects the model to external data and APIs, and structured outputs, which constrain responses to a custom JSON schema (a minimal sketch follows below).

Access to o1 is available through OpenAI’s API for users on paid usage tiers. In ChatGPT, Plus and Team users can select o1-preview or o1-mini in the model selector, subject to usage limits: 50 messages per day for o1-mini and 50 messages per week for o1-preview.
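For illustration, here is a minimal sketch of requesting a structured output through the OpenAI Python SDK. The model ID `"o1"` and the ticket-triage schema are assumptions made up for this example, not details from the o1 release:

```python
# A minimal sketch of a structured-output request via the OpenAI Python SDK.
# The model ID "o1" and the "triage_result" schema are hypothetical examples.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="o1",  # assumed model ID for this sketch
    messages=[
        {
            "role": "user",
            "content": "Classify this support ticket: 'My invoice total is wrong.'",
        }
    ],
    # Structured outputs: the response must conform to this JSON schema.
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "triage_result",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "category": {"type": "string"},
                    "priority": {"type": "string", "enum": ["low", "medium", "high"]},
                },
                "required": ["category", "priority"],
                "additionalProperties": False,
            },
        },
    },
)

print(response.choices[0].message.content)  # JSON text matching the schema
```

Function calling works through the same endpoint’s `tools` parameter, where each tool is described by a JSON-schema function signature the model can choose to invoke. The table below summarizes reported benchmark results for o1-preview, o1-mini, and GPT-4o; values are accuracy percentages, except the Codeforces Elo rating and percentile rows.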
| Benchmark | OpenAI o1-preview | OpenAI o1-mini | GPT-4o |
|---|---|---|---|
| Competition Math (AIME 2024) - Consensus@64 | 83.3 | 56.7 | 13.4 |
| Competition Math (AIME 2024) - Pass@1 | 74.4 | 44.6 | 9.3 |
| Competition Code (Codeforces) - Elo Rating | 1673 | 1258 | 808 |
| Competition Code (Codeforces) - Percentile | 89.0 | 62.0 | 11.0 |
| GPQA Diamond - Consensus@64 | 78.0 | 78.3 | 56.1 |
| GPQA Diamond - Pass@1 | 77.3 | 73.3 | 50.6 |
| Physics - Consensus@64 | 94.2 | 89.5 | 68.6 |
| Physics - Pass@1 | 92.8 | 89.4 | 59.5 |
| MATH Benchmark - Pass@1 | 94.8 | 85.5 | 60.3 |
| MMLU - Pass@1 | 92.3 | 90.8 | 88.0 |
| MMMU (val) - Pass@1 | 78.2 | N/A | 69.1 |
| MathVista (testmini) - Pass@1 | 73.9 | N/A | 63.8 |
| Chemistry - Consensus@64 | 65.6 | 60.2 | 43.0 |
| Chemistry - Pass@1 | 64.7 | 59.9 | 40.2 |
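The Pass@1 and Consensus@64 columns refer to two standard ways of scoring sampled answers: Pass@1 grades a single sampled answer per problem, while Consensus@64 samples 64 answers per problem and grades the majority-vote answer. As a rough illustration (not OpenAI’s actual evaluation harness), the sketch below computes both metrics over toy data:

```python
# A minimal sketch (not OpenAI's evaluation code) of the two metrics above:
# pass@1 grades one sampled answer per problem; consensus@k grades the
# majority-vote answer over k samples per problem (k=64 in the reported results).
from collections import Counter


def pass_at_1(single_answers: list[str], gold: list[str]) -> float:
    """Fraction of problems where the single sampled answer matches the reference."""
    correct = sum(a == g for a, g in zip(single_answers, gold))
    return correct / len(gold)


def consensus_at_k(sampled_answers: list[list[str]], gold: list[str]) -> float:
    """Fraction of problems where the majority vote over k samples is correct."""
    correct = 0
    for samples, g in zip(sampled_answers, gold):
        majority, _ = Counter(samples).most_common(1)[0]
        correct += majority == g
    return correct / len(gold)


# Toy usage with 3 problems and k=4 samples per problem:
gold = ["42", "7", "x=3"]
print(pass_at_1(["42", "8", "x=3"], gold))  # 2 of 3 correct -> 0.666...
print(consensus_at_k(
    [["42", "42", "41", "42"],   # majority "42": correct
     ["7", "6", "7", "7"],       # majority "7": correct
     ["x=2", "x=3", "x=2", "x=2"]],  # majority "x=2": wrong
    gold,
))  # 2 of 3 correct -> 0.666...
```

Consensus voting tends to lift scores on tasks with short, checkable answers (such as AIME), which is why the Consensus@64 rows exceed the corresponding Pass@1 rows in the table.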
The team that developed o1 at OpenAI was multidisciplinary, combining researchers, engineers, and product specialists. The core team included AI researchers specializing in natural language processing (NLP) and reasoning systems, alongside software engineers who optimized the model architecture for performance and efficiency. A dedicated group of data scientists curated high-quality datasets to strengthen the model’s capabilities in reasoning, STEM, and coding, while ethical-AI specialists and compliance experts worked to align the model with OpenAI’s principles for responsible AI use.
The collaboration involved extensive testing and iteration to improve accuracy, reliability, and structured-output quality. The team drew on feedback from diverse user groups, including developers and domain experts, to fine-tune the model’s performance, reflecting OpenAI’s commitment to advancing AI while addressing real-world challenges with precision and ethical consideration.