In a revealing look at the future of artificial intelligence, the AI Explained YouTube channel recently covered unfolding developments around two new models from OpenAI and Anthropic, as well as the new ARC-AGI-3 benchmark. As of March 2026, OpenAI has reportedly paused its secondary projects and diverted resources toward the new Spud model, a strategic move akin to cutting detours from a critical journey. Meanwhile, Anthropic's Claude is drawing fresh interest from the Pentagon, signaling a renewed governmental focus on harnessing AI for national security.
Amid the allure of these cutting-edge models is the introduction of the ARC-AGI-3 benchmark, which starkly highlights the gap between machine learning and human cognition. On its tasks, frontier AI systems reportedly score less than one percent of what humans achieve. This disparity suggests that current AI models have not yet reached artificial general intelligence (AGI), despite Nvidia CEO Jensen Huang's recent comments to the contrary.
On one hand, benchmarks like ARC-AGI-3 showcase rigorous evaluation methods. Rather than relying on simplified tests, they feature abstract, game-like challenges that demand meta-level reasoning beyond what current AI systems can manage. The flexibility and inventiveness of the tasks render traditional tactics ineffective, promoting a shift toward more authentic assessments of intelligence. The promise these benchmarks hold for pushing AI boundaries is compelling.
On the other hand, there are questions about how realistically these benchmarks assess AGI, since they score how efficiently actions are performed rather than the actual difficulty of the task. The artificial constraints and narrow focus may not fully capture the well-rounded intelligence their creators hope to measure. Critics also point out that emphasizing action efficiency alone disregards other factors, potentially leading to misleading evaluations of AI progress.
Despite these controversies, the storyline remains enthralling. OpenAI's strategic redirection toward a fully automated AI researcher demonstrates an ambitious vision of AI that transcends current capabilities. Similarly, Anthropic's bold proposals for redefining cyber capabilities could spark pivotal discussions about AI's role in future government operations. As 2026 unfolds, these developments promise to keep the AI community and the general public on the edge of their seats, with both excitement and caution about what lies ahead.