In Matthew Berman’s insightful video, “ARC AGI 3 just dropped, what it means for AGI,” published on March 27, 2026, he examines the development of the ARC AGI benchmarks, culminating in the third iteration. The ARC AGI series is designed to test progress toward Artificial General Intelligence (AGI), emphasizing the kind of generalization that separates human cognition from current AI capabilities. His analysis begins with the earlier iterations, ARC AGI 1 and 2, which focus on visual pattern-recognition puzzles that are straightforward for humans but remain difficult for AI. Berman explains that on ARC AGI 1, with its clear pattern-recognition tasks, humans achieve roughly 100% success, while the best AI systems top out at approximately 93-94%, and at considerable cost per task. ARC AGI 2 raised the difficulty further: humans still easily outperform sophisticated models such as GPT 5.4 Pro, which managed only a 72% success rate at an even higher cost per task.
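To make the puzzle format concrete, here is a minimal sketch of an ARC-style task: each task provides a few input/output grid pairs demonstrating a hidden transformation, and the solver must infer the rule and apply it to a new input. The grids and the rule below are invented for illustration and are not taken from the actual benchmark.

```python
# Illustrative ARC-style task: grids are 2D lists of color indices (0-9).
# The hidden rule in this made-up example: mirror each row horizontally.
# (Invented for illustration; not an actual ARC AGI task.)

train_pairs = [
    ([[1, 0], [2, 0]], [[0, 1], [0, 2]]),
    ([[3, 3, 0], [0, 4, 0]], [[0, 3, 3], [0, 4, 0]]),
]
test_input = [[5, 0, 0], [0, 6, 0]]

def solve(grid):
    """Candidate rule inferred from the training pairs: mirror each row."""
    return [row[::-1] for row in grid]

# A proposed rule is correct only if it reproduces every training output...
assert all(solve(inp) == out for inp, out in train_pairs)
# ...and generalizes to the held-out test input.
print(solve(test_input))  # [[0, 0, 5], [0, 6, 0]]
```

The difficulty for AI lies not in executing such a rule but in inferring it from only a handful of examples, which is exactly the few-shot generalization the benchmark is built to measure.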
However, the show-stealer here is ARC AGI 3, the first interactive benchmark in the series: AI models are tested inside a video-game-like virtual environment, further showcasing human cognitive flexibility relative to current AI. Berman demonstrates a gameplay scenario in which human intuition and logical deduction let him solve the puzzle quickly, a task on which every AI model tested, including the top performer GPT 5.4, failed drastically, scoring close to 0%. The presentation effectively illustrates that human-like general intelligence remains a distant milestone for AI.
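As a rough sketch of what an interactive benchmark of this kind involves, the loop below shows an agent acting in an environment and learning only from observed feedback, with no instructions given. The `GridGameEnv` class, its methods, and the random agent are all hypothetical stand-ins; the real ARC AGI 3 environments and API may differ substantially.

```python
# Hypothetical sketch of an interactive benchmark loop, in the spirit of
# ARC AGI 3 as described in the video. Everything here is invented for
# illustration; the actual benchmark's interface is not shown in the source.
import random

class GridGameEnv:
    """Toy stand-in: the agent must move a cursor onto a goal cell."""
    def __init__(self, size=5):
        self.size = size
        self.pos, self.goal = 0, size - 1

    def observe(self):
        return {"pos": self.pos, "goal": self.goal}

    def step(self, action):
        # No rules are stated up front: the agent must discover that
        # "L"/"R" move the cursor, purely from the observations it gets back.
        self.pos += {"L": -1, "R": 1}.get(action, 0)
        self.pos = max(0, min(self.size - 1, self.pos))
        return self.observe(), self.pos == self.goal  # (observation, solved?)

def random_agent(obs):
    """Baseline policy with no understanding of the game."""
    return random.choice(["L", "R"])

env = GridGameEnv()
obs, solved = env.observe(), False
for t in range(100):
    obs, solved = env.step(random_agent(obs))
    if solved:
        print(f"solved in {t + 1} steps")
        break
```

A human player infers the goal and the controls within a few moves; an agent without that kind of rapid hypothesis-forming is reduced to something like the random policy above, which is consistent with the near-0% scores Berman reports.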