ARC-e
ARC-e (ARC-Easy) is the easier partition of the AI2 Reasoning Challenge (ARC), a benchmark of grade-school science questions used to evaluate large language models' reasoning abilities. The set contains 1,169 questions, and no model has yet reached a 75% score.
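Because ARC-e items are plain multiple-choice records, scoring a predictor against the set is straightforward. Below is a minimal sketch; the dataset name (allenai/ai2_arc), the ARC-Easy config, and the field names are assumptions based on the Hugging Face release of ARC, not details from this page:

```python
# Minimal scoring sketch. Assumes the "allenai/ai2_arc" dataset on
# Hugging Face, its "ARC-Easy" config, and that release's field names
# (question, choices.text / choices.label, answerKey).
from datasets import load_dataset

arc_easy = load_dataset("allenai/ai2_arc", "ARC-Easy", split="test")

def accuracy(predict):
    """Fraction of questions where predict() returns the gold answer key."""
    correct = 0
    for example in arc_easy:
        # Each record pairs a question stem with labelled choices
        # (labels are letters or digits, e.g. "A"-"D" or "1"-"4").
        if predict(example) == example["answerKey"]:
            correct += 1
    return correct / len(arc_easy)

# Trivial baseline: always answer with the first listed choice.
print(f"first-choice baseline: {accuracy(lambda ex: ex['choices']['label'][0]):.3f}")
```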
MATH
A comprehensive collection of mathematics problems that tests a model's ability to understand and solve a wide range of math questions.