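AI2 Reasoning Challenge (ARC) benchmark, Easy (ARC-e) and Challenge (ARC-c): These are the two partitions of the ARC dataset, a collection of grade-school-level, multiple-choice science questions designed to evaluate a system’s ability to reason and apply knowledge. ARC-c contains the questions that both a retrieval-based baseline and a word co-occurrence baseline answered incorrectly, so it typically requires more complex reasoning or broader knowledge. ARC-e contains the remaining, easier questions.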