Glossary

ARC

The ARC benchmark (AI2 Reasoning Challenge) comprises two sets of questions, ARC-e and ARC-c. It evaluates a model's reasoning and knowledge.

PIQA

PIQA is a benchmark that evaluates a model's comprehension of physical interactions. It challenges models to choose the appropriate action in a variety of scenarios.

WinoGrande

WinoGrande is a dataset designed for training and testing AI models on commonsense reasoning. Its challenging problems and expanded scale are used to evaluate a model's language understanding.

HellaSwag

HellaSwag is a benchmark that assesses a model's commonsense reasoning ability by presenting scenarios that test implicit knowledge about the world.
