HellaSwag is an acronym for Harder Endings, Longer contexts, and Low-shot Activities for Situations With Adversarial Generations. It is a dataset of 70,000 multiple-choice questions about grounded situations, where the model must choose the correct ending for an incomplete narrative. The incorrect endings are adversarially generated and human-verified, so they are designed to fool machines but not humans.
Here is an example question from the HellaSwag dataset:
The correct answer is C, as it is the most plausible continuation of the scenario, while the other options are either irrelevant or nonsensical.
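Evaluation on a HellaSwag item amounts to scoring each candidate ending and comparing the top-scoring one against the gold label. Below is a minimal sketch of that loop. The record fields (`ctx`, `endings`, `label`) mirror the dataset's schema, but the item text is made up for illustration, and the toy word-overlap scorer is a stand-in for what a real evaluation would use, such as the model's length-normalized log-likelihood of each ending given the context.

```python
def pick_ending(ctx, endings, score_fn):
    """Return the index of the ending the scorer rates highest."""
    scores = [score_fn(ctx, ending) for ending in endings]
    return max(range(len(scores)), key=scores.__getitem__)

def toy_score(ctx, ending):
    # Placeholder scorer: counts how many of the ending's words also
    # appear in the context. A real harness would instead score each
    # ending with a language model's log-likelihood.
    ctx_words = set(ctx.lower().split())
    return sum(word in ctx_words for word in ending.lower().split())

# Hypothetical item in HellaSwag's format (not drawn from the dataset).
item = {
    "ctx": "A man pours pancake batter into a hot pan. He",
    "endings": [
        "throws the pan out the window.",
        "flips the pancake when the batter bubbles.",
        "paints the wall with the batter.",
        "reads a book about pans.",
    ],
    "label": 1,
}

pred = pick_ending(item["ctx"], item["endings"], toy_score)
correct = pred == item["label"]
```

Accuracy over the validation split is then just the fraction of items where `pred` matches `label`.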