PIQA stands for Physical Interaction: Question Answering. It is a dataset of roughly 16,000 training examples about everyday physical tasks: each item pairs a goal with two candidate solutions, and the model must choose the more appropriate one. The examples are filtered with an adversarial filtering algorithm (AFLite), which ensures that simple heuristics or word associations are not enough to solve them.
PIQA is a benchmark for testing the commonsense reasoning ability of large language models (LLMs) such as BERT, GPT-3, and RoBERTa. It can be used to evaluate how well these models understand natural language and reason about physical tasks in domains such as cooking, cleaning, and gardening. PIQA also aims to encourage the development of more robust and generalizable models that can handle diverse and challenging scenarios.
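To make the goal/two-solution format concrete, here is a minimal sketch of how a PIQA-style item might be scored. The item below is illustrative, not drawn from the real dataset, and `score_solution` is a hypothetical placeholder: a real evaluation would instead use a language model's log-probability of each solution given the goal.

```python
def score_solution(goal: str, solution: str) -> float:
    """Placeholder scorer: counts words the solution shares with the goal.
    A real harness would replace this with an LLM's log-likelihood score.
    (Note: exactly the kind of shallow word-overlap heuristic that PIQA's
    adversarial filtering is designed to defeat.)"""
    goal_words = set(goal.lower().split())
    return sum(1 for w in solution.lower().split() if w in goal_words)

def choose(item: dict) -> int:
    """Return the index (0 or 1) of the higher-scoring candidate solution."""
    scores = [score_solution(item["goal"], sol) for sol in item["solutions"]]
    return max(range(2), key=lambda i: scores[i])

# Illustrative item in PIQA's goal/solution format (not a real dataset entry).
item = {
    "goal": "Soak up spilled water on the kitchen floor.",
    "solutions": [
        "Press a dry towel onto the spilled water until it is absorbed.",
        "Press a dry fork onto it.",
    ],
    "label": 0,
}

pred = choose(item)
print("predicted:", pred, "correct:", pred == item["label"])
```

Accuracy over the whole dev or test split is then just the fraction of items where the predicted index matches the label.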