DROP(f1)

The DROP (Discrete Reasoning Over Paragraphs) benchmark is a crowdsourced, adversarially created reading-comprehension benchmark of roughly 96,000 questions. It requires a system to resolve references in a question, possibly to multiple positions in the input passage, and to perform discrete operations over them (such as addition, counting, or sorting). These operations demand a much more comprehensive understanding of paragraph content than was necessary for prior reading-comprehension datasets.
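To make the task concrete, the following is a constructed, hypothetical illustration (not an actual item from the dataset) of the kind of question DROP poses: answering it requires locating several scoring events mentioned in the passage and counting them.

```python
# Hypothetical DROP-style item, written for illustration only;
# field names and content are assumptions, not taken from the dataset.
example = {
    "passage": (
        "The home team scored on a 12-yard touchdown pass in the first quarter "
        "and added field goals of 38 and 44 yards in the second quarter."
    ),
    "question": "How many scoring plays occurred in the first half?",
    "answer": "3",  # requires counting events referenced across the passage
}
```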

Areas of application

The DROP benchmark is primarily used in the following areas:

  • Evaluating the discrete reasoning abilities of AI systems.
  • Testing a system’s ability to resolve references in a question and perform discrete operations (such as addition, counting, or sorting) over the referenced content.
  • Assessing how well a system understands the content of a paragraph, beyond surface-level matching.

Example

Systems are evaluated by presenting them with a passage and a question; the system must resolve the references in the question and carry out the required discrete operations to produce an answer. Performance is then measured by comparing the predicted answer against the gold answer, typically using exact-match and token-level F1 scores (the “(f1)” in the benchmark name refers to the latter).
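The sketch below shows a simplified token-level F1 computation of the style used for the “(f1)” score. It is an assumption-laden simplification: the official DROP evaluator additionally normalizes numbers, articles, and punctuation, and aligns multi-span answers before scoring.

```python
from collections import Counter

def bag_of_words_f1(prediction: str, gold: str) -> float:
    """Simplified token-level F1 between a predicted and a gold answer string.

    A minimal sketch of F1-style answer scoring; the official DROP metric
    applies extra normalization and multi-span alignment not shown here.
    """
    pred_tokens = prediction.lower().split()
    gold_tokens = gold.lower().split()
    if not pred_tokens or not gold_tokens:
        return float(pred_tokens == gold_tokens)
    # Count tokens shared between prediction and gold (multiset intersection).
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

# Example: a partially overlapping answer receives partial credit.
print(bag_of_words_f1("three field goals", "three touchdowns"))  # 0.4
```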