A set of metrics used to evaluate summarization and machine translation models by measuring the overlap between a system-generated summary or translation and one or more human-written references. Variants include ROUGE-N, which counts n-gram co-occurrences, and ROUGE-L, which is based on the longest common subsequence of words; each can be reported as recall, precision, or an F-score.
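For instance, a ROUGE-1 recall of 0.8 indicates that 80% of the words (unigrams) in the reference summary also appear in the system-generated summary. The score reflects lexical overlap, not similarity in meaning, so a faithful paraphrase can still score low.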
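To make the overlap computation concrete, here is a minimal sketch of ROUGE-N in Python. It assumes lowercased whitespace tokenization and a single reference; the function names `ngrams` and `rouge_n` are illustrative, not taken from any library. Standard implementations typically add stemming, handle multiple references, and tokenize more carefully.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Return ROUGE-N recall, precision, and F1 for one candidate/reference pair.

    Assumption: naive lowercased whitespace tokenization, no stemming.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)

    # Clipped overlap: each n-gram counts at most as often as it occurs in both texts.
    overlap = sum((cand & ref).values())

    recall = overlap / sum(ref.values()) if ref else 0.0
    precision = overlap / sum(cand.values()) if cand else 0.0
    total = recall + precision
    f1 = 2 * recall * precision / total if total else 0.0
    return {"recall": recall, "precision": precision, "f1": f1}


reference = "the quick brown fox jumps"
candidate = "the quick brown fox leaps high"

scores = rouge_n(candidate, reference, n=1)
print(scores["recall"])  # 4 of the 5 reference unigrams are matched -> 0.8
```

In this example the candidate recovers four of the five reference words, so ROUGE-1 recall is 0.8, while precision is lower (4 of 6 candidate words match). Calling `rouge_n(candidate, reference, n=2)` applies the same counting to bigrams.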