Glossary

HumanEval

The HumanEval dataset and its pass@k metric evaluate code generation by functional correctness: a model's generated programs are run against unit tests rather than compared to reference solutions as text. A widely used benchmark for code-generating LLMs.

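pass@k estimates the probability that at least one of k sampled completions for a problem passes its unit tests. A minimal sketch of the unbiased per-problem estimator described in the HumanEval paper, given n generated samples of which c are correct:

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples drawn, c of them correct."""
    if n - c < k:
        return 1.0  # every possible k-subset contains at least one correct sample
    # 1 - C(n - c, k) / C(n, k), expanded as a product for numerical stability
    return 1.0 - float(np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))

# Example: 5 of 20 samples pass -> pass@1 = 1 - 15/20 = 0.25
print(pass_at_k(n=20, c=5, k=1))
```

The benchmark score is then the average of this quantity over all problems in the dataset.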
TriviaQA

A large-scale dataset that pairs trivia questions with evidence documents from Wikipedia and the web, used to evaluate question-answering models' knowledge retrieval across a wide range of topics.

NQ

Google’s Natural Questions benchmark assesses a model’s ability to comprehend and answer real questions issued to Google Search, with answers annotated from Wikipedia pages.
