Glossary

GSM8K

GSM8K is a dataset of about 8,500 linguistically diverse grade school math word problems, used to evaluate multi-step arithmetic reasoning in language models.

MBPP

MBPP (Mostly Basic Python Problems) is a benchmark of roughly 1,000 crowd-sourced Python programming problems, designed to be solvable by entry-level programmers and used to evaluate code-generation models.

HumanEval

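HumanEval is a benchmark of 164 hand-written Python programming problems, scored with the pass@k metric. It evaluates code generation by functional correctness (whether the generated code passes unit tests) rather than by textual similarity, and is widely used to benchmark LLMs.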
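
As a rough illustration of pass@k: if a model produces n samples for a problem and c of them pass the unit tests, the estimator from the HumanEval paper is 1 - C(n-c, k) / C(n, k), averaged over problems. The sketch below assumes that formulation; the function names and the sample counts are hypothetical and are not taken from any official evaluation harness.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem.

    n: total samples generated, c: samples that passed the tests,
    k: number of samples the metric is allowed to draw.
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # Probability that all k drawn samples fail is C(n-c, k) / C(n, k).
    # math.comb returns 0 when k > n - c, which gives pass@k = 1.0.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems, given (n, c) pairs per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Hypothetical counts: (samples generated, samples passing) per problem.
    example = [(10, 3), (10, 0), (10, 10)]
    print(mean_pass_at_k(example, k=1))
```

With k = 1 the estimate reduces to the fraction of passing samples, c / n, which is a quick sanity check on any implementation.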

TriviaQA

TriviaQA is a large-scale dataset for evaluating question-answering models on trivia questions, testing their knowledge retrieval across a wide range of topics.

NQ

Natural Questions (NQ) is Google’s benchmark for assessing a model’s ability to understand and answer real questions that users submitted to a search engine.
