HumanEval Benchmark

A dataset of Python programming problems designed to evaluate the code generation capabilities of large language models (LLMs).


Areas of application

  • Language understanding
  • Algorithms
  • Simple mathematics
  • Software interview questions

Example

The HumanEval benchmark consists of 164 hand-written programming problems, each comprising a function signature, a docstring, a reference solution body, and several unit tests, as illustrated by the sketch below.
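
The following sketch mirrors that structure with a hypothetical task; the function count_vowels, its docstring, and its tests are illustrative and are not drawn from the dataset itself. The prompt shown to a model is the signature plus docstring, the body stands in for the reference solution, and check plays the role of the unit tests.

# Illustrative task in the style of a HumanEval problem (not an actual
# entry from the dataset). The prompt is the signature and docstring;
# the body below is a stand-in for the reference solution.
def count_vowels(text: str) -> int:
    """Return the number of vowels (a, e, i, o, u) in the given string.

    >>> count_vowels("hello")
    2
    >>> count_vowels("xyz")
    0
    """
    return sum(1 for ch in text.lower() if ch in "aeiou")


# Each task ships with unit tests; a generated completion solves the
# task only if every assertion passes when the tests are executed.
def check(candidate) -> None:
    assert candidate("hello") == 2
    assert candidate("xyz") == 0
    assert candidate("AEIOU") == 5
    assert candidate("") == 0


if __name__ == "__main__":
    check(count_vowels)
    print("All unit tests passed.")

During evaluation, only the signature and docstring are given to the model; the generated function body is then executed against the unit tests, and the task counts as solved only if every assertion holds.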