Explore LLM Coding with the EvalPlus Benchmark

EvalPlus is a rigorous evaluation framework for code generated by large language models (LLMs). It offers enhanced testing through HumanEval+ and MBPP+, which provide 80x and 35x more test cases than the original benchmarks, respectively. The project also ships a package, pre-built images, and tooling for easily and safely evaluating LLMs on these benchmarks.
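As a rough sketch of how such an evaluation might be wired up, the snippet below generates one solution per HumanEval+ task and writes them to a JSONL file for scoring. It assumes the evalplus Python package with its data helpers get_human_eval_plus and write_jsonl; generate_solution is a hypothetical stand-in for whatever model you want to benchmark, and the exact evaluator invocation should be checked against the repository.

# Sketch: produce samples for HumanEval+ and dump them for the EvalPlus evaluator.
# Assumes `pip install evalplus`; `generate_solution` is a hypothetical placeholder
# for your model's completion call.
from evalplus.data import get_human_eval_plus, write_jsonl


def generate_solution(prompt: str) -> str:
    """Hypothetical placeholder: call your LLM here and return the generated code."""
    raise NotImplementedError("Plug in your model's completion call.")


# One solution per HumanEval+ task, keyed by task_id, written as JSONL.
# The resulting samples.jsonl can then be scored with the project's evaluator
# (the repository documents a CLI along the lines of
# `evalplus.evaluate --dataset humaneval --samples samples.jsonl`,
# ideally run inside the provided sandboxed image).
samples = [
    {"task_id": task_id, "solution": generate_solution(problem["prompt"])}
    for task_id, problem in get_human_eval_plus().items()
]
write_jsonl("samples.jsonl", samples)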

EvalPlus, Jiawei Liu
March 28, 2024
EvalPlus Leaderboard on GitHub
EvalPlus GitHub Home Page