Explore LLM Coding with the EvalPlus Benchmark

EvalPlus is a rigorous evaluation framework for code generated by large language models (LLMs). It offers enhanced testing through HumanEval+ and MBPP+, which provide 80x and 35x more test cases than the original benchmarks, respectively. The project also ships a package, pre-built images, and tooling for easily and safely evaluating LLMs on these benchmarks.
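As a rough sketch of how such an evaluation might be wired up, the snippet below generates one solution per HumanEval+ task and writes them to a JSONL file for scoring. It assumes the evalplus Python package with its data helpers get_human_eval_plus and write_jsonl; generate_solution is a hypothetical stand-in for whatever model you want to benchmark, and the exact evaluator invocation should be checked against the repository.

# Sketch: produce samples for HumanEval+ and dump them for the EvalPlus evaluator.
# Assumes `pip install evalplus`; `generate_solution` is a hypothetical placeholder
# for your model's completion call.
from evalplus.data import get_human_eval_plus, write_jsonl


def generate_solution(prompt: str) -> str:
    """Hypothetical placeholder: call your LLM here and return the generated code."""
    raise NotImplementedError("Plug in your model's completion call.")


# One solution per HumanEval+ task, keyed by task_id, written as JSONL.
# The resulting samples.jsonl can then be scored with the project's evaluator
# (the repository documents a CLI along the lines of
# `evalplus.evaluate --dataset humaneval --samples samples.jsonl`,
# ideally run inside the provided sandboxed image).
samples = [
    {"task_id": task_id, "solution": generate_solution(problem["prompt"])}
    for task_id, problem in get_human_eval_plus().items()
]
write_jsonl("samples.jsonl", samples)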

EvalPlus, Jiawei Liu
March 28, 2024
EvalPlus Leaderboard on GitHub
EvalPlus GitHub Home Page