DROP(f1)
DROP is a benchmark that tests AI systems’ ability to perform discrete reasoning over paragraphs. It features 96k crowdsourced questions.
Large language models have billions of parameters that enable them to understand and generate natural language texts. Learn more about them here.
Precision is the ratio of true positives to all positive predictions. It shows how accurate your model is on the positive class.
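As a minimal sketch of that ratio, precision can be computed as TP / (TP + FP). The function name `precision`, the sample labels, and the choice to return 0.0 when the model makes no positive predictions are illustrative assumptions, not part of any particular library.

```python
def precision(y_true, y_pred, positive=1):
    """Precision = TP / (TP + FP) for the given positive label."""
    # True positives: predicted positive and actually positive.
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    # False positives: predicted positive but actually something else.
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    predicted_positive = tp + fp
    # With no positive predictions the ratio is undefined;
    # returning 0.0 is a convention assumed for this sketch.
    return tp / predicted_positive if predicted_positive else 0.0


# Hypothetical labels: 2 true positives and 2 false positives.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1]
print(precision(y_true, y_pred))  # 0.5
```

Here the model predicts the positive class four times and is right twice, so half of its positive predictions are correct.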
Base merges and moerges combine LLMs into one model. It’s a novel and effective technique for creating top models on the Open LLM Leaderboard.