Smaug-72B-v0.1

Feb 28, 2024

Smaug-72B-v0.1 is a cutting-edge language model fine-tuned with a novel technique known as DPOP. This technique lets it achieve top scores across a variety of benchmarks, making it an effective tool for a wide range of natural language processing tasks and a strong option for those seeking state-of-the-art performance in AI and machine learning.

Smaug-72B-v0.1 is a groundbreaking language model that reached first place on HuggingFace's Open LLM Leaderboard, surpassing an average score of 80%. It uses a novel fine-tuning technique, DPO-Positive (DPOP), which fixes the failure modes of preference optimization with DPO and outperforms DPO across a wide variety of datasets and downstream tasks. The model posts strong scores on ARC, HellaSwag, MMLU, TruthfulQA, Winogrande, and GSM8K, as well as on MT-Bench using the llama-2 conversation template.
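For context, the llama-2 conversation template mentioned above wraps each user turn in [INST] ... [/INST] markers, with an optional system prompt enclosed in <<SYS>> tags on the first turn. A minimal sketch in Python; the system and user strings below are placeholders, not the actual MT-Bench prompts:

```python
# Illustrative construction of a llama-2-style chat prompt.
# The system and user strings are placeholders, not MT-Bench prompts.
system = "You are a helpful assistant."
user_turn = "Outline a blog post about ancient Rome."

prompt = f"[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{user_turn} [/INST]"
print(prompt)
```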

The model card also reports contamination scores on ARC, TruthfulQA, and GSM8K, compared against MoMo-72B-lora-1.8.7-DPO and Llama-2-70B. It includes example responses to different prompts, such as outlining a blog post, solving a probability problem, implementing a program, and identifying named entities. The model is too large to run on the free Inference API, but it can be downloaded and run locally for text generation. It is finetuned from moreh/MoMo-72B-lora-1.8.7-DPO and is used in spaces such as arunima/Smaug-Chatbot, bvencel/venci-test, alexkueck/LIRAGTBackup, etc. The model size and tensor type are listed under Safetensors.
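Because the free Inference API cannot host it, generating text with Smaug-72B-v0.1 means loading the weights yourself. A minimal sketch with Hugging Face transformers, assuming the Hub checkpoint id abacusai/Smaug-72B-v0.1 and enough GPU memory to hold a 72B checkpoint (multi-GPU sharding in practice):

```python
# Minimal sketch: load and sample from the model with Hugging Face transformers.
# Assumes the Hub id "abacusai/Smaug-72B-v0.1" and enough GPU memory to shard
# a 72B checkpoint across the available devices.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "abacusai/Smaug-72B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # keep the checkpoint's native dtype
    device_map="auto",    # shard layers across available GPUs
)

prompt = "Outline a blog post about the history of the Roman Empire."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```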

Status: Current
License: Non-commercial use only
Type: Fine-tuned

Comparison 


Smaug-72B-v0.1 is a new text generation model that has achieved the highest score on HuggingFace's Open LLM Leaderboard. It is finetuned from moreh/MoMo-72B-lora-1.8.7-DPO, which is in turn based on Qwen-72B. It uses a new fine-tuning technique called DPO-Positive (DPOP) together with new pairwise preference versions of the ARC, HellaSwag, and MetaMath datasets, as well as other existing datasets. DPOP introduces a new loss function and training procedure that avoids the failure mode of the standard DPO loss and outperforms DPO across a wide variety of datasets and downstream tasks.
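Concretely, DPOP keeps the standard DPO preference margin but adds a penalty, inside the sigmoid, that activates whenever the policy's log-likelihood of the preferred completion falls below the reference model's. Below is a minimal PyTorch sketch of that loss, assuming per-sequence summed log-probabilities have already been computed; the beta and lam values are illustrative defaults, not the authors' settings:

```python
import torch
import torch.nn.functional as F

def dpop_loss(policy_chosen_logps, policy_rejected_logps,
              ref_chosen_logps, ref_rejected_logps,
              beta=0.3, lam=50.0):
    """Sketch of the DPO-Positive (DPOP) loss.

    Each argument is a tensor of summed log-probabilities of the chosen
    (preferred) or rejected completion under the policy or the frozen
    reference model. beta and lam are placeholder hyperparameters.
    """
    # Standard DPO terms: log-ratios of policy to reference.
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps

    # DPOP penalty: positive only when the policy assigns lower likelihood
    # to the preferred completion than the reference model does, i.e. the
    # failure mode of standard DPO described above.
    penalty = torch.clamp(ref_chosen_logps - policy_chosen_logps, min=0.0)

    logits = beta * (chosen_logratio - rejected_logratio - lam * penalty)
    return -F.logsigmoid(logits).mean()
```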

MoMo-72B-lora-1.8.7-DPO is another text generation model, finetuned directly from Qwen-72B. It uses the standard DPO loss and the original versions of the ARC, HellaSwag, and MetaMath datasets, as well as other existing datasets. It scores lower than Smaug-72B-v0.1 on the Open LLM Leaderboard and exhibits the DPO failure mode in which the likelihood of the preferred examples is reduced on some datasets. It also has slightly higher contamination scores than Smaug-72B-v0.1 on some datasets.

Benchmark     GPT-3.5   Gemini Pro   Mistral-Small   Mistral-Medium   Smaug-72B
MMLU          70.0      71.8         70.6            75.3             77.2
HellaSwag     85.5      84.7         86.7            88.9             89.3
ARC           85.2      -            85.8            89.9             76.0
WinoGrande    81.6      -            81.2            88.0             85.1
GSM-8K        57.1      -            58.4            66.7             78.7
TruthfulQA    -         -            -               -                76.7

Team 

Abacus.AI is a company that’s all about making artificial intelligence (AI) work for various applications. They’ve built a platform that uses advanced AI models to create systems that can do things like understanding and generating language, making predictions, and personalizing experiences. This platform also gives data science teams the tools they need to manage data, build features, and keep an eye on their models.

The team at Abacus.AI is pretty impressive. They’re a group of AI scientists and machine learning engineers who have studied at top universities like Stanford, MIT, and UC Berkeley. They’ve also worked at big tech companies like Google, AWS, and Uber. Now, they’re using their skills to push the boundaries of what AI can do at Abacus.AI. They’ve already caught the attention of several Fortune 500 companies and have some high-profile investors backing them. So, it’s safe to say they’re doing something right!