In this video, Brillibits discusses the new Mixtral model released by Mistral AI, an 8x7B mixture-of-experts (MoE) model that outperforms Llama 2 70B while offering significantly faster inference. The router activates only two of the eight experts per token, so only a fraction of the model's parameters (roughly 13 of its 47 billion) is used in each forward pass. Brillibits gives a detailed overview of the model and explains how to fine-tune it on custom datasets. The hardware requirements for fine-tuning are roughly 48GB of VRAM (two RTX 3090s or RTX 4090s) and at least 32GB of system RAM.

The video covers building an instruct dataset from the Dolly 15K dataset and the prompt format the instruct model expects. Brillibits then walks through the fine-tuning process with the Finetune_LLMs software, highlighting the important flags and options. The performance characteristics of the fine-tuned model are discussed, and a demonstration of serving it with Text Generation Inference is provided. Brillibits also shares thoughts on the future of mixture-of-experts models and the potential to improve quality by routing to more experts at a time. The video concludes with a call to action for viewers to like, subscribe, and join the Discord community for further discussion.
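
To make the "two experts per token" idea concrete, here is a minimal sketch of top-2 MoE routing in PyTorch. It is illustrative only, not the Mixtral source: the layer dimensions, the simple two-layer expert MLPs, and the class name `Top2MoELayer` are assumptions, but the routing logic (score all experts, keep the top two, renormalize their weights, and mix their outputs) matches the mechanism described in the video.

```python
# Illustrative top-2 mixture-of-experts routing (simplified, not the Mixtral implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F

class Top2MoELayer(nn.Module):
    def __init__(self, hidden_dim=4096, ffn_dim=14336, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Router scores every expert for every token.
        self.router = nn.Linear(hidden_dim, num_experts, bias=False)
        # Simplified expert MLPs (Mixtral's real experts use a gated SwiGLU block).
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(hidden_dim, ffn_dim, bias=False),
                nn.SiLU(),
                nn.Linear(ffn_dim, hidden_dim, bias=False),
            )
            for _ in range(num_experts)
        )

    def forward(self, x):  # x: (num_tokens, hidden_dim)
        scores = self.router(x)                            # (tokens, num_experts)
        weights, idx = torch.topk(scores, self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)               # renormalize over the chosen 2
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                   # tokens sent to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out
```

Because only two expert MLPs run per token, the compute per forward pass is far below what the full parameter count suggests, which is why the model can be faster than a dense 70B model.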
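For the dataset step, the sketch below converts Dolly 15K into plain instruction/response training text with the Hugging Face `datasets` library. The "### Instruction / ### Context / ### Response" template and the output filename are assumptions for illustration; the exact prompt format used in the video's Finetune_LLMs pipeline may differ.

```python
# Sketch: turning databricks/databricks-dolly-15k into instruct-style training records.
from datasets import load_dataset

dolly = load_dataset("databricks/databricks-dolly-15k", split="train")

def to_instruct_text(example):
    # Dolly rows have "instruction", optional "context", and "response" fields.
    context = f"\n\n### Context:\n{example['context']}" if example["context"] else ""
    text = (
        f"### Instruction:\n{example['instruction']}{context}"
        f"\n\n### Response:\n{example['response']}"
    )
    return {"text": text}

instruct_ds = dolly.map(to_instruct_text, remove_columns=dolly.column_names)
instruct_ds.to_json("dolly15k_instruct.jsonl")  # one {"text": ...} record per line
```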
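And for the inference demonstration, a fine-tuned model served with Text Generation Inference can be queried over its REST `/generate` endpoint. The host, port, prompt, and generation parameters below are assumptions; adjust them to match your own TGI deployment.

```python
# Sketch: querying a running Text Generation Inference server.
import requests

resp = requests.post(
    "http://localhost:8080/generate",  # assumes TGI is listening locally on port 8080
    json={
        "inputs": "### Instruction:\nExplain what a mixture of experts model is.\n\n### Response:\n",
        "parameters": {"max_new_tokens": 256, "temperature": 0.7},
    },
    timeout=120,
)
print(resp.json()["generated_text"])
```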

Brillibits
July 7, 2024
Finetune LLMs GitHub
22:35