Mixtral 8X7B

Jan 11, 2024

In the paper "Mixtral of Experts," Albert Q. Jiang, Alexandre Sablayrolles, Antoine Roux, Arthur Mensch, and their co-authors at Mistral AI introduce Mixtral 8x7B.

This Sparse Mixture of Experts (SMoE) language model shares Mistral 7B's architecture, but each layer contains 8 feedforward blocks (experts). A router selects two experts for every token at every layer, so the model holds 47B parameters in total while only about 13B are active per token. Trained with a 32k-token context window, Mixtral matches or surpasses prominent models such as Llama 2 70B and GPT-3.5 across multiple benchmarks.
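The routing step is easy to picture in code. The snippet below is a minimal sketch of a top-2 sparse mixture-of-experts feedforward layer in PyTorch; the class name TopKMoELayer, the plain GELU experts, and the small dimensions are hypothetical stand-ins for illustration, not Mistral's implementation.

```python
# Minimal sketch of a top-2 sparse mixture-of-experts feedforward layer.
# Illustrative only -- not Mistral's code; class and variable names are hypothetical.
# Mixtral itself uses d_model=4096, d_ff=14336, SwiGLU experts, 8 experts, top-2 routing;
# small dimensions are used here so the sketch runs instantly.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoELayer(nn.Module):
    def __init__(self, d_model=64, d_ff=256, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Router: scores each token against every expert.
        self.gate = nn.Linear(d_model, n_experts, bias=False)
        # Each expert is an ordinary feedforward block (a plain GELU MLP here).
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                        # x: (tokens, d_model)
        logits = self.gate(x)                    # (tokens, n_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)     # normalize over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e         # tokens whose slot-th pick is expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

# Only top_k of the n_experts MLPs run for any given token, which is why the
# active parameter count stays far below the total parameter count.
tokens = torch.randn(5, 64)
print(TopKMoELayer()(tokens).shape)  # torch.Size([5, 64])
```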

Status: Current
License: Apache 2.0
Type: Pretrained

Comparison 

Sourced on: January 11, 2024

Mixtral 8x7B is particularly strong in mathematics, code generation, and multilingual tasks, where it significantly outperforms Llama 2 70B. Its "Instruct" version, fine-tuned to follow instructions, surpasses GPT-3.5 Turbo, Claude-2.1, and Gemini Pro on human evaluation benchmarks.
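For readers who want to try the Instruct variant, the sketch below shows one common way to query it through the Hugging Face transformers library. It assumes the publicly hosted mistralai/Mixtral-8x7B-Instruct-v0.1 checkpoint and enough GPU memory for a half-precision load; neither detail comes from the article above.

```python
# Hedged sketch: querying Mixtral Instruct via Hugging Face transformers.
# Assumes the mistralai/Mixtral-8x7B-Instruct-v0.1 checkpoint and sufficient
# GPU memory for a float16 load (or a quantized variant).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

messages = [{"role": "user", "content": "Explain sparse mixture of experts in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```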

The table below compares Mixtral with the Llama family: Mixtral matches or outperforms Llama 2 70B on almost every popular benchmark while using far fewer active parameters during inference.

Note that Mixtral 8x7B has 13B active parameters; a rough calculation of this figure follows the table.

| Benchmark | LLaMA 2 7B | LLaMA 2 13B | LLaMA 1 33B | LLaMA 2 70B | Mistral 7B | Mixtral 8x7B |
|---|---|---|---|---|---|---|
| MMLU | 44.40% | 55.60% | 56.80% | 69.90% | 62.50% | 70.60% |
| HellaSwag | 77.10% | 80.70% | 83.70% | 85.40% | 81.00% | 84.40% |
| WinoGrande | 69.50% | 72.90% | 76.20% | 80.40% | 74.20% | 77.20% |
| PIQA | 77.90% | 80.80% | 82.20% | 82.60% | 82.20% | 83.60% |
| Arc-e | 68.70% | 75.20% | 79.60% | 79.90% | 80.50% | 83.10% |
| Arc-c | 43.20% | 48.80% | 54.40% | 56.50% | 54.90% | 59.70% |
| NQ | 17.50% | 16.70% | 24.10% | 25.40% | 23.20% | 30.60% |
| TriviaQA | 56.60% | 64.00% | 68.50% | 73.00% | 62.50% | 71.50% |
| HumanEval | 11.60% | 18.90% | 25.00% | 29.30% | 26.20% | 40.20% |
| MBPP | 26.10% | 35.40% | 40.90% | 49.80% | 50.20% | 60.70% |
| Math | 3.90% | 6.00% | 8.40% | 13.80% | 12.70% | 28.40% |
| GSM8K | 16.00% | 34.30% | 44.10% | 69.60% | 50.00% | 74.40% |
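To see roughly where the 47B total / 13B active split comes from, the back-of-the-envelope calculation below plugs in the architecture hyperparameters reported for Mixtral (4096 hidden size, 14336 feedforward size, 32 layers, 8 experts, top-2 routing). The attention and embedding bookkeeping is an approximation, so the result only roughly reproduces the paper's figures.

```python
# Back-of-the-envelope parameter count for Mixtral 8x7B.
# Hyperparameters as reported for Mixtral; the attention/embedding bookkeeping
# below is an approximation for illustration, not an exact reproduction.
d_model, d_ff, n_layers = 4096, 14336, 32
n_experts, top_k = 8, 2
vocab, n_kv_heads, head_dim = 32000, 8, 128

expert = 3 * d_model * d_ff                                            # SwiGLU expert: w1, w2, w3
attn = 2 * d_model * d_model + 2 * d_model * (n_kv_heads * head_dim)   # q, o + k, v (grouped-query attention)
router = d_model * n_experts
shared = n_layers * (attn + router) + 2 * vocab * d_model              # attention, gates, embeddings

total  = shared + n_layers * n_experts * expert
active = shared + n_layers * top_k * expert

print(f"total  ~ {total / 1e9:.1f}B parameters")   # roughly 47B
print(f"active ~ {active / 1e9:.1f}B parameters")  # roughly 13B
```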

Team 

The team, comprising Albert Q. Jiang, Alexandre Sablayrolles, and their colleagues, demonstrates a clear commitment to advancing open AI models. Their collaborative effort points to continued investment in refining Mixtral and in applying its sparse parameter utilization to broader applications.

Community 

The Mixtral community is growing quickly, with active engagement and contributions. Key community links include the project's Hugging Face model page, which has over 116K downloads.

Active Members: 100,001+
Engagement Level: High