Mixtral 8x7B is a Sparse Mixture of Experts (SMoE) language model with the same overall architecture as Mistral 7B, except that each layer contains 8 feedforward blocks (experts). For every token, a router network selects 2 of these experts at each layer, so the model has access to 47B parameters while using only 13B active parameters per token. Trained with a 32k-token context window, Mixtral outperforms or matches prominent models such as Llama 2 70B and GPT-3.5 across multiple benchmarks.
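To make the routing concrete, here is a minimal sketch of a top-2 sparse MoE feedforward layer in PyTorch. The class name, the plain SiLU experts, and the loop-based dispatch are illustrative assumptions, not Mistral's released implementation (Mixtral's experts are SwiGLU blocks and production code batches tokens per expert).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoELayer(nn.Module):
    """Illustrative top-2 sparse MoE feedforward layer (not Mistral's code)."""

    def __init__(self, dim: int = 4096, hidden_dim: int = 14336,
                 num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # Router: scores each token against every expert.
        self.gate = nn.Linear(dim, num_experts, bias=False)
        # Each expert is an independent feedforward block
        # (a simple SiLU MLP here for brevity; Mixtral uses SwiGLU experts).
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, hidden_dim, bias=False),
                          nn.SiLU(),
                          nn.Linear(hidden_dim, dim, bias=False))
            for _ in range(num_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, dim)
        scores = self.gate(x)                             # (tokens, num_experts)
        top_w, top_idx = scores.topk(self.top_k, dim=-1)  # keep only the top-k experts
        top_w = F.softmax(top_w, dim=-1)                  # renormalize over the chosen k
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = top_idx[:, k] == e                 # tokens routed to expert e in slot k
                if mask.any():
                    out[mask] += top_w[mask, k].unsqueeze(-1) * expert(x[mask])
        return out
```

Because only 2 of the 8 experts run per token, the compute cost per token stays close to that of a dense 13B model even though all 47B parameters must be held in memory.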
Mixtral 8x7B is particularly strong on mathematics, code generation, and multilingual tasks, where it significantly outperforms Llama 2 70B. Its instruction-tuned version, Mixtral 8x7B Instruct, surpasses GPT-3.5 Turbo, Claude-2.1, and Gemini Pro on human-evaluation benchmarks.
The table below compares Mixtral with the Llama family: Mixtral outperforms or matches Llama 2 70B on almost all popular benchmarks while using significantly fewer active parameters during inference. Note that Mixtral 8x7B has only 13B active parameters (a rough parameter-count sketch follows the table).
Benchmark | LLaMA 2 7B | LLaMA 2 13B | LLaMA 1 33B | LLaMA 2 70B | Mistral 7B | Mixtral 8x7B |
---|---|---|---|---|---|---|
MMLU | 44.40% | 55.60% | 56.80% | 69.90% | 62.50% | 70.60% |
HellaSwag | 77.10% | 80.70% | 83.70% | 85.40% | 81.00% | 84.40% |
WinoGrande | 69.50% | 72.90% | 76.20% | 80.40% | 74.20% | 77.20% |
PIQA | 77.90% | 80.80% | 82.20% | 82.60% | 82.20% | 83.60% |
Arc-e | 68.70% | 75.20% | 79.60% | 79.90% | 80.50% | 83.10% |
Arc-c | 43.20% | 48.80% | 54.40% | 56.50% | 54.90% | 59.70% |
NQ | 17.50% | 16.70% | 24.10% | 25.40% | 23.20% | 30.60% |
TriviaQA | 56.60% | 64.00% | 68.50% | 73.00% | 62.50% | 71.50% |
HumanEval | 11.60% | 18.90% | 25.00% | 29.30% | 26.20% | 40.20% |
MBPP | 26.10% | 35.40% | 40.90% | 49.80% | 50.20% | 60.70% |
Math | 3.90% | 6.00% | 8.40% | 13.80% | 12.70% | 28.40% |
GSM8K | 16.00% | 34.30% | 44.10% | 69.60% | 50.00% | 74.40% |
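As a sanity check on the 47B total / 13B active figures, here is a back-of-the-envelope parameter count. It assumes the hyperparameters reported in the Mixtral paper (model dimension 4096, 32 layers, expert hidden size 14336, 32 query heads and 8 KV heads of size 128, 32k vocabulary) and ignores norms and biases, so it lands close to but not exactly on the official numbers.

```python
# Rough parameter count for Mixtral 8x7B (hyperparameters from the paper;
# norms and biases ignored, so totals are approximate).
dim, n_layers, hidden_dim = 4096, 32, 14336
n_heads, n_kv_heads, head_dim = 32, 8, 128
vocab, n_experts, top_k = 32000, 8, 2

attn = 2 * dim * (n_heads * head_dim) + 2 * dim * (n_kv_heads * head_dim)  # Wq, Wo + Wk, Wv
expert = 3 * dim * hidden_dim                                              # SwiGLU: w1, w2, w3
router = dim * n_experts                                                   # gating layer

per_layer_total = attn + router + n_experts * expert
per_layer_active = attn + router + top_k * expert
embeddings = 2 * vocab * dim                                               # input + output embeddings

total = n_layers * per_layer_total + embeddings
active = n_layers * per_layer_active + embeddings
print(f"total  ≈ {total / 1e9:.1f}B")   # ≈ 46.7B
print(f"active ≈ {active / 1e9:.1f}B")  # ≈ 12.9B
```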
The model was developed by a Mistral AI team including Albert Q. Jiang, Alexandre Sablayrolles, and colleagues. Their continued work suggests further refinement of Mixtral, with the aim of applying its sparse parameter utilization more broadly.
The Mixtral community is growing quickly, with active engagement and contributions; the project's Hugging Face model page alone has recorded over 116K downloads.