Llama 3 8K

May 7, 2024

Meta’s Llama 3 is a family of large language models with 8B and 70B variants, designed for dialogue and other text generation tasks. It emphasizes helpfulness and safety, built on an optimized transformer architecture and refined with modern fine-tuning techniques. Intended for English-language use in commercial and research settings, it is released as a static model, with responsible AI development and community feedback driving future improvements.

Meta has recently unveiled the Meta Llama 3 family of large language models (LLMs), available in 8B and 70B parameter versions. These models are designed to generate text and code, with dialogue applications as the primary focus. The Llama 3 models are optimized for helpfulness and safety, outperforming many open-source chat models on industry benchmarks.

The Llama 3 models are built on an optimized transformer architecture and aligned with human preferences through supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF). They are pretrained on over 15 trillion tokens from publicly available sources and further refined with over 10 million human-annotated examples. Notably, no Meta user data is included in the training datasets.
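
To make the architecture concrete, here is a minimal sketch that loads only the model configuration (no weights) and prints the core transformer hyperparameters. It assumes the Hugging Face transformers library is installed and that you have accepted the license for the gated meta-llama/Meta-Llama-3-8B repository:

```python
from transformers import AutoConfig

# Load just the configuration of the 8B base model; this does not
# download the weights, only the small config file.
config = AutoConfig.from_pretrained("meta-llama/Meta-Llama-3-8B")

# Core transformer hyperparameters of the architecture.
print("hidden size:            ", config.hidden_size)
print("transformer layers:     ", config.num_hidden_layers)
print("attention heads:        ", config.num_attention_heads)
print("key/value heads:        ", config.num_key_value_heads)  # grouped-query attention
print("vocabulary size:        ", config.vocab_size)
print("max position embeddings:", config.max_position_embeddings)
```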

Meta’s commitment to responsible AI development is evident in the Llama 3 models. Meta provides developers with a Responsible Use Guide and tools like Meta Llama Guard 2 and Code Shield to implement safety best practices. These resources help reduce residual risks while maintaining high levels of helpfulness.
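
As an illustration of how such a safety layer can sit in front of a chat model, the sketch below runs a user prompt through Meta Llama Guard 2 for moderation. It assumes the transformers library and access to the gated meta-llama/Meta-Llama-Guard-2-8B checkpoint; the pattern follows the published model card, but treat it as a sketch rather than a definitive integration:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-Guard-2-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

def moderate(chat):
    # The tokenizer's chat template wraps the conversation in
    # Llama Guard 2's safety-taxonomy classification prompt.
    input_ids = tokenizer.apply_chat_template(chat, return_tensors="pt").to(model.device)
    output = model.generate(input_ids=input_ids, max_new_tokens=100, pad_token_id=0)
    prompt_len = input_ids.shape[-1]
    return tokenizer.decode(output[0][prompt_len:], skip_special_tokens=True)

# Prints "safe", or "unsafe" followed by the violated category code (e.g. "S1").
print(moderate([{"role": "user", "content": "How do I tie a bowline knot?"}]))
```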

The Llama 3 models are intended for commercial and research use in English, with the instruction-tuned variants specifically designed for assistant-like chat applications. Developers are encouraged to fine-tune the models for other languages, provided they comply with the Llama 3 Community License and Acceptable Use Policy.
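
For the assistant-like chat use case, a minimal sketch of querying the instruction-tuned 8B model through the transformers text-generation pipeline might look like the following (assuming transformers ≥ 4.40, which added Llama 3 support, and access to the gated checkpoint):

```python
import torch
from transformers import pipeline

# Build a chat-capable generation pipeline around the instruct model.
pipe = pipeline(
    "text-generation",
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a concise, helpful assistant."},
    {"role": "user", "content": "Explain retrieval-augmented generation in two sentences."},
]

# The pipeline applies the Llama 3 chat template and returns the full
# conversation with the assistant's reply appended as the last turn.
out = pipe(messages, max_new_tokens=128, do_sample=False)
print(out[0]["generated_text"][-1]["content"])
```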

In summary, the Llama 3 models represent a significant advancement in AI technology, offering powerful capabilities for natural language generation tasks while emphasizing safety and responsible use.

Status: Current
License: Llama 3 Community License
Variants: Pretrained, Instruction-tuned

Comparison 

Sourced on: May 7, 2024

General Performance

  • MMLU (5-shot): Llama 3 8B scores 66.60, a significant improvement over Llama 2 7B’s 45.70. Llama 3 70B outperforms its Llama 2 counterpart, 79.50 versus 69.70.
  • AGIEval English (3-5 shot): Llama 3 8B scores 45.90, a notable leap from Llama 2 7B’s 28.80. Among the 70B models, Llama 3 reaches 63.00, surpassing Llama 2’s 54.80.
  • CommonSenseQA (7-shot): A clear advance, with Llama 3 8B scoring 72.60 against Llama 2 7B’s 57.60. At 70B, Llama 3 reaches 83.80, ahead of Llama 2’s 78.70.
  • Winogrande (5-shot): Llama 3 8B’s 76.10 is a modest increase over Llama 2 7B’s 73.30, and Llama 3 70B’s 83.10 improves on Llama 2 70B’s 81.80.
  • BIG-Bench Hard (3-shot, CoT): Llama 3 8B reaches 61.10, compared to Llama 2 7B’s 38.10. Llama 3 70B significantly outperforms Llama 2 70B, 81.30 versus 65.70.
  • ARC-Challenge (25-shot): Llama 3 8B achieves 78.60, a substantial gain over Llama 2 7B’s 53.70. Llama 3 70B excels with 93.00, compared to Llama 2 70B’s 85.30.

Knowledge Reasoning and Reading Comprehension

  • TriviaQA-Wiki (5-shot): Llama 3 8B’s 78.50 is competitive with Llama 2 13B’s 79.60, while Llama 3 70B’s 89.70 surpasses Llama 2 70B’s 87.50.
  • SQuAD (1-shot): Llama 3 8B scores 76.40, an improvement over both Llama 2 7B (72.20) and 13B (72.10). Llama 3 70B’s 85.60 is higher than Llama 2 70B’s 82.60.
  • QuAC (1-shot, F1): Llama 3 8B’s 44.40 beats Llama 2 7B’s 39.60 and is comparable to Llama 2 13B’s 44.90. Llama 3 70B’s 51.10 improves on Llama 2 70B’s 49.40.
  • BoolQ (0-shot): Llama 3 8B achieves 75.70, outperforming Llama 2 7B’s 65.50. Llama 3 70B scores 79.00, better than Llama 2 70B’s 73.10.
  • DROP (3-shot, F1): Llama 3 8B’s 58.40 is a significant increase from Llama 2 7B’s 37.90, and Llama 3 70B’s 79.70 is a notable improvement over Llama 2 70B’s 70.20.

Highlights:

  • Llama 3 models consistently outperform their similarly sized Llama 2 counterparts across all categories.
  • The largest gains appear in the general performance category, particularly on the BIG-Bench Hard and ARC-Challenge benchmarks (recomputed in the script following the table below).
  • Llama 3 70B is especially strong on knowledge reasoning and reading comprehension tasks, a substantial advance over the previous generation.

| Category | Benchmark | Llama 3 8B | Llama 2 7B | Llama 2 13B | Llama 3 70B | Llama 2 70B |
|---|---|---|---|---|---|---|
| General | MMLU (5-shot) | 66.60 | 45.70 | 53.80 | 79.50 | 69.70 |
| General | AGIEval English (3-5 shot) | 45.90 | 28.80 | 38.70 | 63.00 | 54.80 |
| General | CommonSenseQA (7-shot) | 72.60 | 57.60 | 67.60 | 83.80 | 78.70 |
| General | Winogrande (5-shot) | 76.10 | 73.30 | 75.40 | 83.10 | 81.80 |
| General | BIG-Bench Hard (3-shot, CoT) | 61.10 | 38.10 | 47.00 | 81.30 | 65.70 |
| General | ARC-Challenge (25-shot) | 78.60 | 53.70 | 67.60 | 93.00 | 85.30 |
| Knowledge reasoning | TriviaQA-Wiki (5-shot) | 78.50 | 72.10 | 79.60 | 89.70 | 87.50 |
| Reading comprehension | SQuAD (1-shot) | 76.40 | 72.20 | 72.10 | 85.60 | 82.60 |
| Reading comprehension | QuAC (1-shot, F1) | 44.40 | 39.60 | 44.90 | 51.10 | 49.40 |
| Reading comprehension | BoolQ (0-shot) | 75.70 | 65.50 | 66.90 | 79.00 | 73.10 |
| Reading comprehension | DROP (3-shot, F1) | 58.40 | 37.90 | 49.80 | 79.70 | 70.20 |
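
As a quick sanity check on the highlights above, the short script below recomputes the per-benchmark gains from the table (scores copied verbatim). The largest 8B deltas do indeed fall on BIG-Bench Hard and ARC-Challenge:

```python
# Absolute gain of each Llama 3 model over the same-tier Llama 2 model.
scores = {
    # benchmark: (llama3_8b, llama2_7b, llama3_70b, llama2_70b)
    "MMLU (5-shot)":                (66.60, 45.70, 79.50, 69.70),
    "AGIEval English (3-5 shot)":   (45.90, 28.80, 63.00, 54.80),
    "CommonSenseQA (7-shot)":       (72.60, 57.60, 83.80, 78.70),
    "Winogrande (5-shot)":          (76.10, 73.30, 83.10, 81.80),
    "BIG-Bench Hard (3-shot, CoT)": (61.10, 38.10, 81.30, 65.70),
    "ARC-Challenge (25-shot)":      (78.60, 53.70, 93.00, 85.30),
    "TriviaQA-Wiki (5-shot)":       (78.50, 72.10, 89.70, 87.50),
    "SQuAD (1-shot)":               (76.40, 72.20, 85.60, 82.60),
    "QuAC (1-shot, F1)":            (44.40, 39.60, 51.10, 49.40),
    "BoolQ (0-shot)":               (75.70, 65.50, 79.00, 73.10),
    "DROP (3-shot, F1)":            (58.40, 37.90, 79.70, 70.20),
}

for name, (l3_8b, l2_7b, l3_70b, l2_70b) in scores.items():
    print(f"{name:30s} 8B vs 7B: +{l3_8b - l2_7b:5.2f}   "
          f"70B vs 70B: +{l3_70b - l2_70b:5.2f}")
```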

Team 

The team behind Meta Llama 3 is a large, multidisciplinary group at Meta AI, spanning AI research, software engineering, data science, and cybersecurity, among other fields. Together they developed and released the Llama 3 family of generative text models in 8B and 70B sizes, designed for dialogue use cases and optimized for helpfulness and safety. Their work has focused on responsible AI development, ensuring the models are not only powerful but also aligned with ethical standards and safety best practices. The contributors’ list reflects the breadth of expertise behind the project, whose goal is a robust and reliable tool for both commercial and research applications.

Community 

  • MLCommons Adoption: MLCommons has integrated Meta Llama 2 into its MLPerf Inference benchmark, showcasing the model’s performance on different platforms.
  • UC Berkeley’s RAFT: Researchers at UC Berkeley have developed RAFT, a method that enhances domain adaptation in language models using Meta Llama 2, and tested it on Azure AI Studio.
  • FoondaMate Study Assistant: Over 3 million students use FoondaMate, an AI study assistant built on Meta Llama 2, to receive instant academic support via WhatsApp.
  • Odia Generative AI: The OdiaGenAI project aims to extend LLM capabilities to the Odia language, with the development of Odia Llama, a fine-tuned version of Llama 2.
  • Taiwan LLM: A model tailored for Traditional Chinese language processing, incorporating cultural context and advanced pre-training, has been developed using Meta Llama 2.
  • Elyza’s Japanese LLM: Elyza has utilized Llama 2 to create a Japanese language model for advanced NLP applications.
  • Mendel’s Hypercube Platform: Mendel has integrated a Llama 2-based LLM into its Hypercube platform for natural language processing in healthcare.

Active Members: 10,001–50,000
Engagement Level: High
