Llama 3.2 Lightweight Models for Mobile

Sep 29, 2024

Llama 3.2 lightweight models offer efficient AI performance tailored for mobile and edge devices. These models perform text summarization, translation, and tool usage with minimal latency.

Llama 3.2 introduces lightweight models optimized for mobile and edge devices, available in 1B and 3B parameter versions. These distilled models maintain high performance while minimizing computational requirements. With a 128K-token context length, they excel at text-based tasks such as summarization and translation, and they follow instructions effectively. They are particularly suited to applications requiring low latency, including augmented reality, healthcare diagnostics, and environmental monitoring.
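As a rough illustration of why the 1B and 3B sizes suit edge hardware, the sketch below estimates weight-memory footprints at common storage precisions. The per-parameter byte counts are the standard fp16/int8/int4 storage sizes; the nominal 1B/3B parameter counts and the resulting figures are approximations, not official requirements (they exclude the KV cache and activations).

```python
# Rough weight-memory estimate for the Llama 3.2 lightweight models.
# Counts only parameter storage; KV cache and activations are excluded.

BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}

def weight_memory_gb(n_params: float, precision: str) -> float:
    """Approximate weight storage in GB for a model with n_params parameters."""
    return n_params * BYTES_PER_PARAM[precision] / 1e9

# Nominal parameter counts from the model names (the exact counts differ slightly).
for name, params in [("Llama 3.2 1B", 1e9), ("Llama 3.2 3B", 3e9)]:
    for prec in ("fp16", "int8", "int4"):
        print(f"{name} @ {prec}: ~{weight_memory_gb(params, prec):.2f} GB")
```

Even at fp16, the 1B model's weights fit in roughly 2 GB, which is why int4 quantization brings both models comfortably within typical smartphone memory budgets.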

Current
Open source
Pretrained, Distilled

Comparison 

Sourced on: September 29, 2024

Llama 3.2 lightweight models, available in 1B and 3B versions, are optimized for mobile and edge AI tasks. They perform efficiently in multilingual translation, text summarization, and low-latency applications such as augmented reality and healthcare. Compared with similarly sized instruction-tuned models such as Gemma 2 2B IT and Phi-3.5-mini IT, they deliver strong performance on text-based tasks while operating with fewer computational resources.

| Type | Benchmark | Llama 3.2 1B | Llama 3.2 3B | Gemma 2 2B IT | Phi-3.5-mini IT |
|---|---|---|---|---|---|
| General | MMLU (5-shot) | 49.3 | 63.4 | 57.8 | 69.0 |
| General | Open-rewrite eval (0-shot) | 41.6 | 40.1 | 31.2 | 34.5 |
| General | TLDR9+ (best, 5-shot) | 16.8 | 19.0 | 13.9 | 12.8 |
| General | IFEval | 59.5 | 77.4 | 61.9 | 59.2 |
| Tool use | BFCL V2 | 25.7 | 67.0 | 27.4 | 58.4 |
| Tool use | Nexus | 13.5 | 34.3 | 21.0 | 26.1 |
| Math | GSM8K (8-shot, CoT) | 44.4 | 77.7 | 62.5 | 86.2 |
| Math | MATH (5-shot, CoT) | 30.6 | 48.0 | 23.8 | 44.2 |
| Reasoning | ARC Challenge (0-shot) | 59.4 | 78.6 | 76.7 | 87.4 |
| Reasoning | GPQA (2-shot) | 27.2 | 32.8 | 27.5 | 31.9 |
| Reasoning | Hellaswag (3-shot) | 41.2 | 69.8 | 61.1 | 81.4 |
| Long context | InfiniteBench/En.MC (128k) | 38.0 | 63.3 | – | 39.2 |
| Long context | InfiniteBench/En.QA (128k) | 20.3 | 19.8 | – | 11.3 |
| Long context | NIH/Multi-needle | 75.0 | 84.7 | – | 52.7 |
| Multilingual | MGSM (8-shot, CoT) | 24.5 | 58.2 | 40.2 | 49.8 |

– = not reported.
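To work with the comparison programmatically, the table can be captured as plain data. The snippet below transcribes a few rows (model names and scores come from the table above; unreported entries are treated as missing) and picks the strongest model per benchmark:

```python
# A few benchmark rows transcribed from the comparison table
# (None = not reported). Higher is better for all benchmarks listed.
SCORES = {
    "MMLU (5-shot)":       {"Llama 3.2 1B": 49.3, "Llama 3.2 3B": 63.4,
                            "Gemma 2 2B IT": 57.8, "Phi-3.5-mini IT": 69.0},
    "IFEval":              {"Llama 3.2 1B": 59.5, "Llama 3.2 3B": 77.4,
                            "Gemma 2 2B IT": 61.9, "Phi-3.5-mini IT": 59.2},
    "GSM8K (8-shot, CoT)": {"Llama 3.2 1B": 44.4, "Llama 3.2 3B": 77.7,
                            "Gemma 2 2B IT": 62.5, "Phi-3.5-mini IT": 86.2},
    "NIH/Multi-needle":    {"Llama 3.2 1B": 75.0, "Llama 3.2 3B": 84.7,
                            "Gemma 2 2B IT": None, "Phi-3.5-mini IT": 52.7},
}

def best_model(benchmark: str) -> str:
    """Return the model with the highest reported score on a benchmark."""
    reported = {m: s for m, s in SCORES[benchmark].items() if s is not None}
    return max(reported, key=reported.get)

for bench in SCORES:
    print(f"{bench}: {best_model(bench)}")
```

Skipping `None` entries matters here: Gemma 2 2B IT has no long-context scores, so a naive `max` over raw values would fail on `NIH/Multi-needle`.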

Team 

The Llama 3.2 project involved a collaboration between Meta's AI research team and Qualcomm, focused on optimizing AI for mobile and edge platforms. The team combines AI engineers and researchers specializing in model distillation and edge computing. The effort aligns with Meta's broader goal of making AI more accessible and efficient on everyday devices.

Community 

The Llama 3.2 community is active, particularly on Hugging Face, where developers and researchers collaborate on fine-tuning and deploying the models for mobile applications. The community focuses on optimizing these models for low-latency environments, such as mobile devices.
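Developers deploying these models typically format chat inputs with the Llama 3 family's instruct prompt template, which in practice is applied via the tokenizer's built-in chat template. The manual construction below is a sketch for illustration only (the special tokens shown are those used by the Llama 3 instruct models):

```python
# Sketch of the Llama 3 family's instruct prompt format.
# In practice you would call tokenizer.apply_chat_template instead.

def build_prompt(messages: list[dict]) -> str:
    """Render [{'role': ..., 'content': ...}] messages into a Llama 3 prompt."""
    prompt = "<|begin_of_text|>"
    for msg in messages:
        prompt += (
            f"<|start_header_id|>{msg['role']}<|end_header_id|>\n\n"
            f"{msg['content']}<|eot_id|>"
        )
    # Leave the assistant header open so the model generates the reply.
    prompt += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return prompt

prompt = build_prompt([
    {"role": "system", "content": "You are a concise summarizer."},
    {"role": "user", "content": "Summarize: Llama 3.2 targets mobile devices."},
])
print(prompt)
```

Getting this template exactly right matters for fine-tuning and low-latency serving alike, which is why community tooling leans on the tokenizer's template rather than hand-built strings.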

Active Members: 1,001–5,000
Engagement Level: Medium
