Llama 3.2 introduces lightweight models optimized for mobile and edge devices, including the 1B and 3B parameter versions. These distilled models maintain high performance while minimizing computational resources. With a 128K token context length, they excel in text-based tasks such as summarization and translation, and they also handle instructions effectively. These models are particularly suited for applications requiring low latency, including augmented reality, healthcare diagnostics, and environmental monitoring.
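The 128K-token window still has a ceiling, so long-document summarization typically means splitting input to fit the context budget. A minimal sketch of that chunking step, assuming a rough characters-per-token heuristic rather than an exact tokenizer (the constants and function name here are illustrative, not part of Llama 3.2):

```python
# Hedged sketch: split a long document into chunks that fit a context
# window such as Llama 3.2's 128K-token limit.
# CHARS_PER_TOKEN is a rough heuristic, not an exact tokenizer ratio.
CONTEXT_TOKENS = 128_000
CHARS_PER_TOKEN = 4        # rough average for English text
RESERVED = 2_000           # tokens held back for the prompt and the reply

def chunk_for_context(text: str, context_tokens: int = CONTEXT_TOKENS) -> list[str]:
    """Split text into pieces that each fit the usable context budget."""
    budget_chars = (context_tokens - RESERVED) * CHARS_PER_TOKEN
    return [text[i:i + budget_chars] for i in range(0, len(text), budget_chars)]
```

In practice you would measure lengths with the model's own tokenizer; the fixed ratio above only approximates the budget.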
The 1B and 3B lightweight models are text-only and are benchmarked against comparably sized competitors such as Gemma 2 2B IT and Phi-3.5-mini IT. As the table below shows, they hold up well across general, tool-use, math, reasoning, long-context, and multilingual tasks while operating with fewer resources.
| Type | Benchmark | Llama 3.2 1B | Llama 3.2 3B | Gemma 2 2B IT | Phi-3.5-mini IT |
|---|---|---|---|---|---|
| General | MMLU (5-shot) | 49.3 | 63.4 | 57.8 | 69 |
| General | Open-rewrite eval (0-shot, regular) | 41.6 | 40.1 | 31.2 | 34.5 |
| General | TLDR9+ (best, 5-shot, regular) | 16.8 | 19 | 13.9 | 12.8 |
| General | IFEval | 59.5 | 77.4 | 61.9 | 59.2 |
| Tool use | BFCL V2 | 25.7 | 67 | 27.4 | 58.4 |
| Tool use | Nexus | 13.5 | 34.3 | 21 | 26.1 |
| Math | GSM8K (8-shot, CoT) | 44.4 | 77.7 | 62.5 | 86.2 |
| Math | MATH (5-shot, CoT) | 30.6 | 48 | 23.8 | 44.2 |
| Reasoning | ARC Challenge (0-shot) | 59.4 | 78.6 | 76.7 | 87.4 |
| Reasoning | GPQA (2-shot) | 27.2 | 32.8 | 27.5 | 31.9 |
| Reasoning | Hellaswag (3-shot) | 41.2 | 69.8 | 61.1 | 81.4 |
| Long Context | InfiniteBench/En.MC (128k) | 38 | 63.3 | 39.2 | |
| Long Context | InfiniteBench/En.QA (128k) | 20.3 | 19.8 | 11.3 | |
| Long Context | NIH/Multi-needle | 75 | 84.7 | 52.7 | |
| Multilingual | MGSM (8-shot, CoT) | 24.5 | 58.2 | 40.2 | 49.8 |
The Llama 3.2 project involved a collaboration between Meta’s AI research team and Qualcomm, focusing on optimizing AI for mobile and edge platforms. The team consists of a mix of AI engineers and researchers specializing in model distillation and edge computing solutions. The efforts align with Meta’s broader goal of making AI more accessible and efficient for everyday devices.
The Llama 3.2 community is active, particularly on Hugging Face, where developers and researchers collaborate on fine-tuning the models and deploying them in low-latency environments such as mobile devices.
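A minimal sketch of loading the 1B instruct model through the Hugging Face `transformers` text-generation pipeline, which is the usual starting point in that community. The model id, system prompt, and generation settings below are assumptions for illustration, not details from this article:

```python
# Hypothetical sketch of calling Llama 3.2 1B via Hugging Face transformers.
# The repo id and settings below are assumptions; gated models also require
# accepting the license and authenticating with a Hugging Face token.
MODEL_ID = "meta-llama/Llama-3.2-1B-Instruct"  # assumed repo name

def build_messages(user_text: str) -> list[dict]:
    """Build a chat-format request in the shape transformers pipelines accept."""
    return [
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": user_text},
    ]

def summarize(text: str) -> str:
    # Heavy import kept local so the helper above stays dependency-free.
    from transformers import pipeline

    generator = pipeline("text-generation", model=MODEL_ID)
    out = generator(build_messages(f"Summarize:\n{text}"), max_new_tokens=128)
    # Chat-style input returns the full message list; take the last reply.
    return out[0]["generated_text"][-1]["content"]
```

On-device deployments would typically go further, e.g. a quantized export, but the pipeline call above is the common first step for experimentation.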