In this video, Ai Flux delves into Apple’s recent WWDC 2024 event, highlighting key details about Apple’s new foundation LLM that were not prominently announced. The video explains how Apple plans to run a roughly 3 billion parameter LLM directly on the iPhone 15 Pro, claiming a time-to-first-token latency of about 0.6 milliseconds per prompt token and a generation rate of 30 tokens per second. This is achieved through on-device processing, grouped-query attention, and specialized LoRA adapters.
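To put those throughput figures in context, here is a small back-of-the-envelope calculation using the numbers quoted above. The 1,000-token prompt and 200-token reply are illustrative assumptions, not figures from the video:

```python
# Rough latency estimate from the quoted figures:
#   ~0.6 ms of prefill per prompt token, ~30 tokens/s generation.
# Prompt and reply lengths below are illustrative assumptions.

TTFT_MS_PER_PROMPT_TOKEN = 0.6
GENERATION_TOKENS_PER_SEC = 30

def estimate_latency_s(prompt_tokens: int, reply_tokens: int) -> float:
    """Total seconds: prompt processing (prefill) plus token-by-token generation."""
    prefill_s = prompt_tokens * TTFT_MS_PER_PROMPT_TOKEN / 1000.0
    decode_s = reply_tokens / GENERATION_TOKENS_PER_SEC
    return prefill_s + decode_s

# Example: a 1,000-token prompt and a 200-token reply.
print(estimate_latency_s(1000, 200))  # ~0.6 s prefill + ~6.7 s decode ≈ 7.3 s
```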

The video starts by discussing Apple’s partnership with OpenAI to integrate ChatGPT into iOS 18 and the latest iPhones. However, the focus shifts to Apple’s own bespoke LLMs, which are designed to run efficiently on Apple silicon. The new model is claimed to rival the performance of existing 7 and 8 billion parameter models and supports both text and image recognition.

Ai Flux explains the technical aspects of the model, including its use of grouped-query attention and activation and embedding quantization running on Apple’s Neural Engine. The model also dynamically loads, caches, and swaps LoRA adapters to enhance performance. Additionally, Apple has used rejection sampling for optimization and synthetic data for training, achieving impressive benchmarks.
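The adapter mechanism the video describes can be pictured as one frozen base model kept in memory with small low-rank weight deltas attached per task. The sketch below (plain NumPy, with hypothetical adapter names; a conceptual illustration, not Apple’s implementation) shows the idea of caching and swapping LoRA adapters over a single base linear layer:

```python
import numpy as np

# One frozen base projection, shared by every task.
d_in, d_out, rank = 512, 512, 8
base_weight = np.random.randn(d_in, d_out).astype(np.float32) * 0.02

# Hypothetical per-task LoRA adapters: each is just two small matrices (A, B).
adapter_cache = {
    "summarization": (np.random.randn(d_in, rank).astype(np.float32) * 0.01,
                      np.random.randn(rank, d_out).astype(np.float32) * 0.01),
    "mail_reply":    (np.random.randn(d_in, rank).astype(np.float32) * 0.01,
                      np.random.randn(rank, d_out).astype(np.float32) * 0.01),
}

def forward(x: np.ndarray, task: str) -> np.ndarray:
    """Base projection plus the low-rank delta of whichever adapter is active."""
    a, b = adapter_cache[task]          # "swapping" = picking a different cached adapter
    return x @ base_weight + (x @ a) @ b

x = np.random.randn(1, d_in).astype(np.float32)
y_summary = forward(x, "summarization")
y_mail = forward(x, "mail_reply")       # same base weights, different behavior
```

Because each adapter is only a few low-rank matrices rather than a full set of weights, many task-specific adapters can be cached and swapped far more cheaply than loading separate full models.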

The video also touches upon Apple’s AXLearn framework, which is built on JAX and uses FSDP (Fully Sharded Data Parallelism) for training on TPUs and GPUs. For post-training, Apple applies RLHF (Reinforcement Learning from Human Feedback).
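As a rough illustration of the JAX style of training that such a framework builds on (a generic sketch, not AXLearn’s actual API), a single jit-compiled gradient step looks like this:

```python
import jax
import jax.numpy as jnp

def loss_fn(params, x, y):
    """Tiny linear-regression loss standing in for an LLM training objective."""
    pred = x @ params["w"] + params["b"]
    return jnp.mean((pred - y) ** 2)

@jax.jit  # compiled with XLA, so the same step can run on CPU, GPU, or TPU
def train_step(params, x, y, lr=1e-2):
    grads = jax.grad(loss_fn)(params, x, y)
    return jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)

key = jax.random.PRNGKey(0)
params = {"w": jax.random.normal(key, (4, 1)), "b": jnp.zeros((1,))}
x = jax.random.normal(key, (32, 4))
y = x @ jnp.ones((4, 1))

for _ in range(100):
    params = train_step(params, x, y)
```

In a large-scale setup, parameters and optimizer state would be sharded across devices (the FSDP part) rather than replicated, but the functional, compiled update step is the same core pattern.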

The benchmarks show that Apple’s 3 billion parameter model, combined with its task-specific adapters, outperforms Phi-3 on common LLM tasks and achieves performance comparable to GPT-4 Turbo in synthetic benchmarks. The video concludes by discussing the implications of edge compute and how Apple’s advancements could redefine the capabilities of mobile devices.

Overall, the video provides an in-depth look at Apple’s new LLM, its technical innovations, and potential impacts on the AI landscape.

Ai Flux
June 15, 2024
Apple Research Paper