In this lecture from Stanford’s CS25 course, Hyung Won Chung, a research scientist at OpenAI, discusses the history and future of Transformer architectures, focusing on the driving forces behind AI advancement. The talk opens with the rapid pace of AI development and the value of studying how AI changes over time in order to anticipate future trends. Chung identifies the dominant driving force behind AI progress as exponentially cheaper compute and the scaling it enables, and argues that understanding this force helps predict AI’s future trajectory.

Chung then turns to the early history of Transformer architectures, explaining the differences between encoder-decoder, encoder-only, and decoder-only models. He highlights how the encoder-decoder architecture, initially popular for tasks like machine translation, builds in more structure than the simpler decoder-only design. As compute becomes cheaper and more abundant, he explains, simpler models with less built-in structure tend to perform better at scale, and he draws on examples from his work at Google and OpenAI to show how additional structure can become a bottleneck as models scale up.

Chung concludes by encouraging researchers to focus on scalable methods and to be mindful of the assumptions and structures they build into their models, emphasizing the need for a unified perspective to understand current AI developments and shape the future of AI effectively.
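To make the architectural contrast concrete, below is a minimal sketch, not taken from the lecture, of the structural difference Chung describes: an encoder-decoder Transformer with separate stacks joined by cross-attention versus a decoder-only model that applies causal self-attention to a single token stream. It uses standard PyTorch building blocks; the model sizes, layer counts, and sequence lengths are illustrative placeholders.

```python
import torch
import torch.nn as nn

def causal_mask(sz: int) -> torch.Tensor:
    # Upper-triangular -inf mask: position i may only attend to positions <= i.
    return torch.triu(torch.full((sz, sz), float("-inf")), diagonal=1)

# Encoder-decoder: separate encoder and decoder stacks joined by cross-attention,
# i.e. extra structure that hard-codes an input/output split.
enc_dec = nn.Transformer(d_model=64, nhead=4,
                         num_encoder_layers=2, num_decoder_layers=2,
                         batch_first=True)
src = torch.randn(1, 10, 64)            # "input" sequence (e.g., source sentence)
tgt = torch.randn(1, 7, 64)             # "output" sequence (e.g., target sentence)
out_ed = enc_dec(src, tgt, tgt_mask=causal_mask(7))    # -> (1, 7, 64)

# Decoder-only: a single stack with causal self-attention over one concatenated
# stream -- no separate encoder, no cross-attention.
layer = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
dec_only = nn.TransformerEncoder(layer, num_layers=2)
seq = torch.cat([src, tgt], dim=1)                     # treat input + output as one stream
out_do = dec_only(seq, mask=causal_mask(seq.size(1)))  # -> (1, 17, 64)
```

The contrast illustrates the point summarized above: the encoder-decoder version hard-codes an input/output split and a cross-attention pathway, while the decoder-only version drops that structure and treats everything as one sequence, which is the kind of built-in assumption Chung argues can become a bottleneck at scale.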

Stanford Online
June 15, 2024