In this lecture from Stanford’s CS25 course, Hyung Won Chung, a research scientist at OpenAI, discusses the history and future of Transformer architectures, focusing on the driving forces behind AI advancement. The talk opens with the rapid pace of AI development and the value of studying how AI changes over time in order to anticipate future trends. Chung identifies the dominant driving force behind AI progress as exponentially cheaper compute and the scaling it enables, and argues that understanding this force helps predict AI’s future trajectory.

Chung then turns to the early history of Transformer architectures, explaining the differences between encoder-decoder, encoder-only, and decoder-only models. He highlights how the encoder-decoder architecture, initially popular for tasks like machine translation, builds in more structure than the simpler decoder-only design. As compute becomes cheaper and more abundant, he explains, simpler models with less built-in structure tend to perform better at scale, and he draws on examples from his work at Google and OpenAI to show how additional structure can become a bottleneck as models scale up.

Chung concludes by encouraging researchers to focus on scalable methods and to be mindful of the assumptions and structures they build into their models, emphasizing the need for a unified perspective to understand current AI developments and shape the future of AI effectively.
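To make the architectural contrast concrete, below is a minimal sketch, not taken from the lecture, of the structural difference Chung describes: an encoder-decoder Transformer with separate stacks joined by cross-attention versus a decoder-only model that applies causal self-attention to a single token stream. It uses standard PyTorch building blocks; the model sizes, layer counts, and sequence lengths are illustrative placeholders.

```python
import torch
import torch.nn as nn

def causal_mask(sz: int) -> torch.Tensor:
    # Upper-triangular -inf mask: position i may only attend to positions <= i.
    return torch.triu(torch.full((sz, sz), float("-inf")), diagonal=1)

# Encoder-decoder: separate encoder and decoder stacks joined by cross-attention,
# i.e. extra structure that hard-codes an input/output split.
enc_dec = nn.Transformer(d_model=64, nhead=4,
                         num_encoder_layers=2, num_decoder_layers=2,
                         batch_first=True)
src = torch.randn(1, 10, 64)            # "input" sequence (e.g., source sentence)
tgt = torch.randn(1, 7, 64)             # "output" sequence (e.g., target sentence)
out_ed = enc_dec(src, tgt, tgt_mask=causal_mask(7))    # -> (1, 7, 64)

# Decoder-only: a single stack with causal self-attention over one concatenated
# stream -- no separate encoder, no cross-attention.
layer = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
dec_only = nn.TransformerEncoder(layer, num_layers=2)
seq = torch.cat([src, tgt], dim=1)                     # treat input + output as one stream
out_do = dec_only(seq, mask=causal_mask(seq.size(1)))  # -> (1, 17, 64)
```

The contrast illustrates the point summarized above: the encoder-decoder version hard-codes an input/output split and a cross-attention pathway, while the decoder-only version drops that structure and treats everything as one sequence, which is the kind of built-in assumption Chung argues can become a bottleneck at scale.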

Stanford Online
June 15, 2024