In this video, Algorithmic Simplicity explores Mamba, a new neural network architecture that outperforms Transformers at language modeling while using less compute: O(n log n) in sequence length, compared to the O(n²) cost of Transformers. The video explains Mamba from the perspective of linear recurrent neural networks (RNNs), avoiding the complexity of state space models. Linear RNNs address the parallelization and training difficulties inherent in traditional RNNs: by replacing the recurrent neural network with a linear function, the recurrence can be evaluated with a parallel scan algorithm, making computation efficient. The video also discusses vanishing and exploding gradients and how stable initialization mitigates them. Mamba builds on linear RNNs by computing different weights for each input, allowing the model to selectively forget or retain information, and it increases the size of the output vectors, leveraging high-performance memory to maintain efficiency. The video concludes with the controversy surrounding the rejection of the Mamba paper by ICLR 2024, covering the criticisms raised and the broader implications for peer review in the machine learning community. Despite the rejection, Mamba's performance and potential make it one of the most significant architectural developments since the introduction of Transformers.
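
The sketch below illustrates the core idea the summary describes, not the paper's actual implementation: a linear recurrence h_t = a_t · h_{t-1} + b_t whose per-step coefficients are computed from the input itself (Mamba-style selectivity), evaluated with a parallel associative scan in JAX. The projection names (`w_gate`, `w_in`) and the gating scheme are illustrative assumptions.

```python
import jax
import jax.numpy as jnp

def selective_linear_rnn(x, w_gate, w_in):
    """x: (seq_len, d_model). Returns hidden states h: (seq_len, d_model)."""
    # Input-dependent decay in (0, 1): lets the model choose what to forget.
    a = jax.nn.sigmoid(x @ w_gate)   # (seq_len, d_model)
    b = (x @ w_in) * (1.0 - a)       # input contribution, scaled by the gate

    # Each step is the affine map h -> a_t * h + b_t. Composing two such maps
    # is again an affine map, so the combine operator is associative and the
    # whole sequence can be computed with a parallel scan rather than a loop,
    # which is where the sub-quadratic cost quoted in the video comes from.
    def combine(left, right):
        a_l, b_l = left
        a_r, b_r = right
        return a_r * a_l, a_r * b_l + b_r

    _, h = jax.lax.associative_scan(combine, (a, b), axis=0)
    return h  # h_t assuming an initial hidden state of zero

# Tiny usage example with random weights.
key = jax.random.PRNGKey(0)
k1, k2, k3 = jax.random.split(key, 3)
seq_len, d_model = 16, 8
x = jax.random.normal(k1, (seq_len, d_model))
w_gate = jax.random.normal(k2, (d_model, d_model)) * 0.1
w_in = jax.random.normal(k3, (d_model, d_model)) * 0.1
h = selective_linear_rnn(x, w_gate, w_in)
print(h.shape)  # (16, 8)
```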

Algorithmic Simplicity
Not Applicable
June 12, 2024
Mamba paper