Mamba is a neural network architecture designed to address the computational inefficiencies of Transformers, particularly when processing long sequences. Rather than relying on attention, Mamba drops both attention and MLP blocks in favor of structured state space models (SSMs) whose parameters are functions of the input. This selection mechanism lets the model propagate or forget information along the sequence depending on the current token, allowing it to manage long-range context more effectively. A key innovation is a hardware-aware parallel algorithm that computes the model in recurrent mode, delivering significant gains in speed and efficiency even though the input-dependent parameters rule out the convolution-based computation used by earlier SSMs. As a result, Mamba offers roughly 5x higher inference throughput than Transformers, scales linearly with sequence length, and achieves state-of-the-art performance across several modalities, including language, audio, and genomics. On language modeling it outperforms Transformers of the same size and matches Transformers twice its size in both pretraining and downstream evaluation, underscoring its efficiency and versatility as a sequence modeling backbone.
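
To make the selection mechanism concrete, here is a minimal NumPy sketch of an input-dependent SSM recurrence run in recurrent mode. The shapes, the projection names (`W_B`, `W_C`, `W_dt`), and the simplified Euler-style discretization are illustrative assumptions rather than the reference implementation, and the sequential Python loop stands in for Mamba's hardware-aware parallel scan.

```python
# Minimal sketch of a selective SSM recurrence: B_t, C_t, and the step size dt_t
# are computed from the current input, so the state update is input-dependent.
import numpy as np

def selective_ssm_scan(x, A, W_B, W_C, W_dt):
    """Run a selective state space recurrence over a sequence.

    x    : (L, D)  input sequence (L tokens, D channels)
    A    : (D, N)  fixed diagonal state matrix (one N-dim state per channel)
    W_B  : (D, N)  projection producing the input-dependent B_t
    W_C  : (D, N)  projection producing the input-dependent C_t
    W_dt : (D,)    projection producing the input-dependent step size dt_t
    """
    L, D = x.shape
    N = A.shape[1]
    h = np.zeros((D, N))                            # hidden state per channel
    y = np.zeros((L, D))

    for t in range(L):                              # recurrent mode: O(L) in sequence length
        xt = x[t]                                   # (D,)
        dt = np.log1p(np.exp(xt * W_dt))            # softplus -> positive step size, (D,)
        B_t = xt[:, None] * W_B                     # input-dependent B, (D, N)
        C_t = xt[:, None] * W_C                     # input-dependent C, (D, N)
        A_bar = np.exp(dt[:, None] * A)             # discretized state transition
        B_bar = dt[:, None] * B_t                   # simplified discretization of B
        h = A_bar * h + B_bar * xt[:, None]         # selectively retain or overwrite state
        y[t] = np.sum(C_t * h, axis=-1)             # per-channel readout
    return y

# Tiny usage example with random (hypothetical) parameters.
rng = np.random.default_rng(0)
L, D, N = 16, 4, 8
x = rng.standard_normal((L, D))
A = -np.exp(rng.standard_normal((D, N)))            # negative entries keep the recurrence stable
y = selective_ssm_scan(x, A, rng.standard_normal((D, N)),
                       rng.standard_normal((D, N)), rng.standard_normal(D))
print(y.shape)  # (16, 4)
```

Because the transition depends on the token at each step, the recurrence cannot be collapsed into a fixed global convolution; that is why Mamba instead relies on a hardware-aware scan to recover parallelism during training while keeping constant-memory recurrent inference.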