Sliding Window Attention

Sliding window attention (SWA) is a technique used in transformer models that limits the attention span of each token to a fixed-size window of neighboring positions. Because every token attends to only a bounded number of positions rather than the whole sequence, the cost of attention grows roughly linearly with sequence length instead of quadratically, making the model more efficient on long inputs.
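The sketch below illustrates the idea with a dense band mask in NumPy; the function name, tensor shapes, and default window size are illustrative assumptions rather than any particular library's API.

```python
import numpy as np

def sliding_window_attention(q, k, v, window_size=4):
    """Scaled dot-product attention where each query position attends
    only to keys within `window_size` positions on either side."""
    seq_len, d = q.shape

    # Raw attention scores, scaled by sqrt(d) as in standard attention.
    scores = q @ k.T / np.sqrt(d)

    # Band mask: position i may attend to j only if |i - j| <= window_size.
    idx = np.arange(seq_len)
    band = np.abs(idx[:, None] - idx[None, :]) <= window_size
    scores = np.where(band, scores, -np.inf)

    # Softmax over the allowed (in-window) positions only.
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)

    return weights @ v

# Toy usage: 10 tokens with 8-dimensional embeddings.
rng = np.random.default_rng(0)
q = rng.normal(size=(10, 8))
k = rng.normal(size=(10, 8))
v = rng.normal(size=(10, 8))
out = sliding_window_attention(q, k, v, window_size=2)
print(out.shape)  # (10, 8)
```

Note that this dense-mask sketch still materializes the full score matrix for clarity; efficient implementations compute only the in-window entries, which is where the linear-in-sequence-length cost comes from.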

Areas of application

  • Natural Language Processing
  • Speech Recognition
  • Image Processing
  • Time Series Analysis
  • Neural Machine Translation
  • Autonomous Vehicles
  • Healthcare Analytics

Example

For example, in a machine translation task, SWA can be used to focus only on the tokens within a fixed distance of the word currently being translated, rather than attending over the entire input sequence.
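A toy illustration of that idea, using hypothetical positions and window size: with a window of 3, the token at position 7 only attends to positions 4 through 10 instead of the whole sequence.

```python
# Which positions fall inside the attention window of position 7?
window_size = 3
current_position = 7
sequence_length = 20
window = [j for j in range(sequence_length)
          if abs(j - current_position) <= window_size]
print(window)  # [4, 5, 6, 7, 8, 9, 10]
```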