The Levenshtein Distance

The Levenshtein distance is a string metric for measuring the difference between two sequences. It is calculated as the minimum number of single-character edits (insertions, deletions or substitutions) required to change one word into the other.

The Levenshtein Distance

Areas of application

  • Computer science – used in text comparison and similarity measurement
  • Data analysis – used in data cleaning and preprocessing
  • Natural language processing – used in language modeling and text classification
  • Information retrieval – used in search engines to measure the similarity between queries and documents
  • Machine learning – used as a feature in some algorithms to capture the similarity between words or phrases

Example

For example, the Levenshtein distance between the words ‘kit’ and ‘sit’ is 1, since a single letter needs to be inserted in order to transform ‘kit’ into ‘sit’.