Python – Edit Distance

The Edit Distance problem, also known as the Levenshtein Distance, measures the minimum number of operations required to convert one string into another. The allowed operations are insertion, deletion, or substitution of a single character.

This tutorial will walk you through a dynamic programming approach to solve the Edit Distance problem in Python, with detailed explanations for each example.

Problem Statement

Given two strings word1 and word2, determine the minimum number of operations required to convert word1 into word2. The permitted operations are:

  1. Insert a character
  2. Delete a character
  3. Replace a character

Sample Input and Output

Example 1:

</>
Copy
# Input:
word1 = "kitten"
word2 = "sitting"

# Output:
3

Explanation: To convert “kitten” into “sitting”, the following operations are performed: 1. Replace ‘k’ with ‘s’ → “sitten” 2. Replace ‘e’ with ‘i’ → “sittin” 3. Insert ‘g’ at the end → “sitting” Thus, the minimum number of operations required is 3.

Example 2:

</>
Copy
# Input:
word1 = "horse"
word2 = "ros"

# Output:
3

Explanation: To convert “horse” into “ros”, the following operations are performed: 1. Delete ‘r’ → “hose” 2. Delete ‘e’ → “hos” 3. Replace ‘h’ with ‘r’ → “ros” Thus, the minimum number of operations required is 3.

Solution Approach

We solve the problem using dynamic programming by constructing a 2D table dp where dp[i][j] represents the minimum edit distance between the first i characters of word1 and the first j characters of word2.

The steps to build the DP table are as follows:

  1. Initialize a table dp of size (m+1) x (n+1), where m and n are the lengths of word1 and word2 respectively.
  2. Set dp[i][0] = i for all i (converting word1 to an empty string requires i deletions) and dp[0][j] = j for all j (converting an empty string to word2 requires j insertions).
  3. For each character in word1 (index i) and word2 (index j):
    • If word1[i-1] == word2[j-1], then no new operation is needed, so dp[i][j] = dp[i-1][j-1].
    • If they differ, choose the minimum cost among:
      • dp[i-1][j] (deletion)dp[i][j-1] (insertion)dp[i-1][j-1] (replacement)