Edit Distance hard
Description
Given two strings word1 and word2, return the minimum number of operations required to convert word1 into word2.
You have three operations:
- Insert a character
- Delete a character
- Replace a character
Also known as Levenshtein distance. The bedrock of fuzzy matching, spell check, DNA alignment, and git diff.
Examples
> Case 1:
word1 = "horse", word2 = "ros"
Output: 3
Explanation:
horse -> rorse (replace 'h' with 'r')
rorse -> rose (delete 'r')
rose -> ros (delete 'e')
> Case 2:
word1 = "intention", word2 = "execution"
Output: 5Constraints
0 <= word1.length, word2.length <= 500
State design
- What am I computing?
dp[i][j]= min edits to transformword1[0..i)intoword2[0..j). - Dimensions? Two.
- Base cases?
dp[0][j] = j(turn empty intoword2[0..j)by insertingjchars).dp[i][0] = i(turnword1[0..i)into empty by deletingichars). - Transition?
- Match (
word1[i - 1] == word2[j - 1]): no edit needed.dp[i][j] = dp[i - 1][j - 1]. - Mismatch: try all three operations, take the min, add 1:
- Replace the last char:
dp[i - 1][j - 1] + 1. - Delete the last char of
word1:dp[i - 1][j] + 1. - Insert a char to match
word2[j - 1]:dp[i][j - 1] + 1.
- Replace the last char:
- Match (
- Order? Increasing
i, increasingj. - Answer?
dp[n][m].
The three transitions correspond to the three legal moves on the βalignment gridβ:
- Diagonal = consume one char from each side (match β free; mismatch β +1 replace).
- Down = consume one char from
word1only (= delete it). - Right = consume one char from
word2only (= insert it).
Edit distance is the shortest path through this grid where every step costs at most 1.
Code (2D table)
Space-optimized to O(m)
Analysis
- Time:
O(n Γ m). - Space:
O(n Γ m)for the 2D table,O(m)rolled.
Same skin
- Delete Operation for Two Strings β only delete allowed; equivalently
n + m β 2 Γ LCS. - Minimum ASCII Delete Sum for Two Strings β same recurrence, edit cost = ASCII value of deleted char instead of
1. - One Edit Distance β early exit when count exceeds 1.
- Wildcard / Regex Matching β extend the transition with
*and?(or*and.).
The Levenshtein recurrence is the spine of most string-DP problems β once itβs in your fingers, an entire family of problems takes the same shape.