Tag: Leetcode Hard
Edit Distance
Edit distance between two words is the minimum number of operations , i.e character insertions, deletions or substitutions, to transform one word to another word. This operation has cost associated with it.
Levenshtein Distance
Levenshtein Distance is edit distance with cost = 1
for the operation , it comes from the family of distance metrics.
Family of Distance Metrics
Definition
Levenshtein distance - Wikipedia
Example
- Edit Distance between EXECUTE and EXPIRE is 4.
- EXECUTE → EXPCUTE → EXPIUTE → EXPRTE→ EXPIRE
- Better representation using "gap representation"
E X E C U T E
E X P I R _ E
0 0 1 1 1 1 0
- Sum the Cost = 4
- " empty space
_
" to denote insertion or deletion.
Recurrence Relation
ED(s1,s2,i,j) where i - s1[0...m-1] , j - s2[0...n-1]
Insertion: Recur for i and j+1
Deletion: Recur for i+1 and j
Substitution: Recur for i+1 and j+1
Algorithm
Brute Force Approach
Brute Force approach is to enumerate all possible matches for edit operations until we find match between two strings. While comparing the characters, when the character match we do not have to perform any operation , so the cost becomes 0. If they do not match, then we check for possibilities of each insertion ,deletion and substitution operation, with cost 1.
Code
public int minDistance(String word1, String word2) {
return ED(word1,word2,0,0);
}
private int ED(String s1, String s2, int i, int j){
if(i==s1.length())
return s2.length()-j; // remaining size difference
if(j==s2.length())
return s1.length()-i; // remaining size difference
if(s1.charAt(i) == s2.charAt(j))
return ED(s1,s2,i+1,j+1); //cost = 0
else
return Math.min(Math.min(
ED(s1,s2,i,j+1), //insertion
ED(s1,s2,i+1,j)), //deletion
ED(s1,s2,i+1,j+1)) //substitution
+ 1; //cost = 1 for Levenshtein
}
Visualize Recursion Tree
Notice :
- When there is a Prefix Match between two strings, the tree progresses only in one direction. e.g.
ED(0,0) and ED(1,1)
. - Otherwise three children (one for each operation) are generated. e.g
ED(2,2)
. - Also, notice repeating sub-problems e.g.
ED(3,3)
.
During recursion, we are trying to breakdown our original problem into sub-problems. We keep recursing until we hit empty string. This hints to be a dynamic programming problem.
At a high level, dynamic programming approaches includes below, and we will walk through them in the upcoming blogs for this series.
- Recursion and Memoization
- Tabulation aka Wagner–Fischer Algorithm
Let us first find out the complexity of the code above.
Complexity Analysis
Time complexity:
- at least : O(3 min(m,n) )
- at worst case: O(3n) , occurs when m=n
Space complexity:
- O(n)
Applications of Edit Distance
- diff (Unix)
- stemming (NLP)
- spelling correction
- DNA sequence
UpNext
The complexity for the brute force approach came up to being exponential. How did we come up with the analysis? Stay tune for the next blog to understand the complexity analysis over a recursion tree.
Below is a Map of things we have written so far. Also, we would like to hang with people who are doing these algorithms. Join us at Discord or follow us over Twitter/Instagram.