Hacker News new | ask | show | jobs
by ne0phyte 3985 days ago
You could implement something like the Levenshtein distance [1] which is easy and still very fast. It gives you the difference between two strings in steps it takes to transform one into the other based on operations like insertion, deletion, substitution (and swapping two adjacent chars).

[1] https://en.wikipedia.org/wiki/Levenshtein_distance

2 comments

Another for your consideration is Jaro-Winkler [1], which I've used with a fair amount of success in previous projects, and is still fairly performant.

[1] https://en.wikipedia.org/wiki/Jaro%E2%80%93Winkler_distance

This is more or less the path we take for event matching at SeatGeek[1] using our fuzzywuzzy[2] library. There are places where this falls flat since it's kind of naive, though at that point you can stop thinking about adjacent characters and start thinking about adjacent words.

[1] http://chairnerd.seatgeek.com/fuzzywuzzy-fuzzy-string-matchi... [2] https://github.com/seatgeek/fuzzywuzzy