by msumpter 2979 days ago
I've used similar phonetic algos in Excel to deduplicate CRM data during corporate acquisitions. The source data always seemed hopelessly duplicated, but running a few of these algos against it and then handing the 'best guesses' to the sales team let them do the final massaging of which accounts were truly duplicates and which should be left alone.

Soundex is very simple but works well; calculating a string's Jaro–Winkler distance also helped.
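
For anyone who wants to try the same idea outside Excel, here is a rough Python sketch of both measures. It is not the spreadsheet setup described above, just a plain implementation of standard American Soundex and the usual Jaro–Winkler formula. The function names and the 0.1 prefix weight are my own choices for illustration. Note that `jaro_winkler` returns a similarity (1.0 means identical), so the "distance" is 1 minus that value.

```python
# Soundex digit for each consonant; vowels, y, h and w have no code.
CODES = {}
for digit, group in enumerate(["bfpv", "cgjkqsxz", "dt", "l", "mn", "r"], start=1):
    for ch in group:
        CODES[ch] = str(digit)


def soundex(name):
    """Return the 4-character American Soundex code, or "" if no letters."""
    letters = [c for c in name.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
            if len(result) == 4:
                break
        prev = code  # a vowel resets prev, so a repeated code counts again
    return result.ljust(4, "0")


def jaro(s1, s2):
    """Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    if not s1 or not s2:
        return 0.0
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    hit1 = [False] * len(s1)
    hit2 = [False] * len(s2)
    matches = 0
    for i, ch in enumerate(s1):
        # Only look for a match within the allowed window around position i.
        for j in range(max(0, i - window), min(len(s2), i + window + 1)):
            if not hit2[j] and s2[j] == ch:
                hit1[i] = hit2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    seq1 = [c for c, hit in zip(s1, hit1) if hit]
    seq2 = [c for c, hit in zip(s2, hit2) if hit]
    # Matched characters that appear in a different order, counted in pairs.
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2
    return (matches / len(s1) + matches / len(s2)
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1, s2, prefix_weight=0.1):
    """Jaro similarity boosted for a shared prefix of up to 4 characters."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return base + prefix * prefix_weight * (1 - base)
```

With this sketch, `soundex("Robert")` and `soundex("Rupert")` both give `R163`, which is the kind of collision you want when flagging candidate duplicates. The two measures complement each other: Soundex is a cheap way to bucket names that sound alike, and a Jaro–Winkler score within each bucket gives a ranking to hand to whoever makes the final call.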