by joshz 4044 days ago
Check out OpenRefine. It has a feature that clusters similar strings and unifies them. I remember the last time I looked at this data set... 4-letter acronyms spelled 12 different ways. It's unbelievably messy.
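
For anyone curious what that kind of clustering looks like, here is a rough Python sketch of the key-collision ("fingerprint") idea, which as I understand it is one of the methods OpenRefine offers. This is not OpenRefine's actual code: the function names `fingerprint`, `cluster`, and `unify` are made up for illustration, and the sample acronyms are invented, not taken from the data set.

```python
import re
import unicodedata
from collections import Counter, defaultdict


def fingerprint(value):
    """Reduce a string to a normalized key so variant spellings collide."""
    # Decompose accented characters and drop the combining marks.
    text = unicodedata.normalize("NFKD", value)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Trim, lowercase, and remove punctuation.
    text = re.sub(r"[^\w\s]", "", text.strip().lower())
    # Sort the unique tokens so word order and repeats do not matter.
    return " ".join(sorted(set(text.split())))


def cluster(values):
    """Group values whose fingerprints match; keep only real clusters."""
    groups = defaultdict(list)
    for v in values:
        groups[fingerprint(v)].append(v)
    # A cluster is only interesting if it holds more than one distinct spelling.
    return [vs for vs in groups.values() if len(set(vs)) > 1]


def unify(values):
    """Map every clustered spelling to the most common one in its cluster."""
    mapping = {}
    for vs in cluster(values):
        canonical = Counter(vs).most_common(1)[0][0]
        for v in vs:
            mapping[v] = canonical
    return mapping


if __name__ == "__main__":
    # Hypothetical sample values, not from the real data set.
    names = ["NASA", "NASA", "N.A.S.A.", " nasa", "ESA"]
    print(cluster(names))
    print(unify(names))
```

Key collision only catches variants that differ in case, punctuation, spacing, or word order. Actual misspellings need a distance-based method instead, and either way you would want to review the clusters by hand before merging anything.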