by stevesimmons 1477 days ago
A good starting point for Entity Resolution/Deduplication is the Python Dedupe project [1, 2], along with the PhD thesis it is based on [3].

[1] https://github.com/dedupeio/dedupe

[2] https://dedupe.io/

[3] http://www.cs.utexas.edu/~ml/papers/marlin-dissertation-06.p...