Hacker News new | ask | show | jobs
by marginalia_nu 937 days ago
If you mean in the sense of dealing with documents that are very similar but not binary identical, a locality sensitive hash would do the job.