by a-dub 1358 days ago
10? no, it's more like 20+. lsh was a core piece of the google crawler. it was used for high performance fuzzy deduplication.

see ullman's text: mining of massive datasets. it's free on the web.

I think LSH was only introduced in '99 by Indyk et al. I would say it was a pretty active research area 10 years ago.
right, but massive-scale production use in the google crawler, indexing the entire internet when that was the bleeding edge, was state of the art before the art was even really recognized as an art.

i don't even think they called it ANN. it was high performance, scalable deduplication. (which is, in fact, just fast/scalable lossy clustering)

collaborative filtering was kind of a cute joke at the time. meanwhile they had lsh, in production, actually deduplicating the internet.
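
for anyone who hasn't seen the idea, here is a rough sketch of lsh-style fuzzy dedup. to be clear, this is not the crawler's actual implementation; it is a toy minhash + banding example, and every name and parameter in it (shingles, minhash_signature, candidate_pairs, k=5, 64 hashes, 16 bands) is made up for illustration. documents whose signatures agree on any one band land in the same bucket and become candidate duplicates, which is the "lossy clustering" part.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def shingles(text, k=5):
    """Set of overlapping k-character shingles of lowercased, whitespace-normalized text."""
    text = " ".join(text.lower().split())
    return {text[i:i + k] for i in range(max(1, len(text) - k + 1))}


def _hash(seed, token):
    # Seeded 64-bit hash; blake2b (unlike built-in hash()) is stable across runs.
    digest = hashlib.blake2b(
        token.encode("utf-8"), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(tokens, num_hashes=64):
    """One minimum per seeded hash function; similar sets share many minima."""
    return [min(_hash(seed, t) for t in tokens) for seed in range(num_hashes)]


def candidate_pairs(docs, num_hashes=64, bands=16):
    """Bucket documents by each band of their signature; same bucket = candidate pair."""
    rows = num_hashes // bands
    buckets = defaultdict(set)
    for doc_id, text in docs.items():
        sig = minhash_signature(shingles(text), num_hashes)
        for b in range(bands):
            key = (b, tuple(sig[b * rows:(b + 1) * rows]))
            buckets[key].add(doc_id)
    pairs = set()
    for ids in buckets.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


def jaccard(a, b):
    """Exact Jaccard similarity, for verifying candidates after the cheap lsh pass."""
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    docs = {
        "a": "Hello   World foo bar",
        "b": "hello world foo bar",
        "c": "zzzzz qqqqq xxxxx",
    }
    print(candidate_pairs(docs))
```

the point of the banding step is that you never compare all pairs: you only look inside buckets, then (if you care) confirm candidates with an exact jaccard check. more bands with fewer rows each catches looser matches at the cost of more false candidates.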