Hacker News
by CornCobs 1838 days ago
Interesting! What I noticed when approaching the problem was that there is very little information on scaling up. I also don't think there are good out-of-the-box solutions covering a wide range of use cases. Dedup (basically a cross-product of the dataset with itself) and linkage (highly dependent on the relative sizes of your search set and backing data) call for very different optimizations when your data is large.
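
To make the difference concrete, here is a rough Python sketch of how the candidate pairs are generated in each case. The function names and the first-letter blocking key are my own illustrative assumptions, not taken from any particular library, and blocking is just one common way to cut down comparisons in linkage.

```python
from collections import defaultdict
from itertools import combinations


def dedup_pairs(records):
    """Dedup: every unordered pair within one dataset, n * (n - 1) / 2 in total."""
    return list(combinations(range(len(records)), 2))


def linkage_pairs(queries, backing, key):
    """Linkage: index the backing data once by a blocking key, then probe per query.

    `key` is a hypothetical blocking function; only records sharing a key
    with the query become candidate pairs.
    """
    index = defaultdict(list)
    for j, rec in enumerate(backing):
        index[key(rec)].append(j)
    return [(i, j) for i, q in enumerate(queries) for j in index.get(key(q), [])]


records = ["anna", "anne", "bob", "bobby"]
all_pairs = dedup_pairs(records)  # grows quadratically with len(records)
candidates = linkage_pairs(["ann", "carl"], records, key=lambda s: s[0])
```

With dedup the work grows with the square of one dataset, so the optimization is about pruning pairs inside it. With linkage the cost is roughly one index build over the backing data plus one lookup per query, so which side you index depends on the relative sizes.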