| HN Mirror

	by ashvardanian 711 days ago
	Hashing or tiny neural nets, combined with a vector search engine that supports Tanimoto/Jaccard distance, are a very common deduplication strategy for large datasets. That can be a better fit than running linear-complexity MapReduce operations over the whole dataset. There is a nice Google project that uses the 0.5M-parameter RETSim model and the USearch engine for exactly that: https://github.com/google/unisim
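
	For anyone who wants to see the hashing half of this in code, here is a minimal sketch in plain Python, assuming character 3-gram shingles and MinHash signatures as the fingerprint. It does not use RETSim, USearch, or unisim: the brute-force pairwise loop only stands in for what a vector search index would do at scale, and the function names, the 128-hash signature size, and the 0.7 threshold are all made up for illustration.

```python
import hashlib


def shingles(text, k=3):
    """Return the set of character k-grams of a lowercased, whitespace-normalized string."""
    text = " ".join(text.lower().split())
    return {text[i:i + k] for i in range(max(1, len(text) - k + 1))}


def _hash64(seed, token):
    """Deterministic 64-bit hash of a token under a given seed (illustrative choice: MD5)."""
    digest = hashlib.md5(f"{seed}:{token}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "little")


def minhash_signature(text, num_hashes=128):
    """MinHash signature: for each seed, keep the smallest hash over all shingles."""
    grams = shingles(text)
    return [min(_hash64(seed, g) for g in grams) for seed in range(num_hashes)]


def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching signature slots, which estimates the Jaccard similarity."""
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)


def find_near_duplicates(docs, threshold=0.7):
    """Return index pairs (i, j), i < j, whose estimated Jaccard similarity is >= threshold.

    The nested loop is quadratic and only stands in for an approximate
    nearest-neighbor index, which is what you would use on a large dataset.
    """
    signatures = [minhash_signature(d) for d in docs]
    pairs = []
    for i in range(len(signatures)):
        for j in range(i + 1, len(signatures)):
            if estimated_jaccard(signatures[i], signatures[j]) >= threshold:
                pairs.append((i, j))
    return pairs


if __name__ == "__main__":
    corpus = [
        "The quick brown fox jumps over the lazy dog",
        "The quick brown fox jumps over the lazy dog!",
        "zzzz yyyy xxxx wwww",
    ]
    print(find_near_duplicates(corpus))
```

	The point of the signature is that it is a fixed-size vector per document, so it can be stored in an index and queried for neighbors instead of being compared against every other document.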