HN Mirror


	by esafak 1142 days ago
	HNSW seems better in the criteria that matter. I think similarity search is a commodity now; I would not invest in developing an in-house solution given the abundance of good commercial solutions.

4 comments

bobvanluijt 1142 days ago

That depends a bit on the scale and use case specifics. But commoditized billion-scale vector search is indeed a thing. We published this for Weaviate in December last year https://weaviate.io/blog/sphere-dataset-in-weaviate

fzliu 1142 days ago

We've seen Milvus used in a variety of recommender systems running in production.

cjbgkagh 1142 days ago

They’re embeddings so they’re dense. There are few things easier than dense vector similarity.
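To make the "easy" case concrete, here is a minimal brute-force sketch of dense similarity search in plain Python. The function names (`cosine`, `top_k`) and the toy vectors are made up for illustration; a production system would use an ANN index such as HNSW rather than an exhaustive scan.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length dense vectors (assumes neither is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query, vectors, k=1):
    """Exhaustively score every vector and return the indices of the k most similar."""
    scored = [(cosine(query, v), i) for i, v in enumerate(vectors)]
    scored.sort(reverse=True)  # highest similarity first
    return [i for _, i in scored[:k]]

# Toy 2-dimensional "embeddings", purely illustrative.
vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
best = top_k([0.9, 0.1], vectors, k=2)
```

The scan is O(n) per query, which is why the thread's discussion of indexes like HNSW matters at billion scale.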

esafak 1142 days ago

Embeddings for retrieval don't have to be. It is not unheard of to transform the raw embeddings to optimize them for retrieval; e.g., through binarization or hashing.
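As a rough sketch of what binarization could look like: threshold each dimension at zero, pack the signs into a bitmask, and rank by Hamming distance. This is one simple scheme assumed for illustration, not any particular library's method; `binarize`, `hamming`, and `search` are hypothetical helper names, and the vectors are toy data.

```python
def binarize(vec):
    """Sign-binarize a dense vector into an int bitmask (bit i is set when vec[i] > 0)."""
    bits = 0
    for i, x in enumerate(vec):
        if x > 0:
            bits |= 1 << i
    return bits

def hamming(a, b):
    """Number of bit positions where two codes differ."""
    return bin(a ^ b).count("1")

def search(query, codes, k=1):
    """Return indices of the k codes closest to the query in Hamming distance."""
    q = binarize(query)
    ranked = sorted(range(len(codes)), key=lambda i: hamming(q, codes[i]))
    return ranked[:k]

# Toy 4-dimensional "embeddings", purely illustrative.
docs = [
    [0.9, -0.2, 0.4, -0.7],
    [-0.8, 0.3, -0.5, 0.6],
    [0.7, -0.1, -0.3, -0.9],
]
codes = [binarize(d) for d in docs]
top = search([1.0, -0.5, 0.2, -0.4], codes, k=2)
```

The trade-off is that the codes are far smaller and XOR plus popcount is cheap, at the cost of some retrieval accuracy compared with the original float vectors.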

cjbgkagh 1142 days ago

I was more making a distinction between embeddings and bag-of-words representations, which are very, very sparse matrices. The embedding dimensionality will not be anywhere near as high, so this level of sparsity is a minor inconvenience.

Edit: also CPUs for this, yikes…

jakearmitage 1142 days ago

Such as...?

kartoolOz 1142 days ago

Vespa.ai is pretty crazy too, if a bit unknown. We run a huge Vespa cluster serving 1k+ queries with <100ms latency ...

jabo 1141 days ago

Typesense: https://typesense.org/docs/0.24.1/api/vector-search.html

esafak 1142 days ago

Qdrant, Milvus, Weaviate
