by hintymad
1771 days ago
> Apart from search, ANNs can be used for recommendations, classification, and other information retrieval problems.

Yeah, those are all good use cases. I was wondering about a different thing: will the demand for a distributed vector search service concentrate in a few big companies, since smaller companies can use a simpler solution and don't really need to pay for the technology?

> Currently, ES and Solr, both based on Lucene, can't really manage vector representations, as they are mainly based on inverted indexes to n-grams.

ES has a kNN plugin, which stores vectors separately in each segment of the Lucene index. Plus, they can also use better storage formats and algorithms.
I guess it depends on what you mean by "simple". The algorithms are complex, but there are good tools that implement them. I would imagine smaller companies would use off-the-shelf tooling, and I would argue that is simpler. Vector embeddings are so unbelievably powerful: one of the good tools plus pretrained embeddings often yields better results than classical methods.
Specifically for search, I use them to completely replace stemming, synonyms, etc. in ES. I match the query's embedding against the document embeddings and take the top 1000 or so. Then I ask ES for the BM25 score of each of those 1000. I combine the embedding match score with BM25, recency, etc. for the final rank. The results are so much better than using stemming, etc., and it's simpler overall because I can use off-the-shelf tooling and the data pipeline is simpler.
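
To make the pipeline above concrete, here is a rough sketch of the two stages in plain Python. It is only an illustration under assumptions, not the actual setup: the function names, the min-max normalization, the weights, and the recency half-life are all hypothetical, the brute-force cosine scan stands in for a real ANN index, and the BM25 scores are passed in as a plain dict rather than fetched from ES.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k_by_embedding(query_vec, doc_vecs, k=1000):
    """Stage 1: brute-force stand-in for an ANN lookup.

    Returns the k best (doc_id, similarity) pairs, best first.
    """
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

def min_max(values):
    """Rescale scores to [0, 1] so different signals are comparable (an assumption)."""
    if not values:
        return []
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]

def rerank(candidates, bm25_scores, age_days,
           weights=(0.5, 0.4, 0.1), half_life_days=30.0):
    """Stage 2: blend embedding similarity, BM25, and recency.

    bm25_scores and age_days are dicts keyed by doc_id; in a real system the
    BM25 values would come from the search engine. Weights are hypothetical.
    """
    ids = [doc_id for doc_id, _ in candidates]
    sim = min_max([score for _, score in candidates])
    bm25 = min_max([bm25_scores.get(doc_id, 0.0) for doc_id in ids])
    # Exponential decay: a document loses half its recency score per half-life.
    recency = [0.5 ** (age_days[doc_id] / half_life_days) for doc_id in ids]
    w_sim, w_bm25, w_rec = weights
    final = [
        (doc_id, w_sim * s + w_bm25 * b + w_rec * r)
        for doc_id, s, b, r in zip(ids, sim, bm25, recency)
    ]
    final.sort(key=lambda pair: pair[1], reverse=True)
    return final
```

The weighted sum is just one way to combine the signals; how the scores are normalized and weighted would need tuning against real queries.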