Hacker News

by tobrien6 1727 days ago

The approximate kNN search is quite nice for many use cases and scales to billions of documents. However, you're correct that filtering is applied to the results after the kNN search (post-filtering). This only becomes an issue when the filter is very narrow; otherwise you can usually just request a much higher k than the number of results you actually need, without much slowdown.
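To make the post-filtering trade-off concrete, here is a minimal Python sketch. It is not OpenSearch's implementation. It assumes a toy in-memory corpus and uses brute-force search as a stand-in for the approximate kNN index. The names `ann_candidates`, `post_filter_knn` and `oversample` are hypothetical. The sketch fetches `k * oversample` candidates, applies the filter afterwards, and keeps the first `k` survivors. This shows why a very narrow filter can leave you with fewer than `k` results unless you over-fetch.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]
Hit = Tuple[str, float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def ann_candidates(docs: Dict[str, Vector], query: Vector, n: int) -> List[Hit]:
    """Hypothetical stand-in for the approximate kNN index: the n most similar docs.

    Assumption: brute force is used only to keep the sketch self-contained;
    a real ANN index would return approximately these neighbors.
    """
    scored = [(doc_id, cosine(vec, query)) for doc_id, vec in docs.items()]
    scored.sort(key=lambda hit: hit[1], reverse=True)
    return scored[:n]


def post_filter_knn(
    docs: Dict[str, Vector],
    query: Vector,
    k: int,
    keep: Callable[[str], bool],
    oversample: int = 10,
) -> List[Hit]:
    """Fetch k * oversample candidates, drop those failing the filter, keep k.

    With a very narrow filter, fewer than k candidates may survive.
    """
    candidates = ann_candidates(docs, query, k * oversample)
    return [hit for hit in candidates if keep(hit[0])][:k]


if __name__ == "__main__":
    docs = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [0.1, 0.9]}

    def narrow(doc_id: str) -> bool:
        # Narrow filter: matches only 2 of the 4 docs.
        return doc_id in {"c", "d"}

    # oversample=1: the top-2 candidates are "a" and "b", so nothing survives.
    print(post_filter_knn(docs, [1.0, 0.0], k=2, keep=narrow, oversample=1))
    # oversample=2: all 4 docs are candidates, so "d" and "c" survive.
    print(post_filter_knn(docs, [1.0, 0.0], k=2, keep=narrow, oversample=2))
```

Raising `oversample` is cheap when the filter keeps a reasonable fraction of documents. It stops helping once the filter is so selective that the matches sit far down the candidate list.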

If the filter is very narrow, then, as you noted, they also provide pre-filtering followed by exact kNN over the filtered documents. This has higher latency, of course, but it is still quite acceptable for many use cases (it's how I use it).
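For contrast, here is a minimal sketch of the pre-filter-then-exact-kNN approach. It uses the same toy assumptions: an in-memory corpus and hypothetical names (`prefilter_exact_knn`, `keep`). It is not OpenSearch's code. Every document that passes the filter is scored exactly. The cost is therefore linear in the number of matches. That is slower than ANN search on a broad filter but cheap when the filter is narrow, and it always returns `min(k, matches)` results.

```python
import heapq
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]
Hit = Tuple[str, float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def prefilter_exact_knn(
    docs: Dict[str, Vector],
    query: Vector,
    k: int,
    keep: Callable[[str], bool],
) -> List[Hit]:
    """Apply the filter first, then score every surviving doc exactly.

    Cost grows linearly with the number of docs that pass the filter.
    """
    scored = (
        (cosine(vec, query), doc_id) for doc_id, vec in docs.items() if keep(doc_id)
    )
    # nlargest keeps only the k best (score, doc_id) pairs without a full sort.
    return [(doc_id, score) for score, doc_id in heapq.nlargest(k, scored)]


if __name__ == "__main__":
    docs = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [0.1, 0.9]}

    def narrow(doc_id: str) -> bool:
        # Same narrow filter as before: only "c" and "d" match.
        return doc_id in {"c", "d"}

    # Both matches are returned, best first, regardless of how far down the
    # unfiltered ranking they would have been.
    print(prefilter_exact_knn(docs, [1.0, 0.0], k=2, keep=narrow))
```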

I believe there are use cases that Pinecone addresses better than OpenSearch, but I want people to know that there is a free, open-source solution that _may_ also work for their use case.

Elasticsearch does currently support vector search through script_score queries on dense_vector fields; however, I suspect they are still working on improving it, and I prefer the OpenSearch implementation for the time being: https://www.elastic.co/guide/en/elasticsearch/reference/curr...
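For reference, here is a rough sketch of what such a script_score query body can look like, built as a Python dict. It is based on my reading of the Elasticsearch script_score documentation. The script syntax has changed between releases (older 7.x versions used a different form of the second argument to `cosineSimilarity`), so check the docs for your version. The field names (`embedding`, `category`), the index name, and the helper `exact_knn_query` are hypothetical. The inner query restricts which documents get scored. This is effectively the same pre-filter-then-exact-kNN pattern described above, with brute-force cosine scoring over the matches.

```python
import json
from typing import Any, Dict, List


def exact_knn_query(
    query_vector: List[float],
    vector_field: str,
    filter_clause: Dict[str, Any],
    size: int = 10,
) -> Dict[str, Any]:
    """Hypothetical helper: filter first, then score matches by cosine similarity."""
    return {
        "size": size,
        "query": {
            "script_score": {
                # Only documents matching this inner query are scored.
                "query": {"bool": {"filter": filter_clause}},
                "script": {
                    # cosineSimilarity lies in [-1, 1]; adding 1.0 keeps scores
                    # non-negative, which script_score requires.
                    "source": f"cosineSimilarity(params.query_vector, '{vector_field}') + 1.0",
                    "params": {"query_vector": query_vector},
                },
            }
        },
    }


if __name__ == "__main__":
    # Hypothetical fields: "embedding" (dense_vector) and "category" (keyword).
    body = exact_knn_query([0.12, -0.4, 0.9], "embedding", {"term": {"category": "books"}})
    # This JSON would be sent as the body of a search request, e.g. POST /my-index/_search.
    print(json.dumps(body, indent=2))
```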