by gk1
1579 days ago
That was me. In all honesty, if you are already using ES and just want nearest-neighbor search over fewer than 10M documents, just stay with ES. Things get less obvious when you grow past 10M documents and still want low latency, or if you need live index updates without downtime, or if you want to apply metadata filters to nearest-neighbor searches. If you have 100M documents -- not a difficult threshold if you're an enterprise software company or a popular consumer app -- then ES gets ruled out fairly early in the process. We get a lot of those exasperated teams coming to Pinecone after trying their best with ES/OpenSearch.
Why do 100M vectors not work in ES?
- Is this a configuration issue -- common for ES users -- or something fundamental?
- It sounds like latency is the main thing. Any intuition on the numbers here, and are there other dimensions of concern?
AFAICT, ES is using the same OSS vector libraries as Pinecone, Weaviate, etc. ES in general is used for > 100M documents, e.g., logging, so this is surprising.
We are seeing growing interest from our ES/Splunk users in combining our viz tech with vector indexes, so I've been wondering about these, thanks! We currently go out-of-band at the compute tier or dump into our own indexes, but we are thinking through managed flows, where fundamental limits get interesting.