by bluecoconut 1136 days ago
Not production, but yes to scale: I pushed milvus to ~140 million vectors (768 dimensions), though only at a handful of requests per second (~10), and it fared alright once everything was up and running and relatively static on the document side. Rebuilding indexes and stability were a bit of a hassle at times: I was live-adding more documents (~1 million per 30 minutes), and it would occasionally fall over and need a rebuild, which caused a lot more load, rejected new documents, etc. There was probably a lot of tuning I could have done to eke out more performance and stability, though. It ended up costing hours of effort on the rebuilds and lots of careful RAM management (on a 300 GB RAM machine).

For the scale you're describing as "larger scale": at a few million documents I would just suggest using any library, e.g. `hnsw` in `nmslib` or `faiss`.

I just ran some benchmarks with 1M docs, `cosinesimil_sparse` on `78628`-dimensional binary vectors (nmslib `hnsw`): 30 seconds to build the index, and it can process a batch of 100 document queries in 3 ms (each returning 100 nearest neighbors). Based on this question, I just put a loop over it, and it handled 1000 random queries (non-batched) in 1.11 seconds (~1 GB peak RAM usage, using 24 threads).
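For binary vectors, the cosine measure behind `cosinesimil_sparse` reduces to counting shared non-zero indices, so the per-pair cost depends on how many entries are set rather than on the 78628 dimensions. Below is a minimal pure-Python illustration of that arithmetic, assuming each document is stored as the set of its non-zero indices. The helper names `binary_cosine` and `top_k` are hypothetical, and this is a brute-force exact scan, not HNSW and not how nmslib implements it internally:

```python
import heapq
import math

def binary_cosine(a: set[int], b: set[int]) -> float:
    """Cosine similarity of two 0/1 vectors, each given as the set of its non-zero indices."""
    if not a or not b:
        # Cosine is undefined for an all-zero vector; this sketch returns 0.0 for it.
        return 0.0
    # For 0/1 vectors: dot(a, b) = |a ∩ b| and ||a|| = sqrt(|a|),
    # so only the non-zero indices matter, not the total dimensionality.
    return len(a & b) / math.sqrt(len(a) * len(b))

def top_k(query: set[int], docs: list[set[int]], k: int) -> list[int]:
    """Brute-force (exact) k nearest docs by cosine similarity; no index structure."""
    return heapq.nlargest(k, range(len(docs)), key=lambda i: binary_cosine(query, docs[i]))

docs = [{1, 2}, {1, 5, 9}, {5, 9, 12, 40}, {100}]
print(top_k({1, 5, 9}, docs, k=2))  # → [1, 2]
```

An approximate index like HNSW exists precisely to avoid scoring every document this way; the sketch only shows what each similarity evaluation costs.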

All in all, my personal opinion is: even up to the few-"millions" scale, I'm finding the underlying libraries (`faiss` and `nmslib`) significantly easier to use than the wrapper tools / databases (milvus and pinecone). I don't really get the point of a separate piece of infra for something that is essentially ~15 lines of Python at most scales that matter (~a few million). (Note: at the ~10k-100k scale or less, simple numpy and a sort seems to be fast enough, and exact, or you can just do exact NN with sklearn.neighbors.) And when you push to scales where it does start breaking (100 million+), the database versions seem to break as well (and require fiddling with lots of bespoke config).
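A minimal sketch of the "numpy and a sort" exact search for the ~10k-100k range, assuming dense float vectors held in memory and no all-zero rows (`exact_cosine_knn` is a hypothetical helper name). It uses `np.argpartition` to pick the top k per query and then sorts only those k:

```python
import numpy as np

def exact_cosine_knn(corpus: np.ndarray, queries: np.ndarray, k: int) -> tuple[np.ndarray, np.ndarray]:
    """Exact top-k cosine neighbors: returns (indices, similarities), best first."""
    # Assumes no all-zero rows; normalizing makes a dot product equal cosine similarity.
    corpus_n = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    queries_n = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    sims = queries_n @ corpus_n.T  # shape: (n_queries, n_corpus)
    # argpartition places the k largest similarities in the first k columns (unordered).
    top = np.argpartition(-sims, kth=k - 1, axis=1)[:, :k]
    top_sims = np.take_along_axis(sims, top, axis=1)
    # Sort only those k candidates, descending.
    order = np.argsort(-top_sims, axis=1)
    return np.take_along_axis(top, order, axis=1), np.take_along_axis(top_sims, order, axis=1)

rng = np.random.default_rng(0)
corpus = rng.standard_normal((10_000, 768)).astype(np.float32)
idx, scores = exact_cosine_knn(corpus, corpus[:5], k=10)
print(idx[:, 0])  # each query's best match is itself: [0 1 2 3 4]
```

The full similarity matrix is (n_queries × n_corpus), so a large query set would need to be processed in chunks to keep memory bounded.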


Thanks for the input! I asked about the scale of items and traffic because my use case actually requires a separate piece of infrastructure. It's around 100 million items and live production traffic from millions of users with strict latency requirements. So it's not a batch job that can be performed in memory, which is how I understand your case.

Currently I use Elasticsearch with the Open Distro approximate kNN plugin, by the way.

We at Pinecone have lots of customers at those operating levels (and many at even higher ones), if a managed option is viable for you.
Why are you not using the latest OpenSearch instead of Elastic with the older Open Distro kNN plugin?
Maybe I was not precise: I do in fact use OpenSearch, but since it's a fork of ES, I consider it the same DB, architecture-wise.