by sytelus 977 days ago
Is there any open source vector DB implementation fast enough to, say, create embeddings for 100M documents locally within a few hours and find ranked matches in under a second? I tried ChromaDb and it is super slow, basically unusable.
4 comments

Vector DBs don't create embeddings; they store them. As the article points out, the LLM's slow response time diminishes the performance gains that vector DBs can potentially add.
You can easily create embeddings locally though, with small (L)LMs. Three lines of code using Hugging Face. I don't understand the point of this article.
Look at it this way: you have a web app that can handle 10 req/s. It does not matter if you add a database behind it that can handle 10,000 req/s. The 10 req/s still limits you. You're not gaining any performance benefits. I always wondered why you would need a dedicated vector DB.
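To make the bottleneck argument concrete, here is a minimal sketch. It assumes the two components sit in series and that each request passes through both; the function name `pipeline_throughput` is made up for illustration.

```python
def pipeline_throughput(*stage_rates_per_s):
    """Sustained throughput of stages in series is capped by the slowest stage."""
    return min(stage_rates_per_s)

web_app_rate = 10        # req/s, figure from the comment
database_rate = 10_000   # req/s, figure from the comment

end_to_end = pipeline_throughput(web_app_rate, database_rate)
print(end_to_end)  # 10
```

Swapping in a database that is 1,000 times faster leaves `end_to_end` unchanged, which is the point being made.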
Vector DBs have much higher 'semantic' recall than classical search engines if you want to ask questions about your documents or previous discussions.
Actually, some vector DBs can generate embeddings as well as store them. Chroma in particular uses SentenceTransformers models.
Generating the embeddings is by far the slowest part of that, but it’s embarrassingly parallel, so if you have $ it can be done that quickly.

When I worked for Dubai airport, I was tasked with building a vector similarity search that was highly optimised for query speed. In the end I held the vectors in memory (in a numpy array) and used scipy to do cosine similarity. I could get about 1.2 million vectors per second per core after tweaking and optimising. Again, this is embarrassingly parallel, so if you have more vectors, chunk them to fit your hardware and you should get more or less linear scaling per core.

If you want a hand writing this let me know.

(Also, there are a lot of caveats here. For example, they did not need to update the vectors; it was an extremely read-heavy usage pattern.)
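A rough sketch of the in-memory approach described above, not the original code: it uses plain numpy dot products on pre-normalised vectors rather than scipy, and the names `normalize`, `top_k_cosine`, and `chunk_size` are invented for illustration. The corpus here is random data, and the chunks are scanned sequentially; in a parallel setup each chunk would go to its own core.

```python
import numpy as np

def normalize(vectors):
    """Scale each row to unit length so a dot product equals cosine similarity."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    return vectors / np.clip(norms, 1e-12, None)

def top_k_cosine(query, unit_vectors, k=5, chunk_size=100_000):
    """Return (indices, scores) of the k rows most similar to query, best first."""
    q = query / max(np.linalg.norm(query), 1e-12)
    scores = np.empty(unit_vectors.shape[0], dtype=np.float32)
    # Score one chunk at a time; each chunk is independent of the others.
    for start in range(0, unit_vectors.shape[0], chunk_size):
        stop = start + chunk_size
        scores[start:stop] = unit_vectors[start:stop] @ q
    k = min(k, scores.shape[0])
    top = np.argpartition(-scores, k - 1)[:k]   # unordered top k
    top = top[np.argsort(-scores[top])]         # order them best first
    return top, scores[top]

# Hypothetical read-only corpus: 10,000 random 64-dimensional vectors.
rng = np.random.default_rng(0)
corpus = normalize(rng.standard_normal((10_000, 64)).astype(np.float32))

indices, scores = top_k_cosine(corpus[42], corpus, k=3)
print(indices[0])  # 42, since the query is itself a row of the corpus
```

Normalising once up front suits the read-heavy pattern mentioned in the caveat: the vectors never change, so each query costs only one matrix-vector product per chunk.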

Actually, if the work is almost fully parallel and you set it up right, it doesn't cost you anything more to do it quickly. It doesn't matter if you pay for one GPU instance for 500 hours or 50 for 10 hours. The cost is about the same. You can also more confidently use spot pricing to reduce the cost.
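The cost claim is simple arithmetic, sketched here under the assumption of a flat per-instance hourly price and no parallelisation overhead. The rate below is a placeholder, not a real quote.

```python
def total_cost(instances, hours_each, hourly_rate):
    """Total bill when every instance runs for the same number of hours."""
    return instances * hours_each * hourly_rate

hourly_rate = 1.0  # hypothetical price per GPU instance-hour

one_slow_run = total_cost(1, 500, hourly_rate)    # 500 instance-hours
fifty_fast_runs = total_cost(50, 10, hourly_rate)  # also 500 instance-hours
print(one_slow_run == fifty_fast_runs)  # True
```

Both layouts buy the same 500 instance-hours; only the wall-clock time differs.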
I do about 100 million embeddings using around 50 GPU instances and feed them into Qdrant. It takes about 12 hours. I'm very happy with the result and performance, as long as you have the option of running a very-large-memory instance.
Hello from Qdrant. I'd like to hear more about your use case, if we're not yet connected: https://www.linkedin.com/in/zayarni
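For a sense of scale, the figures in that comment imply the following throughput. This is back-of-the-envelope arithmetic that assumes the load is spread evenly across instances and that all 12 hours are spent embedding.

```python
embeddings = 100_000_000  # from the comment
instances = 50            # from the comment
hours = 12                # from the comment

seconds = hours * 3600
aggregate_per_s = embeddings / seconds               # about 2,315 embeddings/s overall
per_instance_per_s = aggregate_per_s / instances     # about 46 embeddings/s per instance

print(round(aggregate_per_s), round(per_instance_per_s))  # 2315 46
```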
Hey Andre. I've actually gotten a lot of great help talking to your co-founder Andrey on your discord channel and he helped me out a lot with making it work at our scale. Super happy with it and it's working in production with no hiccups.
PostgreSQL will match your requirements, although you won’t load that fast, simply because generating the embeddings takes longer than that, independently of your storage engine.