Hacker News new | ask | show | jobs
by ethanahte 1162 days ago
Hi, author here. I totally agree with you that, for large scale, you're going to need a vector database. My hope is more to help people avoid scenarios like the one in this comment: https://news.ycombinator.com/item?id=35552303 Tangentially, I really like the approach that haystack has taken, where they allow you to slot in whichever document store you want, and that document store can scale from in-memory, to sqlite, to postgres, to pinecone https://docs.haystack.deepset.ai/docs/document_store

In terms of the one-time cost of indexing, you're totally right! Although, one thing to call out is that you will have to re-index every time you change your embedding model, such as for fine-tuning. I don't have a good handle on how prevalent this is, though.

1 comments

Thanks for mentioning Haystack :)
It's a nice project that was ahead of its time! I hope it can successfully ride the current hype wave :D