
I think the alpha lies in how good the embedding space is rather than in which DB you use to store and retrieve vectors. It's the typical tradeoff between accuracy and performance, and here accuracy will matter more in many cases, especially for businesses and enterprises. Given that, and with existing database providers introducing their own vector support, this space might be commoditized in the near term.

Re embeddings: you would likely get better results if you train your own embedding model. One popular approach is ColBERT, which anecdotally outperforms vector search in edge cases [1]. A second is training an embedding model using the initial layers of an LLM [2]. In ColBERT's case, once it's trained, you don't need a DB to store the vectors.
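
For anyone who hasn't looked at how ColBERT scores things: it keeps one embedding per token and compares query tokens against document tokens (late interaction, a.k.a. MaxSim) instead of comparing one pooled vector per text. Below is a rough sketch of just that scoring step in plain Python. The function names and the toy 2-d vectors are made up for illustration; a real setup would get the token embeddings from a trained model.

```python
import math

def normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def single_vector_score(query_vec, doc_vec):
    """Usual vector search: one pooled embedding per text, cosine similarity."""
    return dot(normalize(query_vec), normalize(doc_vec))

def maxsim_score(query_tokens, doc_tokens):
    """Late interaction: for each query token, take its best-matching
    document token, then sum those maxima over the query tokens."""
    q = [normalize(t) for t in query_tokens]
    d = [normalize(t) for t in doc_tokens]
    return sum(max(dot(qt, dt) for dt in d) for qt in q)

# Toy token embeddings (hypothetical values, not from any model).
query = [[1.0, 0.0], [0.0, 1.0]]
doc_full = [[1.0, 0.0], [0.0, 1.0]]   # covers both query tokens
doc_partial = [[2.0, 0.0]]            # covers only the first query token

print(maxsim_score(query, doc_full))
print(maxsim_score(query, doc_partial))
```

The document that matches both query tokens scores higher than the one that matches only one, which is the kind of token-level signal that a single pooled vector can blur.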

[1]: https://twitter.com/arjunkmrm/status/1744741903646773674
[2]: https://huggingface.co/intfloat/e5-mistral-7b-instruct