by ankit219 875 days ago

I think the alpha lies in how good the embedding space is rather than in which DB you use to store and retrieve vectors. It's the typical tradeoff between accuracy and performance, and here accuracy will matter more in many cases, especially for businesses and enterprises. Given that, and with existing database providers adding their own vector support, this space might be commoditized in the near term.

Re embeddings: you would likely get better results if you train your own embedding model. A popular approach is ColBERT, which anecdotally outperforms vector search in edge cases [1]. A second is training an embedding model using the initial layers of an LLM [2]. In ColBERT's case, once it's trained, you don't need a DB to store the vectors.

[1]: https://twitter.com/arjunkmrm/status/1744741903646773674
[2]: https://huggingface.co/intfloat/e5-mistral-7b-instruct
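
For anyone unfamiliar with the distinction above: ordinary vector search compares one pooled vector per text, while ColBERT-style late interaction keeps one vector per token and sums, for each query token, its best match among the document tokens (MaxSim). A rough sketch of the two scoring rules, assuming toy 2-d "token embeddings" made up for illustration (not output from any real model). The function names are hypothetical too.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    # Scale to unit length so dot products become cosine similarities.
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v] if norm else list(v)

def mean_pool(vectors):
    # Collapse per-token vectors into a single vector (simple average).
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def single_vector_score(query_tokens, doc_tokens):
    # Classic dense retrieval: one pooled vector per text, cosine similarity.
    q = normalize(mean_pool(query_tokens))
    d = normalize(mean_pool(doc_tokens))
    return dot(q, d)

def maxsim_score(query_tokens, doc_tokens):
    # Late interaction: for each query token take the max cosine similarity
    # over all document tokens, then sum over query tokens.
    q = [normalize(v) for v in query_tokens]
    d = [normalize(v) for v in doc_tokens]
    return sum(max(dot(qv, dv) for dv in d) for qv in q)

# Toy data (assumption): doc_a contains both query "tokens" exactly,
# doc_b only contains a blend of them.
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0]]
doc_b = [[1.0, 1.0], [1.0, 1.0]]

for name, doc in (("doc_a", doc_a), ("doc_b", doc_b)):
    print(name, single_vector_score(query, doc), maxsim_score(query, doc))
```

In this toy case the pooled vectors of doc_a and doc_b point in the same direction, so the single-vector score cannot separate them, while MaxSim ranks doc_a higher because each query token finds an exact match. Real systems differ in how tokens are embedded, pooled, and indexed.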

1 comment

infecto 875 days ago

I agree with you. I was ignoring the accuracy/performance tradeoff. Even in that space, while there is certainly a lot of innovation left, so much is already available, both commercially and as open source. If that holds true, you are really left with competing on price and scale in the long run.
