Hacker News
by yuzhichang 711 days ago
Many vector database vendors claim that sparse vectors are enough for precise retrieval and that BM25 is not necessary.
1 comments

Hi, I'm one of the creators of infinity, and the article mentions sparse vectors vs. BM25. While sparse vectors perform well in some evaluations, they are produced by a trained model, which means they can't fully represent all of a user's keywords/tokens: those that don't appear in the training set are truncated. This has a very big impact in many enterprise vertical scenarios. BM25 doesn't have this limitation.
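To make the point above concrete, here is a minimal sketch, not a model of any real encoder. It assumes a toy "learned" sparse encoder with a fixed vocabulary (`MODEL_VOCAB` and `toy_sparse_encode` are hypothetical names invented for this illustration) and compares it with a plain Okapi BM25 scorer on a query made of a single part number that the toy vocabulary has never seen. Real learned sparse models tokenize differently, so treat this only as an illustration of the claimed failure mode.

```python
import math
from collections import Counter


def tokenize(text):
    # Whitespace tokenization is enough for this illustration.
    return text.lower().split()


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document against the query with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / n
    # Document frequency: in how many documents each term appears.
    df = Counter(term for d in tokenized for term in set(d))
    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# ASSUMPTION: a hypothetical fixed vocabulary standing in for the tokens
# a learned sparse model saw during training.
MODEL_VOCAB = {"error", "pump", "replace", "manual", "valve", "code"}


def toy_sparse_encode(text):
    """Toy stand-in for a learned sparse encoder: unseen tokens are dropped."""
    return Counter(t for t in tokenize(text) if t in MODEL_VOCAB)


def sparse_dot(q_vec, d_vec):
    # Dot product of two sparse vectors stored as Counters.
    return sum(w * d_vec[t] for t, w in q_vec.items())


docs = [
    "replace pump valve xk-4471b",
    "replace pump valve zz-9000",
    "manual error code",
]
query = "xk-4471b"  # a part number absent from MODEL_VOCAB

bm25 = bm25_scores(query, docs)
q_vec = toy_sparse_encode(query)
sparse = [sparse_dot(q_vec, toy_sparse_encode(d)) for d in docs]

print("BM25:  ", bm25)
print("Sparse:", sparse)
```

In this toy setup only the first document gets a nonzero BM25 score, while the toy sparse encoder maps the query to an empty vector and cannot tell the documents apart.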
BM25 is indeed way more important than these vector DBs will claim. At ParadeDB, we've observed significant use cases where customers need both.