by ghita_ 566 days ago
Very impressive results. I'm curious how you benchmarked against BM25 in terms of accuracy. I couldn't find any metrics on that, just one search example. I think there are use cases where latency is king, but when it comes to vector search / hybrid search, accuracy is probably more important.

For the latency benchmarks we used vanilla BM25 (SimilarityType::Bm25f for a single field) for comparability, so there are no differences in terms of accuracy.
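For readers unfamiliar with what "vanilla BM25" computes, here is a minimal sketch of the textbook Okapi BM25 formula for a single field. It is not the library's code: `CorpusStats` and `bm25_score` are hypothetical names made up for this illustration, and the IDF variant (the non-negative, Lucene-style one) is an assumption.

```rust
use std::collections::HashMap;

/// Corpus-level statistics needed by BM25 (hypothetical struct for this sketch).
struct CorpusStats {
    doc_count: f64,
    avg_doc_len: f64,
    /// Number of documents that contain each term.
    doc_freq: HashMap<String, f64>,
}

/// Textbook BM25 score of a tokenized document for a tokenized query.
/// `k1` controls term-frequency saturation, `b` controls length normalization.
fn bm25_score(query: &[&str], doc: &[&str], stats: &CorpusStats, k1: f64, b: f64) -> f64 {
    let doc_len = doc.len() as f64;
    // Length normalization: longer-than-average documents are penalized.
    let norm = 1.0 - b + b * doc_len / stats.avg_doc_len;
    query
        .iter()
        .map(|term| {
            // Term frequency of this query term in the document.
            let tf = doc.iter().filter(|t| *t == term).count() as f64;
            let df = stats.doc_freq.get(*term).copied().unwrap_or(0.0);
            // Non-negative IDF variant (assumed here; other variants exist).
            let idf = (1.0 + (stats.doc_count - df + 0.5) / (df + 0.5)).ln();
            idf * tf * (k1 + 1.0) / (tf + k1 * norm)
        })
        .sum()
}
```

Commonly used defaults are k1 = 1.2 and b = 0.75. Note that the score depends only on term counts and document length, not on where the terms occur, which is the gap a proximity-aware variant tries to close.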

For SimilarityType::Bm25fProximity, which takes into account the proximity between query term matches within the document, we so far have only anecdotal evidence that it returns significantly more relevant results for many queries.

Systematic relevancy benchmarks such as BEIR and MS MARCO are planned.
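As a rough idea of what such a benchmark measures: BEIR reports nDCG@10 as its main metric. Below is a minimal sketch of nDCG@k with linear gain. The function names are made up for this illustration, and real evaluations are normally run with the benchmark's own tooling rather than hand-rolled code.

```rust
/// Discounted cumulative gain over the first `k` relevance labels, in ranked order.
fn dcg_at_k(rels: &[f64], k: usize) -> f64 {
    rels.iter()
        .take(k)
        .enumerate()
        // Rank i (0-based) is discounted by log2(i + 2).
        .map(|(i, rel)| *rel / ((i as f64) + 2.0).log2())
        .sum()
}

/// nDCG@k: DCG of the returned ranking divided by the DCG of the ideal ranking.
/// `ranked_rels` holds the judged relevance of each returned result, in order;
/// `judged_rels` holds the relevance labels of all judged documents for the query.
fn ndcg_at_k(ranked_rels: &[f64], judged_rels: &[f64], k: usize) -> f64 {
    let mut ideal = judged_rels.to_vec();
    // Ideal ranking: most relevant documents first.
    ideal.sort_by(|a, b| b.partial_cmp(a).unwrap());
    let idcg = dcg_at_k(&ideal, k);
    if idcg == 0.0 {
        0.0 // No relevant documents judged for this query.
    } else {
        dcg_at_k(ranked_rels, k) / idcg
    }
}
```

Averaging this value over all benchmark queries gives one number per ranking function, which is what would let Bm25f and Bm25fProximity be compared beyond anecdotes.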

Got it. I think the anecdotal evidence is what intrigued me a little bit. Looking forward to seeing the systematic relevancy benchmarks.