by mediaman
739 days ago
They don't mention BM25, which still outperforms much of semantic search. A fun exercise is to watch the benchmarks of the latest semantic embedding models and see that they still struggle to match good ol' BM25. BM25 ranks documents by the relative statistical frequency of the query words, with some adjustments (term-frequency saturation and document-length normalization). It doesn't use ML at all, but it works very well, especially for technical content. SPLADE is capable in some areas but is slow, and often it offers little benefit over BM25 (or is worse) for technical searches, where specific technical words don't have many synonyms for it to pull in. The best search systems today use a mix of semantic search and BM25 or SPLADE, depending on the type of material and the speed required.
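
For anyone who hasn't looked at how little is going on inside BM25, here is a minimal sketch of the usual Okapi BM25 scoring formula. It assumes plain whitespace tokenization, the commonly used defaults k1=1.5 and b=0.75, and the "+1 inside the log" IDF variant that keeps scores non-negative. The names `tokenize` and `bm25_scores` are just illustrative, not from any particular library.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase + whitespace split; real systems use better analyzers.
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Return one BM25 score per document for the given query."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    if n_docs == 0:
        return []
    avg_len = sum(len(toks) for toks in tokenized) / n_docs

    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))

    query_terms = tokenize(query)
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Rare terms get a higher weight (inverse document frequency).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates via k1; b normalizes for document length.
            length_norm = 1 - b + b * len(toks) / avg_len
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores


docs = [
    "the mutex guards the shared buffer",
    "semantic search uses dense embeddings",
    "a spinlock is a busy-waiting mutex",
]
scores = bm25_scores("mutex spinlock", docs)
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
```

In this toy corpus the third document ranks first because it contains both query terms, and the second scores zero because it contains neither. That exact-term behavior is the point for technical search: a rare token like "spinlock" either matches or it doesn't, and its rarity is what gives it weight.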