by throwaway81523 739 days ago
I found BM25 and everything resembling it (like TF-IDF) to be nearly useless. Back in the day it was really necessary to use external semantic info, or at least data gathered by examining the whole document set for signals beyond term frequency. I was excited by the first part of the SPLADE article because I thought it was going to use LLMs to somehow find concept embeddings in documents and let you search for those. But as someone said, it turns out to be a version of synonym search, except the thesaurus is generated automatically. I remember someone did that with Word2Vec some years back and it was sort of useful, but generally the problem with search systems is too many results rather than missing relevant ones.
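
To make the "automatically generated thesaurus" idea concrete, here is a rough sketch of embedding-based query expansion. It is not SPLADE and not the Word2Vec project I half remember. The vectors in `TOY_VECTORS` are made up by hand for illustration (a real setup would load trained embeddings), and `expand_query` and its `threshold` parameter are hypothetical names.

```python
import math

# Hypothetical hand-made vectors standing in for trained word embeddings.
TOY_VECTORS = {
    "car":   [0.9, 0.1, 0.0],
    "auto":  [0.85, 0.15, 0.05],
    "truck": [0.6, 0.4, 0.2],
    "apple": [0.0, 0.9, 0.3],
    "fruit": [0.05, 0.85, 0.4],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def expand_query(terms, vectors=TOY_VECTORS, threshold=0.95):
    """Return the query terms plus vocabulary words close to any of them.

    A word is added when its cosine similarity to a query term is at
    least `threshold`. Terms missing from the vocabulary pass through
    unchanged.
    """
    expanded = list(terms)
    for term in terms:
        if term not in vectors:
            continue
        for word, vec in vectors.items():
            if word not in expanded and cosine(vectors[term], vec) >= threshold:
                expanded.append(word)
    return expanded


strict = expand_query(["car"])                 # tight threshold, few additions
loose = expand_query(["car"], threshold=0.8)   # looser threshold, more additions
print(strict, loose)
```

Lowering `threshold` pulls in more loosely related terms, so the expanded query matches more documents. That is the trade-off I mean: expansion helps recall, but it makes the too-many-results problem worse.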