But you’re arguing that such a guarantee is ALWAYS the most relevant, when that’s not always the case.
There’s extensive research justifying the vector-space model. BM25 is the 25th iteration of a model, and well-tuned BM25 holds the highest non-neural performance on many tasks, including question answering[1]. Research has long found that factors other than the total number of term matches matter, such as IDF[2] and field length[3].
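To make the IDF and field-length point concrete, here is a minimal sketch of BM25 scoring next to a naive "count the matched query terms" score. Assumptions: the toy corpus and query are made up for illustration, `k1=1.2` and `b=0.75` are common default values, and the IDF is a common non-negative variant (the form Lucene's BM25Similarity uses). This is an illustration, not a tuned or benchmarked ranker.

```python
import math
from collections import Counter


def bm25_idf(n_docs: int, doc_freq: int) -> float:
    # Non-negative IDF variant: rare terms get a large weight, common terms a small one.
    return math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))


def bm25_score(query, doc, corpus, k1=1.2, b=0.75):
    # Score one tokenized document against a tokenized query using BM25.
    n_docs = len(corpus)
    avgdl = sum(len(d) for d in corpus) / n_docs
    tf = Counter(doc)
    score = 0.0
    for term in set(query):
        df = sum(1 for d in corpus if term in d)
        if df == 0 or tf[term] == 0:
            continue
        idf = bm25_idf(n_docs, df)
        # Length normalization: longer-than-average documents are penalized (controlled by b).
        norm = k1 * (1 - b + b * len(doc) / avgdl)
        # Saturating term frequency: repeated matches help, with diminishing returns (controlled by k1).
        score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
    return score


def term_match_count(query, doc):
    # Naive relevance: how many distinct query terms appear in the document.
    return sum(1 for t in set(query) if t in doc)


# Hypothetical toy corpus of pre-tokenized documents.
docs = [
    ["the", "quick", "search", "engine"],
    ["lucene", "search", "library"],
    ["the", "the", "index"],
    ["the", "ranking", "model"],
]
query = ["the", "lucene"]

for i, d in enumerate(docs):
    print(i, term_match_count(query, d), round(bm25_score(query, d, docs), 3))
```

Every document matches exactly one query term, so the naive count ties them all. BM25 ranks the document containing the rare term "lucene" first (IDF), and among the documents that contain only "the", it prefers the shorter one (field-length normalization).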
Have you benchmarked your relevance assumptions similarly? If so, I’d love to see the results and learn more!
You are making up a straw man again. Did I make that claim?
And I am simply saying that, arguably, you can't make that claim without evidence.
(That said, I do find much to like about the article and the WAND / T-WAND explanation. I'm only pointing out that you can't claim 'better relevance' without a benchmark to back it up.)
I was simply motivating my work, pointing out a problem that the current generation of search engines does not address.
What counts as "relevance" is, of course, a context-sensitive question.
You talk as if BM25 is the gold standard when it is not.
The research on this is all over the place. I just read an article that says BM25 is way worse than alternative language models. You don't have to look far; for example, this one, which talks about WAND:
https://dl.acm.org/doi/10.1145/2537734.2537744
You know why I quit academia? It's useless arguments and virtue signaling like this.
I'd rather go out and build a damn thing that people like to use.
I merely pointed out that there's a problem, and I have a solution. I did not claim that my problem and my solution solve all problems. Isn't this obvious?
In my problem, my solution beats Lucene. It's as simple as that.
So if Lucene wants to be this infinitely configurable search library, it would be advisable to offer my solution as an option, or to offer a better one that does something similar.
So far I have not seen any takers, only excuses.