by curious_cat_163
845 days ago
I mean, if we are going to get past attention (and I'm very much on board with that idea!), it would help to know what attention actually contributes to a model. My response was trying to clear up some confusion on that point. I'm all for alternatives to attention. I just don't think BM25 cuts it, and I don't think anything that samples tokens based on BM25 weights (the idea in this subthread) would cut it either.