by littlestymaar
840 days ago
Given that GP explicitly said “if you don't have attention”, and we're in a thread about a language model whose main characteristic is that it does not use attention, I don't understand why you insist on talking about attention …
|
My response was trying to clarify some confusion.
I am all for alternatives to attention. I don’t think BM25 cuts it. I don’t think anything that samples tokens based on BM25 weights (the idea in this subthread) would cut it.