by oofbey
159 days ago
Depending on how different the attention mechanism is, that might not work. If it’s just a faster / different way of finding the tokens to attend to, sure. But I get the sense the author is implying this method uses different semantics somehow. Although tbh I didn’t follow it entirely.