by kadushka 485 days ago
*If you remove the softmax from the attention formula, you end up with linear attention*

Sorry, what?

1 comment

Here is a paper explaining it: https://arxiv.org/abs/2006.16236
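
For a concrete picture of the quoted claim, here is a rough NumPy sketch, not code from the paper. It covers only the non-causal case, and the function names and toy shapes are made up for illustration. The feature map `phi(x) = elu(x) + 1` is the one the linked paper uses in place of the softmax. The point is that once the row-wise softmax is gone, the product can be reassociated from `(phi(Q) phi(K)^T) V` to `phi(Q) (phi(K)^T V)`, so the N x N attention matrix never has to be built and the cost grows linearly in sequence length N.

```python
import numpy as np

def softmax_attention(Q, K, V):
    # Standard attention: the row-wise softmax needs the full N x N score matrix.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V

def phi(x):
    # elu(x) + 1: equals x + 1 for x > 0 and exp(x) otherwise, so it is always positive.
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention_quadratic(Q, K, V):
    # Softmax swapped for phi(Q) phi(K)^T with row normalization.
    # Written the "obvious" way, this still builds an N x N matrix.
    A = phi(Q) @ phi(K).T
    return (A @ V) / A.sum(axis=-1, keepdims=True)

def linear_attention(Q, K, V):
    # Same computation, reassociated: phi(Q) (phi(K)^T V).
    # Only (d x d_v) and (d,) intermediates are formed, so cost is O(N d d_v).
    Qf, Kf = phi(Q), phi(K)
    KV = Kf.T @ V               # shape (d, d_v)
    Z = Kf.sum(axis=0)          # shape (d,), used for the normalizer
    return (Qf @ KV) / (Qf @ Z)[:, None]

# Toy sizes, chosen arbitrarily for the demo.
rng = np.random.default_rng(0)
N, d, d_v = 6, 4, 3
Q = rng.normal(size=(N, d))
K = rng.normal(size=(N, d))
V = rng.normal(size=(N, d_v))

# The two linear-attention forms agree up to floating-point error.
same = np.allclose(linear_attention_quadratic(Q, K, V), linear_attention(Q, K, V))
```

Note that `linear_attention` is not numerically equal to `softmax_attention`; it is a different attention function whose similarity kernel happens to factor, which is what makes the reordering possible.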