by light_hue_1
485 days ago
Except that the paper is written as if they discovered that you can use an FFT for attention. They even have a "proof"; it's in the title. Then you discover that everyone already knew this, and all they do is add some extra learnable parameters. Pretty lame.