|
|
|
|
|
by yagizdegirmenci
485 days ago
|
|
Google introduced this idea in 2022 with "FNet: Mixing Tokens with Fourier Transforms" [0]. Later they found out that, performance of their TPU(s) for matrix multiplication was faster than FFT in the most scenarios. [0]: https://arxiv.org/abs/2105.03824 |
|
"Overall, while approaches such as FNet, Performer, and sparse transformers demonstrate that either fixed or approximate token mixing can reduce computational overhead, our adaptive spectral filtering strategy uniquely merges the efficiency of the FFT with a learnable, input-dependent spectral filter. This provides a compelling combination of scalability and adaptability, which is crucial for complex sequence modeling tasks."
And a comparison section after that.