by TheDudeMan
486 days ago
Referenced in this paper: "Overall, while approaches such as FNet, Performer, and sparse transformers demonstrate that either fixed or approximate token mixing can reduce computational overhead, our adaptive spectral filtering strategy uniquely merges the efficiency of the FFT with a learnable, input-dependent spectral filter. This provides a compelling combination of scalability and adaptability, which is crucial for complex sequence modeling tasks." And a comparison section after that.
Pretty lame.