by unraveller
485 days ago
GPU saw a 10% improvement over the TPU

> The TPU is so inefficient at FTs that the researchers did not use the FFT algorithm on sequences < 4096 elements, instead opting for a quadratic-scaling FT implementation using a pre-computed DFT matrix.

> on an Nvidia Quadro P6000 GPU, the FT was responsible for up to 30% of the inference time on the FNet architecture [0]

This company [0] claimed in 2021 that they could cut inference time by 40% if Google used their light chips on the TPU. Perhaps more if FFTNet does more of the heavy lifting.

[0]: https://scribe.rip/optalysys/attention-fourier-transforms-a-...
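
For anyone wondering what the "pre-computed DFT matrix" route in the quote looks like next to an FFT, here is a rough pure-Python sketch. It is not the FNet code and says nothing about TPU or GPU timings; the function names (`dft_matrix`, `dft_via_matrix`, `fft`) and the sample input are my own for illustration. It only shows that the O(n^2) matrix product and the O(n log n) radix-2 FFT compute the same transform.

```python
import cmath


def dft_matrix(n):
    """Precompute the n x n DFT matrix W[k][j] = exp(-2*pi*i*j*k/n)."""
    return [
        [cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n)]
        for k in range(n)
    ]


def dft_via_matrix(x, w):
    """O(n^2) transform: a single matrix-vector product with the stored matrix."""
    return [sum(wkj * xj for wkj, xj in zip(row, x)) for row in w]


def fft(x):
    """O(n log n) recursive radix-2 Cooley-Tukey FFT (len(x) must be a power of two)."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])  # transform of even-indexed samples
    odd = fft(x[1::2])   # transform of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # twiddle factor times odd part
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


# Hypothetical sample input of length 8 (a power of two).
x = [1.0, 2.0, 3.0, 4.0, 0.0, -1.0, -2.0, -3.0]
W = dft_matrix(len(x))  # built once, reusable for every length-8 input
a = dft_via_matrix(x, W)
b = fft(x)
max_diff = max(abs(p - q) for p, q in zip(a, b))
print(max_diff)  # should be at floating-point noise level
```

The matrix version does more arithmetic, but it is one dense matrix multiply, which is the kind of operation matrix-oriented hardware is built for; that is presumably why it can win at short sequence lengths.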