by febin
1458 days ago
Thank you for your efforts. I came across your paper/code and posted it here. I was looking for a technique to cost-optimise transformer-based question answering. At present I am running on CPU, and getting a GPU on AWS is too costly. Since I work with high-level code, I don't completely understand the maths. However, I was wondering whether your techniques could be beneficial on CPUs? If I were to use this to improve a transformer-based architecture, what should my approach be?

It should be possible to get large speedups on CPUs, but the trick will be gradually approximating each of the layers in the model (see my reply to a sibling comment). It's not conceptually difficult, but it will require a fair amount of C++ work to port the code to GPUs* for training, and it will probably run slower than dense ops on modern GPUs because tensor cores don't support our memory layout.
I think of this paper as the first in a two-part series, where the next one takes these fast ops and gets them working in full neural nets. (If anyone wants to take on this project, I'm happy to co-advise you or talk about it whenever; I won't have the bandwidth to do it myself for the foreseeable future.)
*Someone recently started doing this as part of their master's thesis: https://github.com/joennlae/halutmatmul
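
For anyone who, like the asker, works mostly in high-level code, here is a rough NumPy sketch of what "approximating a layer" with lookup-based ops can look like. Loud caveat: this is plain product quantization with k-means, swapped in for illustration. It is not the paper's actual encoding or kernels, and all function names (`fit_prototypes`, `encode`, `build_tables`, `approx_matmul`) are hypothetical. The idea is that the weight matrix `B` is known ahead of time, so the products between a small set of learned prototypes and `B` can be precomputed, and at inference each input row is replaced by prototype indices plus table lookups.

```python
import numpy as np

def fit_prototypes(A_train, n_subspaces, n_protos, n_iters=10, seed=0):
    """Learn n_protos centroids per column subspace with plain Lloyd k-means."""
    rng = np.random.default_rng(seed)
    subspaces = np.array_split(np.arange(A_train.shape[1]), n_subspaces)
    protos = []
    for cols in subspaces:
        X = A_train[:, cols]
        # Initialise centroids from randomly chosen training rows.
        P = X[rng.choice(len(X), size=n_protos, replace=False)].copy()
        for _ in range(n_iters):
            dists = ((X[:, None, :] - P[None, :, :]) ** 2).sum(-1)
            assign = dists.argmin(1)
            for k in range(n_protos):
                members = X[assign == k]
                if len(members):  # leave empty clusters where they are
                    P[k] = members.mean(0)
        protos.append(P)
    return subspaces, protos

def encode(A, subspaces, protos):
    """Replace each row of A by the index of its nearest prototype per subspace."""
    codes = np.empty((A.shape[0], len(subspaces)), dtype=np.int64)
    for c, (cols, P) in enumerate(zip(subspaces, protos)):
        dists = ((A[:, cols][:, None, :] - P[None, :, :]) ** 2).sum(-1)
        codes[:, c] = dists.argmin(1)
    return codes

def build_tables(B, subspaces, protos):
    """Precompute prototype-times-weight products; done once per layer."""
    return [P @ B[cols] for cols, P in zip(subspaces, protos)]

def approx_matmul(codes, tables):
    """Approximate A @ B using only table lookups and additions."""
    out = np.zeros((codes.shape[0], tables[0].shape[1]))
    for c, T in enumerate(tables):
        out += T[codes[:, c]]
    return out

# Toy usage on random data (sizes are arbitrary assumptions).
rng = np.random.default_rng(0)
A = rng.normal(size=(256, 32))   # stand-in for layer inputs
B = rng.normal(size=(32, 8))     # stand-in for fixed layer weights
subspaces, protos = fit_prototypes(A, n_subspaces=8, n_protos=16)
tables = build_tables(B, subspaces, protos)
approx = approx_matmul(encode(A, subspaces, protos), tables)
rel_err = np.linalg.norm(approx - A @ B) / np.linalg.norm(A @ B)
print(f"relative error: {rel_err:.3f}")
```

In this sketch the encoding step is itself a small dense computation, so it won't be fast as written; it only shows the structure. Doing this one layer at a time and checking how far the model's outputs drift is one way to read the "gradually approximating" advice above.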