by yangjunpro
1750 days ago
Interesting work, with some open questions:
1. The paper only covers GEMM, with small-scale experiments (CIFAR-10/100). It does not cover convolution, or the GEMMs in more popular networks such as Transformer/BERT.
2. It is still an approximate method, meaning potential accuracy loss. So I think it is less attractive for training acceleration, though it could be a complementary method for inference acceleration.
3. No results are reported on GPUs with Tensor Cores. I am a little curious: since modern AI accelerators (including NVIDIA GPUs) all incorporate Tensor Cores, which by design accelerate GEMM, what added value does the approximate method in this paper bring?
|
I'm not optimistic about beating tensor cores when running on GPUs, at least until/unless we get similar hardware support.*
Barring better hardware support, the killer app is probably CPU inference--once there are Conv implementations and the necessary GPU kernels to train the network.
*Aside: this support would be pretty doable since the kernels look almost identical to GEMM kernels--you just need a multiplex-add rather than a multiply-add. On an x86 machine, all it would take is a vpshufb-add and a 4-bit unpack instruction.
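
To make the "multiplex-add rather than multiply-add" point concrete, here is a minimal Python sketch of the two inner loops side by side. Everything in it is illustrative: the function names, the table layout (one 16-entry table per column, indexed by a 4-bit code), and the nibble order in the unpack step are assumptions for the example, not the paper's actual kernels. The tables are filled with exact products only so the two loops can be compared; in an approximate method the entries would presumably be precomputed approximations instead.

```python
def multiply_add_dot(a, b):
    """Ordinary GEMM-style inner loop: one multiply-add per element."""
    acc = 0
    for x, y in zip(a, b):
        acc += x * y
    return acc


def unpack_nibbles(packed):
    """Split each byte into two 4-bit codes, low nibble first (assumed order)."""
    codes = []
    for byte in packed:
        codes.append(byte & 0x0F)
        codes.append(byte >> 4)
    return codes


def lookup_add_dot(codes, tables):
    """Same loop shape, but each step selects a table entry and adds it.

    codes[i] is a 4-bit index (0..15) into tables[i], a 16-entry table of
    precomputed values. This select-then-add step is what a byte shuffle
    such as vpshufb would do for many lanes at once.
    """
    acc = 0
    for code, table in zip(codes, tables):
        acc += table[code]
    return acc


# Hypothetical data: four weights, and tables holding code * weight exactly.
weights = [3, 5, 7, 2]
tables = [[code * w for code in range(16)] for w in weights]

packed = bytes([0x21, 0x43])      # two 4-bit codes per byte
codes = unpack_nibbles(packed)    # [1, 2, 3, 4]

via_lookup = lookup_add_dot(codes, tables)
via_multiply = multiply_add_dot(codes, weights)
```

Both loops have the same structure and accumulate the same number of terms; the only difference is whether each term comes from a multiply or from a 16-way select, which is why the kernels end up looking so similar.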