by sdrg822
940 days ago
Cool. Important note: """
One may ask whether the conditionality introduced by the
use of CMM does not make FFFs incompatible with the
processes and hardware already in place for dense matrix
multiplication and deep learning more broadly. In short, the
answer is “No, it does not, save for some increased caching
complexity.”
""" It's hard to beat the hardware lottery!
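For anyone wondering what the "conditionality" in the quote looks like, here is a rough sketch of how I read CMM (conditional matrix multiplication): the neurons sit in a balanced binary tree, and each input only evaluates the neurons on one root-to-leaf path. The function name, the argument layout, the ReLU activation and the "go right if the score is positive" rule are my assumptions for illustration, not the paper's code. Pure Python, no GPU, just to show the data-dependent indexing that makes caching harder.

```python
def cmm_forward(x, w_in, w_out, depth):
    """Sketch of a conditional matmul over a binary tree of neurons.

    x      : input vector (list of floats)
    w_in   : 2**depth - 1 input weight vectors, one per tree node
    w_out  : 2**depth - 1 output weight vectors, one per tree node
    depth  : number of tree levels, i.e. neurons evaluated per input

    Returns (output vector, list of visited node indices).
    All names here are hypothetical.
    """
    out = [0.0] * len(w_out[0])
    node = 0          # start at the root
    visited = []
    for _ in range(depth):
        visited.append(node)
        # One dot product per level instead of one per neuron.
        score = sum(xi * wi for xi, wi in zip(x, w_in[node]))
        act = max(score, 0.0)  # ReLU as a stand-in activation (assumption)
        for j, wj in enumerate(w_out[node]):
            out[j] += act * wj
        # Data-dependent branch: which weights get touched next
        # depends on the input, hence the extra caching complexity.
        node = 2 * node + 1 + (1 if score > 0 else 0)
    return out, visited
```

With depth 3 there are 7 neurons in the tree but only 3 dot products per input; a dense layer of the same width would do all 7.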
> We therefore leave the attention layers untouched
Meaning, presumably, that GPU memory remains the bottleneck.
FLOPs really are quite cheap by now, e.g. a vision inference chip runs ~$2 per teraflop/s!