by algo_trader
940 days ago
In fact, as stated in the paper, this is bad news:

> We therefore leave the attention layers untouched

Meaning, presumably, that GPU memory remains the bottleneck. FLOPs really are quite cheap by now, e.g. a vision inference chip at ~$2 per teraflop/s!!
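A rough roofline sketch of why untouched attention would keep you memory-bound. Assumptions, not from the paper: attention at decode time, one query token against a KV cache stored in fp16, and a made-up accelerator with a 100 TFLOP/s peak and 2 TB/s of memory bandwidth. The helper names are hypothetical.

```python
def attainable_flops(peak_flops: float, mem_bandwidth: float, intensity: float) -> float:
    """Roofline model: throughput is capped by compute or by memory traffic, whichever is lower."""
    return min(peak_flops, mem_bandwidth * intensity)


def is_memory_bound(peak_flops: float, mem_bandwidth: float, intensity: float) -> bool:
    # The "ridge point" is the arithmetic intensity (FLOP/byte) where both limits meet.
    return intensity < peak_flops / mem_bandwidth


def decode_attention_intensity(head_dim: int, bytes_per_elem: int = 2) -> float:
    # Assumed setting: one query vector against one cached key and one cached value.
    # q.k costs 2*head_dim FLOPs (multiply-add) and reads head_dim elements of K;
    # p*v costs 2*head_dim FLOPs and reads head_dim elements of V.
    flops = 4 * head_dim
    bytes_moved = 2 * head_dim * bytes_per_elem
    return flops / bytes_moved


if __name__ == "__main__":
    peak = 100e12  # hypothetical peak compute, FLOP/s
    bw = 2e12      # hypothetical memory bandwidth, bytes/s
    intensity = decode_attention_intensity(head_dim=128)  # 1.0 FLOP/byte in fp16
    print(intensity)
    print(attainable_flops(peak, bw, intensity) / 1e12)   # TFLOP/s actually usable
    print(is_memory_bound(peak, bw, intensity))
```

With these numbers the ridge point is 50 FLOP/byte, while the attention step sits at about 1 FLOP/byte, so only ~2 of the 100 TFLOP/s are usable. Cheaper FLOPs don't move that number; only more bandwidth (or less data moved) does.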