by dnautics
2782 days ago
That's for the actual number crunching, but the real power cost is often in bandwidth (as discussed earlier in the OP). If you can reliably use lower-precision arithmetic for training, halving the bit width halves the bandwidth cost and gets you roughly 4x the flops from the same silicon, since multiplier cost scales as O(n^2) in the operand bit width n.
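To put rough numbers on that, here is a back-of-the-envelope sketch in Python. It assumes multiplier area grows with the square of the operand bit width and memory traffic grows linearly with it; real hardware has other overheads, and `relative_cost` and the 32-bit baseline are just illustrative choices, not measurements.

```python
def relative_cost(bits, baseline_bits=32):
    """Return (multiplier_area, bandwidth) relative to the baseline width.

    Assumptions (illustrative only): multiplier area ~ bits^2,
    bytes moved per operand ~ bits.
    """
    ratio = bits / baseline_bits
    return ratio ** 2, ratio


for bits in (32, 16, 8):
    area, bandwidth = relative_cost(bits)
    # 1 / area is how many multipliers fit in the area of one baseline multiplier.
    print(f"{bits:>2}-bit: area x{area:.4f}, bandwidth x{bandwidth:.2f}, "
          f"multipliers per unit area x{1 / area:.0f}")
```

Under those assumptions, going from 32-bit to 16-bit gives a quarter of the area per multiplier (so 4x the multipliers in the same space) and half the bandwidth per operand.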