	by dnautics 2782 days ago
	That's for the actual number crunching, but the real power cost is often in bandwidth (as discussed earlier in the OP). If you can reliably use lower-precision arithmetic for training, you get 4x the flops for half the bandwidth cost, due to matrix mult being O(n^2).
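
A rough back-of-the-envelope sketch of the arithmetic in this comment, under stated assumptions. The comment does not say where the 4x comes from; one possible reading, assumed here, is that hardware multiplier cost grows roughly with the square of the operand bit width. The function names and the simple "read A and B, write C once" traffic model are hypothetical and only for illustration.

```python
def matmul_traffic_bytes(n, bytes_per_element):
    """Bytes moved for an n x n matmul, assuming A and B are read once
    and C is written once (an idealized model with no cache effects)."""
    return 3 * n * n * bytes_per_element


def matmul_flops(n):
    """Multiply-add count for a dense n x n matmul, counted as 2 flops each."""
    return 2 * n ** 3


def relative_multiplier_cost(bits):
    """ASSUMPTION: multiplier cost scales with the square of operand width.
    Uses total operand bits, not mantissa bits, to keep the sketch simple."""
    return bits * bits


n = 4096
fp32_traffic = matmul_traffic_bytes(n, 4)  # 4 bytes per fp32 element
fp16_traffic = matmul_traffic_bytes(n, 2)  # 2 bytes per fp16 element

# Halving the element size halves the bytes moved.
traffic_ratio = fp16_traffic / fp32_traffic

# Under the quadratic-cost assumption, a 16-bit multiplier costs a quarter
# of a 32-bit one, so the same budget fits 4x as many of them.
multiplier_ratio = relative_multiplier_cost(32) / relative_multiplier_cost(16)

print(f"traffic ratio (fp16 / fp32): {traffic_ratio}")
print(f"multipliers per unit cost (fp16 vs fp32): {multiplier_ratio}x")
print(f"flops for n={n}: {matmul_flops(n)}")
```

The flop count itself does not change with precision; only the bytes moved and the assumed per-multiply cost do.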