| HN Mirror


	by freeone3000 2012 days ago
	GPU underutilization depends on exactly which model you're training. It's not unreasonable to hit 80% or more CUDA core usage on non-recurrent models like convnets, given sufficiently fast data pipelines and a reasonable batch size. Transformers and recurrent models hit 100% CUDA core utilization for large portions of each epoch, with the low-% usage confined to the comparatively short weight update at the end. Also, the current rule of thumb is that at the same price point (say, a Xeon 4114 versus an Nvidia Titan RTX) the GPU completes each epoch in about 10% of the time the CPU takes, given the same compute graph. So it's highly unlikely that training will be anywhere close to as fast on a CPU as it is on a GPU.
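
	To put rough numbers on that, here is a minimal Python sketch of the arithmetic. The phase durations and the 20% figure for the weight update are made-up assumptions for illustration, not measurements, and the helper names are hypothetical; only the 10% epoch-time rule of thumb comes from the comment above.

```python
def time_weighted_utilization(phases):
    """Average utilization over an epoch.

    phases: list of (duration_seconds, utilization_fraction) tuples.
    """
    total = sum(duration for duration, _ in phases)
    if total <= 0:
        raise ValueError("total duration must be positive")
    return sum(duration * util for duration, util in phases) / total


def speedup_from_time_fraction(fraction):
    """Speedup implied when the GPU needs `fraction` of the CPU's epoch time."""
    if fraction <= 0:
        raise ValueError("fraction must be positive")
    return 1.0 / fraction


# Hypothetical epoch (assumed numbers): 95 s of compute at 100% utilization,
# followed by a 5 s weight update at 20% utilization.
phases = [(95.0, 1.00), (5.0, 0.20)]
avg = time_weighted_utilization(phases)
print(f"average utilization: {avg:.0%}")

# Rule of thumb from the comment: the GPU finishes an epoch in ~10% of the CPU's time.
print(f"implied speedup: {speedup_from_time_fraction(0.10):.0f}x")
```

	With those assumed numbers, a short low-utilization phase barely dents the average, and finishing in 10% of the time works out to roughly a 10x speedup per epoch.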