| HN Mirror



	by alanaan 942 days ago
	great post. could you apply this same framework to optimize training as well?

1 comment

varunshenoy 941 days ago

Slightly different set of trade-offs, but a similar mental model. You always use large batch sizes (so you are compute bound), and the bottleneck usually ends up being communication between GPUs/nodes.
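To make that mental model concrete, here is a rough back-of-the-envelope sketch for one data-parallel training step. It compares compute time against gradient-synchronization time. Everything in it is an assumption for illustration: the function name `step_times`, the rule of thumb of about 6 FLOPs per parameter per token, a ring all-reduce with no compute/communication overlap, and the hardware numbers in the example.

```python
def step_times(params, tokens_per_gpu, n_gpus, peak_flops, link_bytes_per_s,
               bytes_per_grad=2):
    """Estimate (compute_seconds, comm_seconds) for one data-parallel step.

    All inputs are hypothetical; real systems overlap communication with
    compute and rarely reach peak FLOPs or peak link bandwidth.
    """
    # Forward + backward pass: roughly 6 FLOPs per parameter per token.
    compute_s = 6 * params * tokens_per_gpu / peak_flops
    # Ring all-reduce: each GPU sends about 2*(n-1)/n times the gradient size.
    grad_bytes = params * bytes_per_grad
    comm_s = 2 * (n_gpus - 1) / n_gpus * grad_bytes / link_bytes_per_s
    return compute_s, comm_s


# Made-up example: 7e9 parameters, 8192 tokens per GPU, 8 GPUs,
# 3e14 FLOP/s per GPU, fast intra-node link vs. slower inter-node link.
fast = step_times(7e9, 8192, 8, 3e14, 5e10)
slow = step_times(7e9, 8192, 8, 3e14, 5e9)
print("fast link: compute %.2fs, comm %.2fs" % fast)
print("slow link: compute %.2fs, comm %.2fs" % slow)
```

With these made-up numbers the compute time is the same in both cases, but dropping the link bandwidth by 10x makes the gradient sync several times longer than the compute, which is the situation the comment describes.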
