by varunshenoy 942 days ago
Slightly different set of trade-offs, but a similar mental model. You always use large batch sizes (so you're compute bound), and the bottleneck usually ends up being communication between GPUs/nodes.
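To make that concrete, here's a rough back-of-envelope sketch. None of this is from a real setup: the function names, the fp16 (2 bytes per element) assumption, the hardware numbers, and the choice of a ring all-reduce as the communication pattern are all my own illustrative assumptions. It compares a matmul's arithmetic intensity against a device's ops:byte ratio, then estimates the bandwidth cost of syncing a buffer across GPUs.

```python
def arithmetic_intensity(batch: int, d_in: int, d_out: int,
                         bytes_per_elem: int = 2) -> float:
    """FLOPs per byte moved for a (batch x d_in) @ (d_in x d_out) matmul.

    Assumes weights, inputs, and outputs each cross memory once (idealized).
    """
    flops = 2 * batch * d_in * d_out  # one multiply + one add per term
    bytes_moved = bytes_per_elem * (d_in * d_out + batch * d_in + batch * d_out)
    return flops / bytes_moved


def is_compute_bound(batch: int, d_in: int, d_out: int,
                     peak_flops: float, mem_bandwidth: float) -> bool:
    """True when the matmul does more FLOPs per byte than the device's ops:byte ratio."""
    return arithmetic_intensity(batch, d_in, d_out) > peak_flops / mem_bandwidth


def ring_allreduce_seconds(num_bytes: float, num_gpus: int,
                           link_bandwidth: float) -> float:
    """Bandwidth-only estimate for a ring all-reduce (latency ignored).

    Each GPU sends and receives 2 * (N - 1) / N of the buffer.
    """
    return 2 * (num_gpus - 1) / num_gpus * num_bytes / link_bandwidth


if __name__ == "__main__":
    # Hypothetical device: 300 TFLOP/s peak, 2 TB/s memory bandwidth.
    PEAK_FLOPS, MEM_BW = 300e12, 2e12
    for batch in (1, 1024):
        bound = is_compute_bound(batch, 4096, 4096, PEAK_FLOPS, MEM_BW)
        print(f"batch={batch}: {'compute' if bound else 'memory'} bound")
    # Hypothetical sync: 2 GB buffer, 8 GPUs, 50 GB/s per link.
    print(f"all-reduce: {ring_allreduce_seconds(2e9, 8, 50e9):.3f} s")
```

With these made-up numbers, batch size 1 lands on the memory-bound side and batch size 1024 on the compute-bound side, so once the batch is large the remaining cost you can't batch away is the cross-GPU sync.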