Hacker News
alanaan, 942 days ago
great post. could you apply this same framework to optimize training as well?
varunshenoy, 941 days ago
Slightly different set of trade-offs, but a similar mental model. In training you always use large batch sizes (so you're compute bound), and the bottleneck usually ends up being communication between GPUs/nodes.
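To make the "large batch means compute bound" point concrete, here is a rough sketch, not anything from the post itself. It estimates the arithmetic intensity (FLOPs per byte moved) of a single square dense layer and compares it to a hardware FLOPs-to-bandwidth ratio. The function names, the fp16 assumption, and the hardware numbers are all illustrative assumptions, and the model ignores caching, attention, and optimizer traffic.

```python
def arithmetic_intensity(batch_tokens: int, d_model: int, bytes_per_elem: int = 2) -> float:
    """FLOPs per byte for one (batch_tokens x d_model) @ (d_model x d_model) matmul.

    Assumes fp16 (2 bytes per element) and that weights plus input and
    output activations each cross the memory bus once. Hypothetical model.
    """
    flops = 2 * batch_tokens * d_model * d_model              # one multiply + one add per MAC
    weight_bytes = bytes_per_elem * d_model * d_model         # read the weight matrix
    act_bytes = bytes_per_elem * 2 * batch_tokens * d_model   # read inputs, write outputs
    return flops / (weight_bytes + act_bytes)


def is_compute_bound(batch_tokens: int, d_model: int,
                     peak_flops: float, mem_bandwidth: float) -> bool:
    """True when the layer's intensity meets or exceeds the hardware ratio."""
    return arithmetic_intensity(batch_tokens, d_model) >= peak_flops / mem_bandwidth


if __name__ == "__main__":
    # Illustrative accelerator numbers (assumptions, not a spec sheet):
    PEAK_FLOPS = 312e12      # FLOP/s
    MEM_BANDWIDTH = 1.5e12   # bytes/s
    for batch in (1, 64, 4096):
        ai = arithmetic_intensity(batch, 4096)
        bound = "compute" if is_compute_bound(batch, 4096, PEAK_FLOPS, MEM_BANDWIDTH) else "memory"
        print(f"batch={batch:5d}  intensity={ai:8.1f} FLOPs/byte  -> {bound} bound")
```

Under these assumptions a batch of 1 sits near 1 FLOP/byte (memory bound, like token-by-token inference), while a batch of a few thousand tokens clears the hardware ratio easily. That is why training tends to be compute bound on each device and the remaining bottleneck shifts to moving gradients between GPUs/nodes.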