Hacker News
by ivalm 2170 days ago
But batch size is probably the least of the problems, since you can do data parallelism (send half the batch to each GPU, then combine the results on the CPU).
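
To make the data-parallel point concrete, here is a rough sketch in plain Python, with no real GPUs involved. It assumes a toy linear model with mean squared error, and the names `grad_mse` and `data_parallel_grad` are made up for illustration. Each simulated device gets an equal slice of the batch, and the "combine" step just averages the per-device gradients, which matches the full-batch gradient as long as the shards are the same size.

```python
# Toy data-parallel step: two simulated "GPUs" each get half of the batch.
# Pure Python, no real devices; all names here are hypothetical.

def grad_mse(w, b, xs, ys):
    """Gradient of mean squared error for y = w*x + b over one (sub)batch."""
    n = len(xs)
    gw = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    gb = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    return gw, gb

def data_parallel_grad(w, b, xs, ys, num_devices=2):
    """Split the batch evenly, compute per-shard gradients, average them."""
    shard = len(xs) // num_devices  # assumes the batch divides evenly
    grads = [
        grad_mse(w, b,
                 xs[i * shard:(i + 1) * shard],
                 ys[i * shard:(i + 1) * shard])
        for i in range(num_devices)
    ]
    # The "combine on CPU" step: average the per-device gradients.
    gw = sum(g[0] for g in grads) / num_devices
    gb = sum(g[1] for g in grads) / num_devices
    return gw, gb

xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
full = grad_mse(0.5, 0.0, xs, ys)             # whole batch on one device
split = data_parallel_grad(0.5, 0.0, xs, ys)  # half the batch per device
```

Only the gradients cross between devices here, and they are the size of the model rather than the size of the batch, so the interconnect matters much less than it would if the model itself had to be split.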

I think a model bigger than GPU memory is the only case where you really wish for NVLink on V100s.