by colincooke 2172 days ago
Yeah, you're not wrong, but it's a bit misleading. This lets you run faster, but only because it lets you use a larger batch size (arguably not best practice, but your mileage will vary). Memory pooling is a bit different, in that you can treat the combined cards as a single card from TF/PyTorch.
1 comment

But batch size is probably the least of the problems, since you can do data parallelism (send half the batch to each GPU, combine the results on the CPU).
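To make the "half the batch to each GPU" idea concrete, here's a rough sketch in plain Python. It's a toy, not how TF or PyTorch actually do it: there are no real GPUs, each "device" is just a function call, and the names (`grad_mse`, `data_parallel_grad`) and the tiny linear model are made up for illustration. The point is only that averaging the per-shard gradients of equal-sized shards gives the same gradient as the full batch.

```python
# Simulated data parallelism: each "device" is just a function call on a shard.

def grad_mse(w, b, xs, ys):
    """Gradient of mean squared error for y = w*x + b over one batch (or shard)."""
    n = len(xs)
    dw = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    db = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    return dw, db

def data_parallel_grad(w, b, xs, ys, num_devices=2):
    """Split the batch evenly, compute per-shard gradients, average them on the host."""
    assert len(xs) % num_devices == 0, "this sketch assumes equal-sized shards"
    shard = len(xs) // num_devices
    grads = []
    for d in range(num_devices):
        lo, hi = d * shard, (d + 1) * shard
        # On real hardware this call would run on device d.
        grads.append(grad_mse(w, b, xs[lo:hi], ys[lo:hi]))
    # The "combine on CPU" step: average the per-device gradients.
    dw = sum(g[0] for g in grads) / num_devices
    db = sum(g[1] for g in grads) / num_devices
    return dw, db

xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
full = grad_mse(0.5, 0.0, xs, ys)             # gradient over the whole batch
split = data_parallel_grad(0.5, 0.0, xs, ys)  # gradient from two half-batches
```

Here `full` and `split` agree, which is why each card only ever needs to hold its own half of the batch.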

I think a model bigger than a single GPU's memory is the only case where you really wish for NVLink on V100s.