Hacker News
by coolness 1985 days ago
This, not to mention one could get the GPU usage on the V100 way higher by training with larger batch sizes, which would also make training much faster.
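A rough back-of-the-envelope model of why bigger batches speed things up. This is my own simplification, not a measurement: assume every optimizer step pays a fixed overhead (kernel launches, data loading, the optimizer update) plus a cost per sample. Fewer, larger steps amortize that overhead. The function name `epoch_time` and all the numbers are hypothetical, picked only to illustrate the shape of the effect.

```python
import math


def epoch_time(num_samples: int, batch_size: int,
               step_overhead_s: float, per_sample_s: float) -> float:
    """Toy cost model for one epoch (hypothetical, for illustration only).

    Each step costs a fixed overhead plus batch_size * per_sample_s.
    Assumes the GPU is under-utilized, so the per-sample cost stays
    constant as the batch grows; real hardware stops behaving this way
    once it is saturated.
    """
    steps = math.ceil(num_samples / batch_size)  # last batch may be partial
    step_time = step_overhead_s + batch_size * per_sample_s
    return steps * step_time


if __name__ == "__main__":
    # Made-up numbers: 50k samples, 10 ms overhead per step, 0.1 ms per sample.
    for b in (32, 256, 1024):
        t = epoch_time(50_000, b, step_overhead_s=0.01, per_sample_s=0.0001)
        print(f"batch={b:5d}  epoch ~ {t:.2f}s")
```

Under this model the overhead term shrinks roughly as 1/batch_size while the per-sample term stays fixed, so the gains flatten out once overhead is small relative to compute (and in practice larger batches may also need learning-rate retuning).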