Hacker News
by sabalaba 2177 days ago
Nothing has really changed in the last two years in terms of training cost. I think the author is making unreasonable extrapolations from the performance improvements on DAWNBench. A lot of those results are fast, but they take a lot more compute and search time to find the hyperparameters and training regimen (learning rate schedule, batch size, image size schedule, etc.) that produce those fast convergence times. The point is that once the juice has been squeezed out, you aren't going to keep seeing improvements in time to convergence on the same hardware.
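To make the cost argument concrete, here is a minimal sketch that compares the headline cost of a single fast run with the cost once the tuning trials are counted. It assumes every run is billed at a flat hourly rate, and the run counts, hours, and prices are hypothetical placeholders, not measured DAWNBench numbers. The `TrainingRun`, `headline_cost`, and `total_cost` names are also made up for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class TrainingRun:
    """One training run: wall-clock hours on hardware billed at a flat hourly rate (assumption)."""
    hours: float
    usd_per_hour: float

    def cost(self) -> float:
        return self.hours * self.usd_per_hour


def headline_cost(final_run: TrainingRun) -> float:
    """Cost of only the single fast run that a leaderboard entry reports."""
    return final_run.cost()


def total_cost(final_run: TrainingRun, search_runs: Sequence[TrainingRun]) -> float:
    """Cost including every trial spent tuning LR schedule, batch size, image size schedule, etc."""
    return final_run.cost() + sum(run.cost() for run in search_runs)


# Hypothetical numbers for illustration only, not real benchmark results.
final = TrainingRun(hours=3.0, usd_per_hour=12.0)
search = [TrainingRun(hours=3.0, usd_per_hour=12.0) for _ in range(20)]

print(f"headline cost: ${headline_cost(final):.2f}")  # prints "headline cost: $36.00"
print(f"cost including search: ${total_cost(final, search):.2f}")  # prints "cost including search: $756.00"
```

With these made-up inputs the reported run looks about 21x cheaper than the full effort. The search cost is paid once per recipe, so reusing a tuned recipe many times amortizes it. A new recipe on the same hardware has to pay it again.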

Also, since you cited our GPU benchmarks, I wanted to mention our GPU instances, which have some of the lowest training costs on the Stanford DAWNBench benchmarks discussed in the article.

https://lambdalabs.com/service/gpu-cloud