Hacker News
by keithyjohnson 2382 days ago
I get that these bullet points answer What rather than Why, but for the ones where the cause should be more readily discernible, like "In a year-and-a-half, the time required to train a large image classification system on cloud infrastructure has fallen from about three hours in October 2017 to about 88 seconds", what's causing this? Are models getting smaller without a loss in accuracy? Is training distributed over a greater number of cheaper machines? Personally, I'd be more excited about the former than the latter. We can't all afford MegatronLM-type experiments (https://nv-adlr.github.io/MegatronLM).
3 comments

Both. Companies are certainly building bigger and bigger clusters for training.

At the same time though, consumer GPUs have gotten significantly faster (compare e.g. an Nvidia RTX 2080 Ti to a GTX 980 Ti), and learning algorithms keep improving, with the better ones becoming more widely used (e.g. Adam instead of plain stochastic gradient descent).
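
For anyone who hasn't seen the two side by side, here is a minimal sketch of one plain SGD step next to one Adam step. The toy objective f(x) = (x - 3)^2, the function names, and the learning rate of 0.1 are illustrative assumptions, not anything taken from the report, and a single step says nothing about which optimizer trains a real model faster.

```python
import math

def sgd_step(x, grad, lr=0.1):
    # Plain gradient descent: the step scales directly with the gradient.
    return x - lr * grad

def adam_step(x, grad, m, v, t, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    # Running averages of the gradient (m) and the squared gradient (v).
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    # Bias correction, which matters most in the first few steps (t starts at 1).
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # The step is normalized by the gradient's recent magnitude.
    x = x - lr * m_hat / (math.sqrt(v_hat) + eps)
    return x, m, v

def grad_f(x):
    # Gradient of the hypothetical toy objective f(x) = (x - 3)^2.
    return 2 * (x - 3.0)

x_sgd = sgd_step(0.0, grad_f(0.0))
x_adam, m, v = adam_step(0.0, grad_f(0.0), m=0.0, v=0.0, t=1)
print(x_sgd, x_adam)
```

On the very first step Adam moves by roughly the learning rate whatever the size of the gradient, while the SGD step is proportional to it. That per-parameter normalization is a large part of why Adam tends to need less learning-rate tuning.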

Also, architecture search has let neural networks use more efficient building blocks with far fewer parameters, achieving the same accuracy with smaller models (and lowering training cost).
The improvements in the report are mainly from improvements in cloud infrastructure, but that's not to say there haven't been improvements in developing small, efficient models as well. One notable model that was introduced in 2017 was MobileNet, which aimed to create a model that could function on a mobile device without much loss in accuracy. There have been many more attempts to shrink models for use on devices with limited resources since 2017. These smaller models tend to have lower training times as well.
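
To put a rough number on "smaller": MobileNet is built around depthwise separable convolutions, which replace one standard convolution with a per-channel spatial filter followed by a 1x1 convolution. The sketch below compares weight counts for a single layer, ignoring biases. The layer sizes (3x3 kernel, 128 input channels, 256 output channels) and the function names are illustrative assumptions, not figures from the report.

```python
def standard_conv_params(k, c_in, c_out):
    # Every output channel has its own k x k filter over every input channel.
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1x1 convolution that mixes channels
    return depthwise + pointwise

k, c_in, c_out = 3, 128, 256   # hypothetical layer shape
standard = standard_conv_params(k, c_in, c_out)
separable = depthwise_separable_params(k, c_in, c_out)
print(standard, separable, round(standard / separable, 2))
# prints: 294912 33920 8.69
```

For a 3x3 kernel the saving approaches 9x as the number of output channels grows, which is where most of the parameter reduction in this family of models comes from.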
Read the actual report instead of just the bullet points. The speed improvement is a function of cost on cloud hardware.