

	by keithyjohnson 2382 days ago
	I get that these bullet points are answering What instead of Why, but for those that are more readily discernible, like "In a year-and-a-half, the time required to train a large image classification system on cloud infrastructure has fallen from about three hours in October 2017 to about 88 seconds", what's causing this? Are models getting smaller without a loss in accuracy? Is training distributed over a greater number of cheaper machines? Personally, I'd be more excited about the former than the latter. We can't all afford MegatronLM-type experiments: https://nv-adlr.github.io/MegatronLM
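
	For scale, here is a quick back-of-the-envelope check of the quoted figures. Both inputs are "about" values from the quote, so treat the result as a rough ratio rather than a precise benchmark.

```python
# Rough speedup implied by the quoted figures (both are approximate).
before_s = 3 * 60 * 60   # "about three hours" in October 2017, in seconds
after_s = 88             # "about 88 seconds" a year and a half later

speedup = before_s / after_s
print(f"roughly {speedup:.0f}x faster")
```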

3 comments

ummonk 2382 days ago

Both. Companies are certainly building bigger and bigger clusters for training.

At the same time though, consumer GPUs have gotten significantly faster (compare e.g. an Nvidia 2080 Ti to a 980 Ti), and learning algorithms keep improving, with the better ones becoming more widely used (e.g. Adam instead of stochastic gradient descent).


antpls 2381 days ago

Also, architecture search has let neural networks use more efficient building blocks with far fewer parameters, achieving the same accuracy with smaller models (and lowering training cost).


cyorir 2381 days ago

The gains in the report come mainly from improvements in cloud infrastructure, but that's not to say there hasn't been progress on small, efficient models as well. One notable model introduced in 2017 was MobileNet, which was designed to run on a mobile device without much loss in accuracy. There have been many more attempts since 2017 to shrink models for use on devices with limited resources. These smaller models tend to have lower training times as well.
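
To give a feel for where the savings come from: MobileNet replaces standard convolutions with depthwise separable ones. Below is a minimal sketch that counts weights for a single layer both ways. The layer sizes are hypothetical, chosen only for illustration, and biases and batch-norm parameters are ignored.

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a standard k x k convolution (no bias)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a depthwise k x k conv followed by a 1 x 1 pointwise conv (no bias)."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


# Hypothetical layer: 3x3 kernel, 256 input channels, 256 output channels.
k, c_in, c_out = 3, 256, 256
std = standard_conv_params(k, c_in, c_out)
sep = depthwise_separable_params(k, c_in, c_out)
print(f"standard: {std}, separable: {sep}, ratio: {std / sep:.1f}x")
```

For a 3x3 kernel the reduction approaches 9x as the number of output channels grows, which is part of why such models are cheaper to train and to run.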


hooande 2381 days ago

read the actual report instead of just the bullet points. the speed improvement is a function of cost on cloud hardware
