by keithyjohnson
2382 days ago
I get that these bullet points answer What rather than Why, but for the ones whose causes should be more readily discernible, like "In a year-and-a-half, the time required to train a large image classification system on cloud infrastructure has fallen from about three hours in October 2017 to about 88 seconds", what's causing this? Are models getting smaller without a loss in accuracy? Is training distributed over a greater number of cheaper machines? Personally, I'd be more excited about the former than the latter. We can't all afford MegatronLM-type experiments (https://nv-adlr.github.io/MegatronLM).
At the same time, though, consumer GPUs have gotten significantly faster (compare e.g. an Nvidia 2080 Ti to a 980 Ti), and learning algorithms keep improving, with better ones becoming more widely used (e.g. Adam instead of stochastic gradient descent).