by solidasparagus
2172 days ago
ResNet-50 with DawnBench settings is a very poor choice for illustrating this trend. The main technique driving the reduction in cost-to-train has been finding arcane, fast training schedules. That sounds good until you realize it's a kind of sleight of hand: finding that schedule takes tens of thousands of dollars (usually more). That cost isn't counted in cost-to-train, but it is a real-world cost you would pay if you wanted to train models. Still, I think the overall trend the article describes is accurate. There has been an increased focus on cost-to-train, and you can see it in models like EfficientNet, where NAS is used to jointly optimize accuracy and model size.
|
We also seem to be moving toward a world where big, problem-specific pretrained models are shared (BERT, GPT), so the base training time doesn't matter much unless you're doing model-architecture research. For most end-use cases in language and perception, you'll pick up a 99%-trained model and fine-tune it on your particular version of the problem.