by solidasparagus 2172 days ago
ResNet-50 with DawnBench settings is a very poor choice for illustrating this trend. The main technique driving this reduction in cost-to-train has been finding arcane, fast training schedules. This sounds good until you realize it's a type of sleight of hand: finding that schedule takes tens of thousands of dollars (usually more), which isn't counted in cost-to-train but is a real-world cost you would pay if you wanted to train models yourself.
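To put rough numbers on that, here is a minimal sketch of the accounting. The function name and every dollar figure are assumptions made up for illustration; none of them are DawnBench results. It simply spreads the one-time schedule-search spend over however many training runs actually reuse the schedule.

```python
def effective_cost_per_run(search_cost, reported_cost, num_runs):
    """Reported per-run cost plus the schedule-search cost amortized over num_runs."""
    if num_runs < 1:
        raise ValueError("num_runs must be at least 1")
    return reported_cost + search_cost / num_runs


# Hypothetical figures for illustration only, not measured benchmark numbers.
search_cost = 20_000.0  # dollars spent finding the fast schedule (assumed)
reported_cost = 10.0    # dollars for one run with the tuned schedule (assumed)

for runs in (1, 10, 1000):
    cost = effective_cost_per_run(search_cost, reported_cost, runs)
    print(f"{runs:>5} run(s): ${cost:,.2f} per run")
```

Under these assumed numbers, the headline figure only becomes representative once the schedule is reused many times. If you train a new model once, the search cost dominates.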

However, I think the overall trend this article talks about is accurate. There has been an increased focus on cost-to-train, and you can see that in models like EfficientNet, where NAS is used to jointly optimize accuracy and compute cost (FLOPS).

1 comment

I would guess that this means DawnBench is basically working. You'll get some "overfit" training schedule optimizations, but hopefully amongst those you'll end up with some improvements you can take to other models.

We also seem to be moving more towards a world where big problem-specific models are shared (BERT, GPT), so the base time to train doesn't matter much unless you're doing model architecture research. For most end-use cases in language and perception, you'll end up picking up a 99%-trained model and fine-tuning it on your particular version of the problem.