Hacker News new | ask | show | jobs
by sdenton4 2175 days ago
I would guess that this means DawnBench is basically working. You'll get some "overfit" training schedule optimizations, but hopefully amongst those you'll end up with some improvements you can take to other models.

We also seem to be moving more towards a world where big problem-specific models are shared (BERT, GPT), so that the base time to train doesn't matter much unless you're doing model architecture research. For most end-use cases in language and perception, you'll end up picking up a 99%-trained model, and fine tuning on your particular version of the problem.