
> ... if we want it to mimic HN comments, we'd need to spend the same amount of GPU time ...

These models are often much more general than you seem to be thinking. There's a base model, which is extremely expensive to train from scratch on a very large, very general set of data. Then there are specialized versions, which are much cheaper to create: you start from the base model you already have and train (much more briefly) on a specific set of data to tailor the output.

https://www.tensorflow.org/hub/tutorials/image_retraining

> Modern image recognition models have millions of parameters. Training them from scratch requires a lot of labeled training data and a lot of computing power (hundreds of GPU-hours or more). Transfer learning is a technique that shortcuts much of this by taking a piece of a model that has already been trained on a related task and reusing it in a new model.
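
To make the base/specialized split concrete, here's a toy sketch in plain Python. It is not the tutorial's code and doesn't use TensorFlow. `BASE_WEIGHTS`, `base_features`, and `fine_tune` are names I made up for illustration, and the "pretrained" weights are just random numbers standing in for the expensive base model. The point is only the shape of the workflow: the base stays frozen, and the brief training pass touches nothing but a small head on top of it.

```python
import math
import random

# Hypothetical stand-in for an expensive pretrained base model.
# These weights are random here (an assumption for the sketch) and are
# never updated by the fine-tuning code below.
random.seed(0)
BASE_WEIGHTS = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(8)]


def base_features(x):
    """Frozen base: map a 2-d input to 8 tanh features."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in BASE_WEIGHTS]


def predict(head, bias, feats):
    """Small trainable head: logistic regression on the base features."""
    z = sum(h * f for h, f in zip(head, feats)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def mean_loss(head, bias, data):
    """Average log loss over (features, label) pairs."""
    eps = 1e-12
    total = 0.0
    for feats, y in data:
        p = predict(head, bias, feats)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(data)


def fine_tune(examples, epochs=500, lr=0.1):
    """Train only the head; the base is run once and left untouched."""
    data = [(base_features(x), y) for x, y in examples]
    head, bias = [0.0] * len(BASE_WEIGHTS), 0.0
    loss_before = mean_loss(head, bias, data)
    for _ in range(epochs):
        for feats, y in data:
            err = predict(head, bias, feats) - y  # d(log loss)/dz
            head = [h - lr * err * f for h, f in zip(head, feats)]
            bias -= lr * err
    return head, bias, loss_before, mean_loss(head, bias, data)


# Tiny task-specific dataset: label is 1 when x0 > x1.
examples = [
    ((1.0, 0.0), 1), ((0.0, 1.0), 0), ((2.0, 1.0), 1),
    ((-1.0, 0.5), 0), ((0.5, -0.5), 1), ((-0.5, 0.5), 0),
]

snapshot = [row[:] for row in BASE_WEIGHTS]  # copy to confirm the base is unchanged
head, bias, loss_before, loss_after = fine_tune(examples)
print(f"loss before: {loss_before:.3f}, after: {loss_after:.3f}")
print("base unchanged:", BASE_WEIGHTS == snapshot)
```

In a real setup the base would have millions of parameters and the head would still be tiny by comparison, which is where the cost difference comes from.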