by solidasparagus 2569 days ago
Retrained from scratch? Why couldn't you just fine-tune the base model with Trump's tweets?

Maybe I don't understand something about these models. If the model was trained to mimic Trump tweets, it means that someone spent days of GPU time to find the weights of the model. Now if we want it to mimic HN comments, we'd need to spend the same amount of GPU time to find different weights. This is what I meant by "from scratch".
> ... if we want it to mimic HN comments, we'd need to spend the same amount of GPU time ...

These models are often much more general than you seem to be thinking. There's a base model which is incredibly computationally expensive to create from scratch. It is trained on a very large, very general set of data. Then there are specialized versions which are much cheaper to create: you start from the base model that you already have, and you train (much more briefly) on a specific set of data in order to tailor the output. That second step is the fine-tuning I was asking about.

https://www.tensorflow.org/hub/tutorials/image_retraining

> Modern image recognition models have millions of parameters. Training them from scratch requires a lot of labeled training data and a lot of computing power (hundreds of GPU-hours or more). Transfer learning is a technique that shortcuts much of this by taking a piece of a model that has already been trained on a related task and reusing it in a new model.
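
To make the base-model-plus-cheap-specialization idea concrete, here is a toy sketch in plain Python. It is not how any real language or image model is trained. Everything in it is an assumption for illustration: `BASE_WEIGHTS` is a hand-written stand-in for weights that would really come from expensive pretraining, `SMALL_DATA` is a made-up six-example dataset, and `features`, `predict`, `fine_tune` and `loss` are hypothetical helper names. The only point is that the base weights stay frozen and just a small "head" is trained on the new data.

```python
import math

# Stand-in for an expensively pretrained base layer (hypothetical values).
BASE_WEIGHTS = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Made-up small task-specific dataset: label is 1 when x0 + x1 > 0.
SMALL_DATA = [
    ([1.0, 0.5], 1), ([0.2, 0.9], 1), ([0.7, -0.1], 1),
    ([-1.0, -0.3], 0), ([-0.4, -0.8], 0), ([-0.6, 0.2], 0),
]


def features(base_weights, x):
    """Frozen base model: maps a raw input to a feature vector."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)))
            for row in base_weights]


def predict(head, feats):
    """Trainable head: logistic regression on top of the base features."""
    z = sum(h * f for h, f in zip(head, feats))
    return 1.0 / (1.0 + math.exp(-z))


def loss(base_weights, head, data):
    """Mean cross-entropy of the head's predictions on the data."""
    total = 0.0
    for x, y in data:
        p = predict(head, features(base_weights, x))
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(data)


def fine_tune(base_weights, data, steps=200, lr=0.5):
    """Gradient descent on the head only; base_weights are never modified."""
    head = [0.0] * len(base_weights)
    for _ in range(steps):
        grad = [0.0] * len(head)
        for x, y in data:
            feats = features(base_weights, x)
            err = predict(head, feats) - y
            for i, f in enumerate(feats):
                grad[i] += err * f
        head = [h - lr * g / len(data) for h, g in zip(head, grad)]
    return head


if __name__ == "__main__":
    untrained = [0.0] * len(BASE_WEIGHTS)
    print("loss before:", loss(BASE_WEIGHTS, untrained, SMALL_DATA))
    head = fine_tune(BASE_WEIGHTS, SMALL_DATA)
    print("loss after: ", loss(BASE_WEIGHTS, head, SMALL_DATA))
```

Only the three head weights are learned here, which is why the second stage is cheap compared to finding the base weights in the first place. Real fine-tuning often also updates some or all of the base weights, just for far fewer steps than the original training.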