| HN Mirror


by Micoloth 696 days ago

I think what you are referring to is the concept of “fine-tuning”. You take a pretrained network and continue training it on a (relatively) small set of new input-output pairs to steer it in a new direction.

It's widely used, you can look it up.
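To make the idea concrete, here is a minimal sketch in plain Python. Everything in it is a toy assumption for illustration: the "network" is a one-parameter-plus-bias linear model, and the names `train`, `pretrain_pairs`, and `finetune_pairs` are made up. It only shows the shape of the procedure: train on a lot of data first, then keep training the same weights on a few new pairs.

```python
def predict(w, b, x):
    return w * x + b


def mse(w, b, pairs):
    """Mean squared error of the model (w, b) on a list of (x, y) pairs."""
    return sum((predict(w, b, x) - y) ** 2 for x, y in pairs) / len(pairs)


def train(w, b, pairs, lr, steps):
    """Plain gradient descent on MSE, starting from the given (w, b)."""
    n = len(pairs)
    for _ in range(steps):
        grad_w = sum(2 * (predict(w, b, x) - y) * x for x, y in pairs) / n
        grad_b = sum(2 * (predict(w, b, x) - y) for x, y in pairs) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# "Pretraining": many pairs drawn from y = 2x, starting from zero weights.
pretrain_pairs = [(i / 10, 2 * i / 10) for i in range(-20, 21)]
w_pre, b_pre = train(0.0, 0.0, pretrain_pairs, lr=0.1, steps=500)

# "Fine-tuning": only three new pairs from a shifted task, y = 2x + 1.
# Training resumes from the pretrained weights instead of from scratch.
finetune_pairs = [(-1.0, -1.0), (0.0, 1.0), (1.0, 3.0)]
w_ft, b_ft = train(w_pre, b_pre, finetune_pairs, lr=0.1, steps=200)

error_before = mse(w_pre, b_pre, finetune_pairs)
error_after = mse(w_ft, b_ft, finetune_pairs)
```

In a real setting the model would be a large network and the optimizer something more elaborate, but the key point is the same: the second `train` call starts from `w_pre, b_pre`.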

A more challenging idea is whether it is possible to reuse the pretrained weights when training a network with a different architecture (maybe a bigger transformer with more heads, or something).

AFAIK this is not common practice: if you change the architecture, you have to retrain from scratch. But given the cost of these training runs, I wouldn't be surprised if OpenAI & co. had developed some technique to do this, e.g. across GPT versions.
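For what it's worth, one published idea along these lines is function-preserving widening in the style of the Net2Net paper: grow a hidden layer by duplicating existing units and splitting their outgoing weights, so the bigger network starts out computing exactly the same function. The sketch below is a toy assumption (a bias-free two-layer ReLU network in plain Python, with a hypothetical `widen` helper), and it says nothing about what any lab actually does.

```python
import random


def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def forward(W1, W2, x):
    """Two-layer ReLU network without biases. W1: hidden x in, W2: out x hidden."""
    hidden = [max(0.0, a) for a in matvec(W1, x)]
    return matvec(W2, hidden)


def widen(W1, W2, new_hidden, rng):
    """Grow the hidden layer to new_hidden units while preserving the function."""
    old_hidden = len(W1)
    # mapping[j] is the old unit that new unit j copies; the first
    # old_hidden entries keep every original unit in place.
    extra = [rng.randrange(old_hidden) for _ in range(new_hidden - old_hidden)]
    mapping = list(range(old_hidden)) + extra
    counts = [mapping.count(i) for i in range(old_hidden)]
    # Incoming weights are copied as-is, so duplicates have equal activations.
    new_W1 = [list(W1[src]) for src in mapping]
    # Outgoing weights are divided by the number of copies, so the copies
    # together contribute exactly what the original unit did.
    new_W2 = [[row[src] / counts[src] for src in mapping] for row in W2]
    return new_W1, new_W2


rng = random.Random(0)
W1 = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(4)]  # 3 -> 4
W2 = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(2)]  # 4 -> 2
big_W1, big_W2 = widen(W1, W2, new_hidden=7, rng=rng)

x = [0.5, -1.0, 2.0]
small_out = forward(W1, W2, x)
big_out = forward(big_W1, big_W2, x)  # same values up to float rounding
```

The wider network can then be trained further from this starting point instead of from random weights. Changing depth, head count, or the overall design is a harder problem than this sketch covers.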

1 comment

cuteboy19 696 days ago

Full architecture changes are rare. Mostly you would just attach extra layers or modules on top or at the sides.
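A rough sketch of the "attach on top" case, in plain Python: the pretrained part is kept frozen and only a small new head is trained on its outputs. The `base_features` function is a made-up stand-in for a pretrained network, and the target function and hyperparameters are arbitrary assumptions chosen so the toy example converges.

```python
def base_features(x):
    """Stand-in for a frozen pretrained network: fixed and never updated."""
    return [x, x * x]


def head_predict(head_w, head_b, feats):
    return sum(w * f for w, f in zip(head_w, feats)) + head_b


def train_head(pairs, lr=0.1, steps=2000):
    """Gradient descent on MSE over the head parameters only."""
    head_w, head_b = [0.0, 0.0], 0.0
    # The base is frozen, so its features can be computed once up front.
    data = [(base_features(x), y) for x, y in pairs]
    n = len(data)
    for _ in range(steps):
        grad_w, grad_b = [0.0, 0.0], 0.0
        for feats, y in data:
            err = head_predict(head_w, head_b, feats) - y
            for i, f in enumerate(feats):
                grad_w[i] += 2 * err * f / n
            grad_b += 2 * err / n
        head_w = [w - lr * g for w, g in zip(head_w, grad_w)]
        head_b -= lr * grad_b
    return head_w, head_b


# New task: y = 3x^2 - x + 0.5, sampled at 11 points in [-1, 1].
xs = [i / 5 for i in range(-5, 6)]
pairs = [(x, 3 * x * x - x + 0.5) for x in xs]
head_w, head_b = train_head(pairs)
```

Because the head is tiny compared to the base, this is much cheaper than retraining everything, which is presumably why it is the more common route.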
