Hacker News new | ask | show | jobs
by kouteiheika 368 days ago
In general this is of course an active area of research, but yes, you can do something that and people have done it successfully[1] by adding extra layers to an existing model and then continuing to train it.

You have to be careful about the "same data" part though; ideally you want to train once on unique data[2] as excessive duplication can harm the performance of the model[3], although if you have limited data a couple of training epochs might be safe and actually improve the performance of the model[4].

[1] -- https://arxiv.org/abs/2312.15166

[2] -- https://arxiv.org/abs/1906.06669

[3] -- https://arxiv.org/abs/2205.10487

[4] -- https://galactica.org/static/paper.pdf

3 comments

In addition to increasing the number of layers, you can also grow the weight matrices and initialize by tiling them with the smaller model's weights https://neurips.cc/media/neurips-2023/Slides/83968_5GxuY2z.p...
Thank you for taking the time to provide me all this reading.
This might be obvious, but just to state it explicitly for everyone: you can freeze the weights of the existing layers if you want to train the new layers but want to leave the existing layers untouched.