| HN Mirror

by MrLeap 413 days ago

> that's a skill issue and not a fundamental property

This made me laugh.

You seem like you may know something I've been curious about.

I'm a shader author these days, haven't been a data scientist for a while, so it's going to distort my vocab.

Say you've got a trained neural network living in a 512x512 structured buffer. It's doing great, but you get a new video card with more memory, so you can afford to migrate it to 1024x1024. Is the state-of-the-art approach to retrain on the same data with a larger set of initial parameters, or are there methods that smear the old weights over the larger space to get a head start? Does anything like this speed up training?

... can you upsample a language model the way you can upsample low-res anime profile pictures? I wonder what the made-up words would be like.

kouteiheika 413 days ago

In general this is of course an active area of research, but yes, you can do something like that, and people have done it successfully[1] by adding extra layers to an existing model and then continuing to train it.
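To make "adding extra layers and then continuing to train" concrete, here is a minimal NumPy sketch. It assumes a toy stack of residual blocks and initializes each new block's output projection to zero, so the grown model starts out computing exactly what the small one did. The zero-init trick and the names `residual_block`, `grow_depth` and `forward` are illustrative assumptions, not necessarily the method used in [1].

```python
import numpy as np

def residual_block(x, w_in, w_out):
    """Toy residual layer: x + w_out @ relu(w_in @ x)."""
    return x + w_out @ np.maximum(w_in @ x, 0.0)

def forward(blocks, x):
    """Run x through a list of (w_in, w_out) residual blocks."""
    for w_in, w_out in blocks:
        x = residual_block(x, w_in, w_out)
    return x

def grow_depth(blocks, n_new, width, rng):
    """Append n_new blocks with a zero output projection (assumed init).

    A zero w_out makes each new block the identity at first, so the grown
    model reproduces the old outputs until further training changes it.
    """
    new_blocks = [
        (rng.normal(scale=0.02, size=(width, width)), np.zeros((width, width)))
        for _ in range(n_new)
    ]
    return blocks + new_blocks

rng = np.random.default_rng(0)
width = 8
# Stand-in for a trained two-block model (random weights for illustration).
small = [
    (rng.normal(size=(width, width)), rng.normal(size=(width, width)))
    for _ in range(2)
]
big = grow_depth(small, n_new=2, width=width, rng=rng)

x = rng.normal(size=(width, 5))
same_output = np.allclose(forward(big, x), forward(small, x))
```

Starting from an unchanged function means continued training begins at the old model's loss rather than from scratch; other initializations, such as copying existing layers, are also possible.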

You have to be careful about the "same data" part though; ideally you want to train once on unique data[2] as excessive duplication can harm the performance of the model[3], although if you have limited data a couple of training epochs might be safe and actually improve the performance of the model[4].

[1] -- https://arxiv.org/abs/2312.15166

[2] -- https://arxiv.org/abs/1906.06669

[3] -- https://arxiv.org/abs/2205.10487

[4] -- https://galactica.org/static/paper.pdf

yorwba 413 days ago

In addition to increasing the number of layers, you can also grow the weight matrices and initialize by tiling them with the smaller model's weights https://neurips.cc/media/neurips-2023/Slides/83968_5GxuY2z.p...
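As a rough illustration of the tiling idea, here is a NumPy sketch that fills a larger weight matrix with repeated copies of a smaller one. The helper name `grow_by_tiling` and the 4x4 to 8x8 sizes are made up for the example, and the linked slides may use a different scaling or add noise on top.

```python
import numpy as np

def grow_by_tiling(w_small, new_shape):
    """Fill a matrix of new_shape with repeated copies of w_small.

    Copies are laid out in a grid and cropped if new_shape is not an
    exact multiple of the small matrix's shape.
    """
    rows, cols = new_shape
    reps = (
        -(-rows // w_small.shape[0]),  # ceiling division
        -(-cols // w_small.shape[1]),
    )
    return np.tile(w_small, reps)[:rows, :cols].copy()

rng = np.random.default_rng(0)
w_small = rng.normal(size=(4, 4))      # stand-in for trained weights
w_big = grow_by_tiling(w_small, (8, 8))

# With a duplicated input, the tiled matrix sums two copies of each
# product, so dividing by the number of column tiles (2 here) keeps the
# outputs equal to the small model's outputs, repeated.
x = rng.normal(size=4)
y_small = w_small @ x
y_big = (w_big / 2) @ np.concatenate([x, x])
```

In practice you would probably perturb the tiled weights slightly, since perfectly identical copies receive identical gradients and stay redundant.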

MrLeap 413 days ago

Thank you for taking the time to provide me all this reading.

ijk 413 days ago

This might be obvious, but just to state it explicitly for everyone: you can freeze the weights of the existing layers if you want to train only the new layers and leave the existing ones untouched.
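A small sketch of what freezing amounts to, using plain NumPy and hand-written gradients. The two-layer linear model, the `params` dict and the `trainable` set are all hypothetical. In a framework like PyTorch the same effect is usually achieved by calling `requires_grad_(False)` on the existing parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy model: y = w_new @ (w_old @ x)
params = {
    "w_old": rng.normal(size=(3, 3)),  # existing layer, stays frozen
    "w_new": np.eye(3),                # newly added layer, gets trained
}
trainable = {"w_new"}  # anything not listed here is never updated

x = rng.normal(size=(3, 16))
target = rng.normal(size=(3, 16))

def mse(p):
    """Mean squared error of the toy model on (x, target)."""
    return float(np.mean((p["w_new"] @ (p["w_old"] @ x) - target) ** 2))

frozen_copy = params["w_old"].copy()
loss_before = mse(params)

for _ in range(200):
    h = params["w_old"] @ x
    err = params["w_new"] @ h - target
    # Gradients of the squared error (up to a constant factor).
    grads = {
        "w_new": err @ h.T / x.shape[1],
        "w_old": params["w_new"].T @ err @ x.T / x.shape[1],
    }
    for name in trainable:  # frozen parameters are simply skipped
        params[name] -= 0.01 * grads[name]

loss_after = mse(params)
```

Skipping the frozen parameters also means their gradients never need to be computed, which a real framework exploits to save memory and time.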
