by MrLeap
366 days ago
> that's a skill issue and not a fundamental property

This made me laugh. You seem like you may know something I've been curious about. I'm a shader author these days and haven't been a data scientist for a while, so it's going to distort my vocab.

Say you've got a trained neural network living in a 512x512 structured buffer. It's doing great, but you get a new video card with more memory, so you can afford to migrate it to a 1024x1024 one. Is the state-of-the-art approach to retrain with the same data but more initial parameters, or are there other methods that smear the old weights over a larger space to get a leg up? Does anything like this accelerate training time?

... Can you upsample a language model like you can a low-res anime profile picture? I wonder what the made-up words would be like.
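To make the "smear the old weights over a larger space" idea concrete, here is a minimal sketch of one known approach, function-preserving widening in the style of Net2Net. It is an illustration, not a claim about what the state of the art is. The toy two-layer network, the helper names (`matvec`, `relu`, `forward`, `widen`), and the sizes are all assumptions made up for this example. Each new hidden unit copies an existing one, and the outgoing weights of a copied unit are divided by its number of copies, so the wider network starts out computing the same function as the small one.

```python
import random

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def relu(v):
    return [max(0.0, a) for a in v]

def forward(W1, W2, x):
    """Toy two-layer MLP without biases: W2 @ relu(W1 @ x)."""
    return matvec(W2, relu(matvec(W1, x)))

def widen(W1, W2, new_hidden, rng):
    """Grow the hidden layer to new_hidden units while preserving the function.

    W1 has one row per hidden unit; W2 has one column per hidden unit.
    """
    old_hidden = len(W1)
    # mapping[j] is the index of the old unit that new unit j copies.
    mapping = list(range(old_hidden)) + [
        rng.randrange(old_hidden) for _ in range(new_hidden - old_hidden)
    ]
    # How many copies of each old unit exist in the widened layer.
    counts = [mapping.count(i) for i in range(old_hidden)]
    # Incoming weights are copied unchanged.
    new_W1 = [list(W1[i]) for i in mapping]
    # Outgoing weights are split evenly among the copies.
    new_W2 = [[row[i] / counts[i] for i in mapping] for row in W2]
    return new_W1, new_W2

rng = random.Random(0)
W1 = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(4)]  # 3 inputs -> 4 hidden
W2 = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(2)]  # 4 hidden -> 2 outputs
x = [0.5, -1.0, 2.0]

wide_W1, wide_W2 = widen(W1, W2, new_hidden=8, rng=rng)
small_out = forward(W1, W2, x)
wide_out = forward(wide_W1, wide_W2, x)
```

Here `small_out` and `wide_out` agree up to floating-point error. In practice you would probably add a little noise to the copied weights so the duplicate units can diverge during further training, and whether this actually saves training time depends on the model and data.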
You have to be careful about the "same data" part, though. Ideally you want to train once on unique data[2], as excessive duplication can harm the model's performance[3]. If you have limited data, however, a couple of training epochs might be safe and can actually improve performance[4].
[1] -- https://arxiv.org/abs/2312.15166
[2] -- https://arxiv.org/abs/1906.06669
[3] -- https://arxiv.org/abs/2205.10487
[4] -- https://galactica.org/static/paper.pdf
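
As a rough illustration of the "train once on unique data" point, here is a minimal sketch of exact-duplicate removal. It assumes documents are plain strings and that lowercasing and collapsing whitespace is an acceptable notion of "the same document". The helper names `normalize` and `dedupe` are made up for this example. It will not catch near-duplicates, which would need something like fuzzy matching.

```python
import hashlib

def normalize(text):
    """Lowercase and collapse whitespace so trivial variants hash the same."""
    return " ".join(text.lower().split())

def dedupe(docs):
    """Keep the first occurrence of each document, comparing normalized text."""
    seen = set()
    unique = []
    for doc in docs:
        digest = hashlib.sha256(normalize(doc).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique

corpus = [
    "The cat sat on the mat.",
    "the cat  sat on the mat.",   # same text after normalization
    "A different sentence.",
]
unique_docs = dedupe(corpus)
```

Storing hashes rather than full strings keeps the memory cost per document fixed, which matters once the corpus is large.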