Hacker News
by jncfhnb 888 days ago
The model weights ARE the preferred form for modification
3 comments

As a long-time 'practitioner' of machine learning, I strongly disagree. The preferred way to modify a model is to retrain it with a tweak to the parameters, the training algorithm, the model structure, the data selection, or the length of training.

You can get some effects by fine tuning, and in that case it may be preferable as it's cheaper, but in general if I want to have a different or better model, that involves retraining.

I don’t really believe your long-time practice is aligned with the kind of models being discussed.
Yeah, that's why data scientists are out there editing the weights rather than cleaning up datasets and rerunning training with different settings.
If that was supposed to be clever, it just sounds naive. There’s a ton of work going on fine tuning open source models.
> There’s a ton of work going on fine tuning

... models provided in weights-only form. (mostly!)

I believe the preferred form would be the whole kit and caboodle: the collection and filtering scripts, the data to the extent that it's non-public, the training routine, and the model weights... because sometimes you'll perform changes at any of those stages.

Do you actually do this for a living? Do you have experience doing this and have credibility talking about what’s preferred? I do.
OK. Where is your reproduction of Pythia trained from scratch? Or MPT? Or Amber? Shall we play a game where you give a paper on pretraining (and we are not talking about puny models based on wikitext2), I give you a paper based around finetuning, and we'll see who runs out of papers first?
Reproduction is not the goal! Making papers is not the goal! Making useful models is the goal. And having open source models is, by an enormous degree, the more useful thing.

I see you’re someone else, so I’ll ask you too. Do you actually have any experience doing this? Have you ever fine tuned models or tried to change architecture or put a piece of one model into another?

Unless you want to try modifying the model structure, in which case the weights aren’t necessarily valid anymore and will need to be retrained.
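
To make that last point concrete, here is a minimal sketch in plain Python, with a model reduced to a dictionary of parameter shapes. The layer names, the sizes, and the `compatible_weights` helper are all hypothetical and chosen only for illustration. Frameworks such as PyTorch raise a size-mismatch error in the analogous situation when loading a checkpoint into a changed architecture.

```python
# Hypothetical toy "architecture": parameter name -> shape of that parameter.
def mlp_shapes(d_in, d_hidden, d_out):
    return {
        "fc1.weight": (d_hidden, d_in),
        "fc1.bias": (d_hidden,),
        "fc2.weight": (d_out, d_hidden),
        "fc2.bias": (d_out,),
    }


def compatible_weights(checkpoint, target):
    """Split the target's parameter names into those whose shape matches
    the checkpoint (loadable as-is) and those that do not (must be
    re-initialised and trained)."""
    reusable, retrain = [], []
    for name, shape in target.items():
        if checkpoint.get(name) == shape:
            reusable.append(name)
        else:
            retrain.append(name)
    return reusable, retrain


# Released weights were trained for a hidden width of 32 (assumed numbers).
original = mlp_shapes(d_in=16, d_hidden=32, d_out=4)
# Structural change: widen the hidden layer to 64.
widened = mlp_shapes(d_in=16, d_hidden=64, d_out=4)

reusable, retrain = compatible_weights(original, widened)
print("reusable:", reusable)
print("retrain:", retrain)
```

In this toy case only the output bias still fits, and even that is compatible in shape only: it was trained against a different hidden layer, so its values are not necessarily useful in the new structure.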