| HN Mirror


	by jncfhnb 888 days ago
	The model weights ARE the preferred form for modification

3 comments

PeterisP 888 days ago

As a long-time 'practitioner' of machine learning models, I strongly disagree: the preferred form for model modification is retraining the model with a tweak to the parameters, the training algorithm, the model structure, the data selection, or the length of training.

You can get some effects by fine-tuning, and in that case it may be preferable as it's cheaper, but in general, if I want a different or better model, that involves retraining.

jncfhnb 888 days ago

I don’t really believe your long-time practice is aligned with the kind of models being discussed.

anothernewdude 888 days ago

Yeah, that's why data scientists are out there editing the weights rather than cleaning up datasets and rerunning training with different settings.

jncfhnb 888 days ago

If that was supposed to be clever, it just sounds naive. There’s a ton of work going on fine-tuning open source models.

nullc 888 days ago

> There’s a ton of work going on fine tuning

... models provided in weights-only form (mostly!).

I believe the preferred form would be the whole kit and caboodle: the collection and filtering scripts, the data to the extent that it's non-public, the training routine, and the model weights... because sometimes you'll perform changes at any of those stages.

jncfhnb 887 days ago

Do you actually do this for a living? Do you have experience doing this and have credibility talking about what’s preferred? I do.

Trapais 887 days ago

OK. Where is your reproduction of Pythia trained from scratch? Or MPT? Or Amber? Shall we play a game where you give me a paper on pretraining (and we are not talking about puny models based on wikitext2), I give you a paper based around fine-tuning, and we'll see who runs out of papers first?

jncfhnb 887 days ago

Reproduction is not the goal! Making papers is not the goal! Making useful models is the goal. And having open source models is, by an enormous degree, the more useful thing.

I see you’re someone else, so I’ll ask you too. Do you actually have any experience doing this? Have you ever fine-tuned models, tried to change an architecture, or put a piece of one model into another?

tonyarkles 888 days ago

Unless you want to try modifying the model structure, in which case the weights aren’t necessarily valid anymore and will need to be retrained.
