Hacker News new | ask | show | jobs
by tytso 890 days ago
But one of the four freedoms is being able to modify/tweek things, including the model. If all you have is the model weights, then you can't easily tweak the model. The model weights is hardly the preferred form for making changes to update the model.

The equivalent would be someone which gives you only the binary to Libreoffice. That's perfectly fine for editing documents and spreadsheets, but suppose you want to fix a bug in Libreoffice? Just having the binary is going to make it quite difficult to fix things.

Simiarly, suppose you find that the model has a bias in terms of labeling African Americans as criminals; or women as lousy computer programmers. If all you have is the model weights of the trained model, how easily can you fix the model? And how does that compare with running emacs on the Libreoffice binary?

4 comments

If all you have are the model weights, you can very easily tweak the model. How else are all these "decensored" Llama2 showing up on Hugging Face? There's a lot of value in a trained LLM model itself and it's 100% a type of openness to release these trained models.

What you can't easily do is retrain from scratch using a heavily modified architecture or different training data preconditioning. So yes, it is valuable to have dataset access and compute to do this and this is the primary type of value for LLM providers. It would be great if this were more open — it would also be great if everybody had a million dollars.

I think it's pretty misguided to put down the first type of value and openness when honestly they're pretty independent, and the second type of value and openness is hard for anybody without millions of dollars to access.

Well, by that argument it's trivially easy to run emacs on a binary and change a pathname --- or wrap a program with another program to "fix a bug". Easy, no?

And yet, the people who insist on having source code so they can edit the program and recompile it have said that for programs, having just the binary isn't good enough.

>suppose you find that the model has a bias in terms of labeling African Americans as criminals; or women as lousy computer programmers. If all you have is the model weights of the trained model, how easily can you fix the model?

That's textbook fine-tuning and is basically trivial. Adding another layer and training that is many orders of magnitude more efficient than retraining the whole model and works ~exactly as well.

Models are data, not instructions. Analogies to software are actively harmful. We do not fix bugs in models any more than we fix bugs in a JPEG.

Instructions is exactly what weights are. We just have no idea what those instructions are.
You can fine tune a model, you ve got way more power to do so given the trained model than starting from scratch and the raw data.
Next step will be to ask for GPU time. Because even with data, model code and training framework you may have no resources to train. "The equivalent would be" someone gives you the code, but no access to mainframe which is required to compile. Which would make it not open source(?) There are other variations, like original compiler was lost, current compilers aren't backward compatible. Does that make old open source code closed now?

In other words there should be a reasonable line when model is called open source. In extreme view it's when the model, the training framework, and the data are available for free. This would mean open source model can be trained only on public domain data. Which makes class of open source models very, very limited.

More realistic is to make the code and the weights available. So that with some common knowledge new model can be trained, or old fine tuned, on available data. Important note: weights cannot be reproduced even if original training data is available. It will be always a new model with (slightly) different responses.

Down voted, hmm... I'll add bit more then. Sometimes it's even good that model cannot be easily reproduced. Original developers usually have some skills and responsibility. While 'hackers' don't. It's easy to introduce bias into the data , like removing selected criminal records, and then publish model with similar name. That would be confusing, some may mistake fake one for the real.

PS: If I ever make my models open I can't open the data anyway. License on images directly prohibits publishing them.