by blackeyeblitzar 624 days ago
The definitions need reinforcing. Open weights is NOT open source. But there are companies like Meta that are rampantly open-washing their work. The point of open source is that you can recreate the product yourself, for example by compiling the source code. Clearly the equivalent for an LLM is being able to retrain the model to produce the weights. Yes, I realize this is impractical without access to the hardware, but the transparency is still important, so that we know how these models are designed and how they may be influencing us through biases or censorship.

The only actually open source model I am aware of is AI2’s OLMo (https://blog.allenai.org/olmo-open-language-model-87ccfc95f5...), which includes the training data, training code, evaluation code, fine-tuning code, etc.

The license also matters. A burdened license that restricts what you can do with the software is not really open source.

I do have concerns about where OSI is going with all this. For example, why are they now saying that reproducibility is not part of the definition? The two paragraphs below contradict each other: what does it mean to be able to “meaningfully fork” something and make it more useful if you don’t have the ingredients to reproduce it in the first place?

> The aim of Open Source is not and has never been to enable reproducible software. The same is true for Open Source AI: reproducibility of AI science is not the objective. Open Source’s role is merely not to be an impediment to reproducibility. In other words, one can always add more requirements on top of Open Source, just like the Reproducible Builds effort does.

> Open Source means giving anyone the ability to meaningfully “fork” (study and modify) a system, without requiring additional permissions, to make it more useful for themselves and also for everyone.

2 comments

> what does it mean to be able to “meaningfully fork” something and be able to make it more useful if you don’t have the ingredients to reproduce it in the first place?

I could be misunderstanding them, but my takeaway is that exact bit-for-bit reproducibility is not required. Most software, including open source software, is not bit-for-bit reproducible. Exact reproducibility is a fairly new concept. Even with all the training data and all the code, you are unlikely to get exactly the same model as before.

Though if that is what they mean, then they should be more explicit about it.
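To make "bit-for-bit" concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not anything OSI or the Reproducible Builds project prescribes: the helper names (`artifact_digest`, `is_bit_for_bit_identical`) are hypothetical, and the "training runs" are just the same three numbers summed in two different orders. Floating-point addition is not associative, so reordering alone changes the bits, which is one plausible reason retraining would not give an identical model.

```python
import hashlib
import struct


def artifact_digest(values):
    """Serialize floats as little-endian 64-bit doubles and hash the bytes."""
    payload = struct.pack(f"<{len(values)}d", *values)
    return hashlib.sha256(payload).hexdigest()


def is_bit_for_bit_identical(a, b):
    """Two artifacts are bit-for-bit identical only if their digests match."""
    return artifact_digest(a) == artifact_digest(b)


# Hypothetical stand-ins for two "runs": same inputs, different summation order.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)

run_a = [left_to_right]
run_b = [right_to_left]

print(is_bit_for_bit_identical(run_a, run_a))  # same run compared with itself
print(is_bit_for_bit_identical(run_a, run_b))  # same inputs, reordered arithmetic
```

The two results differ only in the last bit of the double, yet the digests no longer match, so a strict bit-for-bit check fails even though both runs are "the same" computation in any practical sense.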

I agree: open weights are open "binary", not open source.

It's like taking an executable (a .so module, a firmware blob) and releasing it under a permissive license, so anyone can disassemble, modify, and hack it. And then disclosing which programming languages were used and pointing at a few libraries. And then saying that no, the actual source code is not going to be released.