by sa-code 170 days ago
Should the title be corrected to source-available?

"weights-available" is probably the correct term, since it doesn't look like the training data is available.
Training data is not source code so that's irrelevant
It kind of is, though. You use some input material to produce the weights via some process, even if the weights might not come out exactly the same every time you rerun the process; the weights aren't produced by working on the weights themselves, but from the training material and the process that converts it into weights. The analogy to source code and the resulting binaries is there.
Training data and the weights produced are not source code, just as access to the resulting binaries is not a requirement for open source.

Open source does not require full working implementations. There's no requirement that a code snippet that I release be fully working and identical to a complete solution.

So we are in agreement that "weights" are not source code. Training data might also not be actual "code", but it is source. After all, the model trained using that data tries to estimate its training data. It is the ground truth for the model.

As for access to binaries or providing working implementations, where did those come from? I don't think this thread was discussing those at all.

Indeed I would be willing to call something an "open source model" if it came without weights, but did come with the training data and with a documented process (preferably executable); and a release with just the training data could be called "open dataset" while the software to run the training would be just plain old open source software.

And, of course, a model with only the model data distributed under an open license is relatively commonly called "open weights", which is a pretty self-explanatory term.

It is absurd to think that releasing open source code also requires releasing thousands of terabytes of Twitter and Reddit posts.

You already have access to all the training data everyone else is using. You can download an offline version of Wikipedia. Here's every Reddit comment for a decade: https://academictorrents.com/details/ba051999301b109eab37d16...