by _flux
170 days ago
So we are in agreement that "weights" are not source code. Training data might also not be actual "code", but it is source. After all, a model trained on that data tries to estimate its training data; the data is the ground truth for the model. As for access to binaries or providing working implementations, where did those come from? I don't think this thread was discussing those at all. Indeed, I would be willing to call something an "open source model" if it came without weights but did come with the training data and a documented (preferably executable) training process. A release with just the training data could be called an "open dataset", while the software to run the training would be just plain old open source software. And, of course, a model with only the model data distributed under an open license is relatively commonly called "open weights", which is a pretty self-explanatory term.
|
You already have access to all the training data everyone else is using.... You can download an offline version of Wikipedia. Here's every Reddit comment for a decade: https://academictorrents.com/details/ba051999301b109eab37d16...