| > that the AI field has somehow normalised the goalpost moving from capabilities all the way to definitions about open source
The problem is that Facebook and others are trying to move the goalpost, while others like me would like it to remain where it is: we call projects "open source" when the parts required to build them on our own machines are sufficiently accessible. I probably wouldn't be a developer in the first place if it weren't for FOSS, and I spend literally all day contributing to others' FOSS projects and working on my own, so it's kind of scary to see these large companies trying to change what FOSS means.
I think you're forgetting about the intent and purpose of open source. The goal is that people can run software for whatever purpose they want, and modify it for whatever purpose they want. This is the intent behind the licenses we use when we "create FOSS". In practice, this means the source code has to be accessible somehow, so that the compiler I have on my computer can build a binary similar to the one the project itself offers (if it offers one). The source code has to be accessible so I can build the project, but also so I can modify it for myself.
Take this idea, which so far has mostly applied only to software (FOSS), and apply it to ML instead, and it's clear what we need in order to 1) be able to use a model as we want and 2) be able to modify it as we want.
> You can see the weights. You can change the weights. You can re-distribute the weights. It's open source.
Right. If I upload a binary to some website, you can see the binary, you can change the binary and you can redistribute it. Would you say the binary is open source? The weights are the binary in ML contexts.
It's OK for projects to publish those weights, but it's not OK to suddenly change the definition and meaning of open source because companies want to look like they're doing FOSS, when in reality they're publishing binaries without any way of rebuilding those binaries with your own changes.
Imagine if the Linux kernel was just a big binary blob. Yes, you could change it, redistribute it and whatnot, but only in binary-blob form. You'd be kind of out there if you insisted on calling this binary-blob kernel FOSS. I'm sure you'd be able to convince some Facebook engineers of it, since they seem to be rolling with that idea already, but the rest of us who exist in the FOSS ecosystem? We'd still have the same goalpost in the exact same spot it's been in for at least the two decades I've been involved. |
Great question. Is the assembly code in a git repo, under an open source license? Then yes! It's open source!
Think about it this way: just because someone wrote hello world in C and a compiler then translated it into assembly, that doesn't stop the assembly code from being open source! That's the point. Whether something is open source depends on whether the resulting artifact is published under an open source license. Can you see the assembly code? Can you change it? Can you re-publish it? If the answer to all of these is yes, then it's open source!
> Imagine if the Linux kernel ...
That is semantics. The Linux kernel is published in C because it's easier for people to reason in that more abstract language, but it would not suddenly become "closed source" if it were written in asm, assuming it were still published under an open source license.
In other words, having access to the "dataset" would not make the weights any easier to work with. They would still be a "blob", as you call it.