by zargon 1064 days ago
It's not open source, it's freeware or something like that. Weights aren't the source code of LLMs, they're the binaries.
5 comments

Maybe this is just semantics, but I don't know if the OSS-vs-freemium distinction matters all that much (I'd have to think about the potential downsides a bit more tbh).

Virtually every discussion in the LLM space right now is almost immediately bifurcated by the "can I use this commercially?" question, which has a somewhat chilling effect on innovation. The best performing open source LLMs we have today are llama-based, particularly the WizardLM variants, so giving them more actual industry exposure will hopefully be a force multiplier.

Llama isn't open source either. But if I understand your point correctly, you're saying that the commercial use axis is what is important to people, and it's orthogonal to freeware vs open source. In the present environment, I agree. But I don't think we should let companies get away with poisoning the term open source for things which are not. I also believe that actual open source models have the near-term opportunity to make an impact and shape the future landscape, with red pajamas and others in the works. The distinction could be important in the near term, at the rate this field is developing.
Neural network weights are better viewed as source code because they specify what function the network computes. As we're operating purely on feed-forward networks, there are no loops. Therefore, weights fully describe everything relevant for executing their represented function on inputs. Weights can be seen as a sort of intermediate language (with lots of stored data and partially computed states) interpretable by some deep learning library.

The network architecture itself is not source code, but a rough specification constraining the optimizer, which searches for possible program descriptions that, within the specified constraints, minimize some loss function with respect to the data.

Neither the data nor the network architecture is the actual source; they are better seen as recipes which, if followed (at great expense), allow finding behaviorally similar programs. As you can see, the standard ideas of open source don't quite carry over because the actual "source code" is not human interpretable.
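To make the "weights as an intermediate language" idea concrete, here is a minimal sketch, not anything from a real deep learning library. The `forward` function plays the role of the generic interpreter, and the two weight sets (hand-picked for illustration, not trained) are the "programs". All names here are hypothetical.

```python
# A generic "interpreter": it knows nothing about XOR or AND.
# All task-specific behavior lives in the weights passed in.

def relu(x):
    return max(0.0, x)

def forward(weights, inputs):
    """Run a feed-forward net given as a list of (matrix, bias) layers."""
    activations = list(inputs)
    for i, (matrix, bias) in enumerate(weights):
        outputs = [
            sum(w * a for w, a in zip(row, activations)) + b
            for row, b in zip(matrix, bias)
        ]
        # ReLU on hidden layers, identity on the final layer.
        if i < len(weights) - 1:
            outputs = [relu(o) for o in outputs]
        activations = outputs
    return activations

# Hand-picked (hypothetical) weights; a real model would get these from training.
XOR_WEIGHTS = [
    ([[1.0, 1.0], [1.0, 1.0]], [0.0, -1.0]),  # h1 = relu(a+b), h2 = relu(a+b-1)
    ([[1.0, -2.0]], [0.0]),                   # output = h1 - 2*h2
]
AND_WEIGHTS = [
    ([[1.0, 1.0], [1.0, 1.0]], [0.0, -1.0]),  # same hidden layer as above
    ([[0.0, 1.0]], [0.0]),                    # output = h2
]

for a in (0.0, 1.0):
    for b in (0.0, 1.0):
        print(a, b, forward(XOR_WEIGHTS, [a, b]), forward(AND_WEIGHTS, [a, b]))
```

The same `forward` code computes two different functions, and only the numbers differ. Those numbers are exactly the part nobody can read the way they would read ordinary source.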

> Weights can be seen as a sort of intermediate language (with lots of stored data and partially computed states) interpretable by some deep learning library.

I've often talked about weights being the equivalent of assembly; your note seems to map to a similar intuition. In that sense, provided we ever solve the interpretability problem, we could in theory disassemble the weights to achieve outcomes similar to what we do in asm-to-C. It's an interesting thought experiment insofar as, if the weights ought not be classified as open source (notwithstanding your first point, which I agree with), can the disassembled output be classified as open source?

> But I don't think we should let companies get away with poisoning the term open source for things which are not.

That's totally fair. And you're correct in that I was making an argument for positive outcomes being orthogonal to the semantic distinction.

> I also believe that actual open source models have the near-term opportunity to make an impact and shape the future landscape, with red pajamas and others in the works. The distinction could be very important in the near term, at the rate this field is developing.

I think Falcon and MPT support your point as well, but those are still models that were trained on very small budgets relative to llama or gpt-3/4. There's a clear quality delta, albeit that gap is closing. Through that lens, I think having a large, well-funded org doing the pre-training work for the OSS community and releasing the weights permissively is a net positive.

Sen. Marsha Blackburn said “fair use” protections have become a “fairly useful way to steal” intellectual property. Some people would like to use this situation to get rid of "fair use".
Forgive my ignorance, but might it matter if a country was hoping to limit another country's advancement into weaponising AI?
Strong disagree - I think OSS is a fine framing of this. Weights are a third category, you can 'fork' them in a way that you can't with standard binaries.
You can add hooks to functions and “fork” binaries, which is a pretty similar effort to adding training data to given model weights.
Nobody does that because if you only have binaries you probably don't have permission to do that. Plus it's impractical to make any significant changes that way.
If you have binaries you almost always have “permission” to do that — you can do whatever you’d like with files on your own system.
Maybe there is no source code? I imagine an LLM is like the output of the following process. There's a huge room full of programmers that can directly edit machine code. You give them a random binary, which they then hack on for a while and publish the result. You then inspect it and tell them it isn't quite optimal in some way and ask them for a new version. Iterate on this process a bazillion times. At the end you get a binary that you're reasonably happy with. Nobody ever has the source code.
Source code is the preferred form for development.

In your scenario, despite the unrealistic coding process, the machine code is the source code, because that's what everyone is working on.

In the development of LLMs, the weights are in no way the preferred form for development. Programmers don't work on weights. They work on data, infrastructure, the model, the code for training, etc. The point of machine learning is not to work on weights.

Unless you anthropomorphize optimizers, in which case the weights are indeed the preferred form of editing, but I have never seen anyone, even the most forward AGI supporters, argue that optimizers are intelligent agents.

What? You work on the weights - you just do it using tools like the optimizers, etc.

You release your weights, others can build on top of that, fine tune it in different ways, produce new weights they can share with others. Seems very OSS-y.

I feel like there is some semantic nitpicky point being made here that is completely going over my head.

By "work on", I mean "making direct edits". If we take a broad definition of "work on", we lose all the distinction between source code and output. Any binary code is source code in any project, because the programmers are simply using tools to work on it, like the compiler.

For all practical purposes, if you are part of the team who released the LLMs, you would be writing and modifying the code of data processing, of the model, and of the training process. Those should be considered source code.

And we do have the model, which is pretty OSS-y, and which is why we can fine-tune the weights. But from a broader perspective, it's not fully OSS-y, because we don't have the code for anything else. There's no way to change, for example, how the training is done in the first place.
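A toy sketch of what "fine-tuning released weights" looks like may help here. It assumes a one-variable linear model standing in for an LLM, and every name and number is hypothetical. The point is only that you need the weights and the model code to continue training, while the original training data and pipeline never enter the picture.

```python
def predict(weights, x):
    """The 'model code': how weights are turned into an output."""
    w, b = weights
    return w * x + b

def fine_tune(weights, data, lr=0.05, steps=2000):
    """Plain gradient descent on mean squared error, starting from given weights."""
    w, b = weights
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)

# Hypothetical "released" weights (roughly y = 2x); we never see how they were made.
released = (2.0, 0.0)

# Our own small dataset, drawn from y = 3x + 1.
new_data = [(0.0, 1.0), (1.0, 4.0), (2.0, 7.0), (3.0, 10.0)]

tuned = fine_tune(released, new_data)
print(tuned, predict(tuned, 5.0))
```

Note what this does not let you do: `fine_tune` is our own code, so nothing here reveals or changes how `released` was produced in the first place.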

Agreed. Unfortunately it's those semantics that keep from losing lawsuits.
I read it in all such discussions. What does it mean? I just have a very high level understanding of AI models. No idea how things work under the hood or what knobs can be tweaked.
The source code is all the supporting code needed to run inference on the weights. This is usually python and in the case of llama it's already open source. Usually the source code is referred to as the "model". You can kind of think of the weights as a settings file in a normal desktop application. The desktop app has its own source code and loads in the settings file at runtime. It can load different settings files for different behaviors.
This is almost completely wrong. When people who work in AI refer to the "model", they are generally referring to the weights. It is the weights which are the most important determinant of how the model performs, and it is the weights that require the most resources to develop. Associated code and other assets are also important, but they are not the core asset. The intuitive sense of open sourcing a model therefore typically means releasing the weights under an open licence (ideally along with the training and inference code, data, training info, etc).
I am not making a value judgement on what's the "most important" aspect when comparing the code vs the weights. I am just explaining the terminology as I understand it. Your intuitive sense of open sourcing certainly makes sense to me. I think a lay person would expect to be able to generate content with an "open source ai model" and that wouldn't be possible if only the code was open sourced and not the weights.

If you can show me people who work in AI calling just the weights a "model" then I would happily update my internal definition of the word. I am certainly not an expert in the subject, I am just going off what I've read from the community over the past few years.

Open source is about freedom to modify the product. So in the context of an LLM, the source code is the data and the code that processes the data during *training* (not only inference), as that is what generates the weights.
I thought model is the output of training. It's a binary file black box. That's what I had read somewhere.
I think it's a little context dependent, and the definition seems to be fluid right now. I've seen "model" be used to refer to just the code, or to refer to the combination of the code and weights. I don't think I've seen it used to refer to just the weights, but I wouldn't be surprised if it's used that way in some contexts.
Thank you for succinctly explaining the difference, I learned something today
Compiling source code doesn't cost millions of dollars though
That doesn't change the meaning of Open Source. These are "free as in beer", not "free as in [modify the sources and rebuild it]". There are LLMs for which that is true, which include a specific list of training data. If you wanted to "uncensor" one of those, you could curate the source data and rebuild it, instead of trying to get it to unlearn what it was taught.
If you had petabytes of highly interconnected source code, it could.

In a rough way, a NN is just a compiler designed to translate a boatload of simple data into a useful program that operates on similar data.
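As a rough illustration of the compiler analogy, here is a sketch where ordinary least squares stands in for the neural network (so this is not a NN, just the smallest "data in, program out" example I could think of). The function name `compile_data` and the example data are made up.

```python
def compile_data(examples):
    """'Compile' (x, y) examples into a callable via closed-form least squares."""
    n = len(examples)
    mean_x = sum(x for x, _ in examples) / n
    mean_y = sum(y for _, y in examples) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in examples)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in examples)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x

    # The returned closure is the "binary": it carries the fitted numbers,
    # not the examples that produced them.
    def program(x):
        return slope * x + intercept

    return program

# The examples are the "source"; the output is a program for similar inputs.
celsius_to_fahrenheit = compile_data(
    [(0.0, 32.0), (100.0, 212.0), (37.0, 98.6), (-40.0, -40.0)]
)
print(celsius_to_fahrenheit(20.0))
```

If you only receive `celsius_to_fahrenheit`, you can run it, but you can't recover or re-curate the examples it was built from.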

Yea the weights are the secret sauce that OpenAI and competitors generally protect.