Hacker News
by williamstein 1064 days ago
Maybe there is no source code? I imagine an LLM is like the output of the following process. There's a huge room full of programmers who can directly edit machine code. You give them a random binary, which they hack on for a while before publishing the result. You then inspect it, tell them it isn't quite optimal in some way, and ask them for a new version. Iterate on this process a bazillion times. At the end you get a binary that you're reasonably happy with. Nobody ever has the source code.

Source code is the preferred form for development.

In your scenario, despite the unrealistic coding process, the machine code is the source code, because that's what everyone is working on.

In LLM development, the weights are in no way the preferred form for development. Programmers don't work on weights. They work on data, infrastructure, the model, the training code, etc. The point of machine learning is not to work on weights.

Unless you anthropomorphize optimizers, in which case the weights are indeed the preferred form for editing. But I have never seen anyone, even the most forward AGI supporters, argue that optimizers are intelligent agents.
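To make the distinction concrete, here is a minimal sketch in plain Python. Everything in it (the toy dataset, `model`, `gradient_step`, `train`, the learning rate) is a hypothetical illustration, not anyone's real training code. The functions are what a person writes and edits; the weights are only ever touched by a fixed update rule.

```python
# Hypothetical toy example: fit y = w*x + b with full-batch gradient descent.
# The functions below are the human-edited "source"; the weights are output.

def make_data():
    # Assumed toy dataset, generated from y = 2x + 1.
    return [(float(x), 2.0 * x + 1.0) for x in range(-5, 6)]

def model(weights, x):
    w, b = weights
    return w * x + b

def gradients(weights, data):
    # Gradient of the mean squared error with respect to (w, b).
    n = len(data)
    grad_w = sum(2.0 * (model(weights, x) - y) * x for x, y in data) / n
    grad_b = sum(2.0 * (model(weights, x) - y) for x, y in data) / n
    return grad_w, grad_b

def gradient_step(weights, grads, lr):
    # The "optimizer": a fixed arithmetic rule, not an agent making decisions.
    return tuple(p - lr * g for p, g in zip(weights, grads))

def train(steps=1000, lr=0.01):
    data = make_data()
    weights = (0.0, 0.0)  # nobody edits these numbers by hand
    for _ in range(steps):
        weights = gradient_step(weights, gradients(weights, data), lr)
    return weights

weights = train()
print(weights)  # should end up close to (2.0, 1.0)
```

To change the result, you would edit `make_data`, `model`, or `train` and rerun, not reach into `weights` directly.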

What? You work on the weights - you just do it using tools like the optimizers, etc.

You release your weights, and others can build on top of them, fine-tune them in different ways, and produce new weights they can share with others. Seems very OSS-y.

I feel like there is some semantic nitpicky point being made here that is completely going over my head.

By "work on", I mean "make direct edits to". If we take a broad definition of "work on", we lose all distinction between source code and output. Any binary would be source code in any project, because the programmers are simply using tools, like the compiler, to work on it.

For all practical purposes, if you were part of the team that released the LLM, you would be writing and modifying the code for the data processing, the model, and the training process. Those should be considered the source code.

And we do have the model, which is pretty OSS-y, and which is why we can fine-tune the weights. But from a broader perspective, it's not fully OSS-y, because we don't have the code for anything else. There's no way to change, for example, how the training is done in the first place.

Agreed. Unfortunately, it's those semantics that keep you from losing lawsuits.