
by NitpickLawyer 632 days ago

Haha, having lengthy discussions, especially when we disagree, is healthy IMO. That's how we get to experience other viewpoints, and hopefully become better for the effort.

> Can I build this thing from scratch myself

You absolutely can. Everything you need is in the model config (layers, sizes, etc.), and there are training scripts all over the net. Granted, you will not necessarily get the same results, but then again, neither would Mistral or Meta.
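To show how much the config pins down, here is a minimal Python sketch that counts the parameters of a Llama-style decoder from its config alone. It rests on several assumptions. It uses Hugging Face-style field names (`hidden_size`, `num_hidden_layers`, etc.). It assumes no bias terms, RMSNorm weights only, and untied input/output embeddings. The values are the ones commonly published for Llama 2 7B. This is a sketch, not how any particular library computes it.

```python
# Sketch: the config alone determines every weight shape in a Llama-style model.
# Assumptions: no bias terms, RMSNorm (one weight vector per norm), untied embeddings.
config = {
    "vocab_size": 32000,
    "hidden_size": 4096,
    "intermediate_size": 11008,
    "num_hidden_layers": 32,
    "num_attention_heads": 32,
    "num_key_value_heads": 32,
    "tie_word_embeddings": False,
}

def count_params(cfg: dict) -> int:
    d = cfg["hidden_size"]
    head_dim = d // cfg["num_attention_heads"]
    kv_dim = cfg["num_key_value_heads"] * head_dim
    # Attention: q and o projections are d x d; k and v are d x kv_dim
    attn = 2 * d * d + 2 * d * kv_dim
    # Gated MLP: gate and up (d -> intermediate), down (intermediate -> d)
    mlp = 3 * d * cfg["intermediate_size"]
    # One RMSNorm before attention and one before the MLP
    norms = 2 * d
    per_layer = attn + mlp + norms
    embed = cfg["vocab_size"] * d
    lm_head = 0 if cfg["tie_word_embeddings"] else cfg["vocab_size"] * d
    final_norm = d
    return embed + cfg["num_hidden_layers"] * per_layer + final_norm + lm_head

print(f"{count_params(config):,}")  # 6,738,415,616
```

The count lands on the ~6.7B usually quoted for that model. The config gives you every shape; it does not give you the values of those numbers.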

> but the authors tries to argue that the compiled asm output is "open source". It's output, not source, so you cannot license the output as "open source" as it's missing that last part, the "source".

Replying here because I can't reply in the other subthread. I think you have a misconception about what source code is and what a weight is. In the LLM world, you already have the source code for inference. This would be PyTorch, C code, or whatever. You also have the architecture code. You can see what the model looks like, what layers it has, and what operations it performs to reach a result. That is also open! So you get the source to run inference. You get the source to "load" the model (i.e. the architecture, layers, etc.). And you get a bunch of hardcoded values.
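To make the split concrete, here is a toy Python sketch in which everything (layer sizes, weight values) is hypothetical. The forward pass is ordinary, readable source, while the "weights" are just numbers loaded from data. Real models typically ship the numbers in a format like safetensors and the architecture as, say, PyTorch code. The relationship is assumed to be the same here, only smaller.

```python
import json

# The "architecture": a tiny two-layer MLP, fully readable source code.
def relu(xs):
    return [max(0.0, x) for x in xs]

def linear(weight, bias, xs):
    # weight is a list of rows (out_features x in_features)
    return [sum(w * x for w, x in zip(row, xs)) + b for row, b in zip(weight, bias)]

def forward(params, xs):
    hidden = relu(linear(params["w1"], params["b1"], xs))
    return linear(params["w2"], params["b2"], hidden)

# The "weights": hypothetical hardcoded values shipped as plain data (JSON here).
WEIGHTS_JSON = """
{"w1": [[0.5, -1.0], [1.5, 2.0]], "b1": [0.0, 0.1],
 "w2": [[1.0, -0.5]], "b2": [0.2]}
"""

params = json.loads(WEIGHTS_JSON)
print(forward(params, [1.0, 2.0]))
```

Nothing in the numbers explains *why* `w1[1][1]` is 2.0, but nothing about running, reading, or modifying the model depends on knowing that.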

What you don't get is the answer to "why is this value x and not y?" And for the most part, no one knows.

> If they're not available, or available but under restrictions (usage or otherwise), then it's not open source.

Let's take another (famous) example. Quake III Arena is famous for a hardcoded constant in its source code (0x5f3759df, in the "fast inverse square root" routine) that speeds up vector normalization for lighting and other geometry computations. You can change that value, but things will be messed up in the engine. Collisions will happen weirdly, and things will look bad. Is Quake any less "open source" because you or I don't understand why the original coder chose that value? Of course not! Now multiply that by a billion hardcoded values. It's the exact same thing. You could change any of the values, but the game would look wonkier the more you did. But at the end of the day, it would not be any less open source.
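For anyone who hasn't seen it, here is a Python re-creation of that trick, modeled on the `Q_rsqrt` function in the Quake III Arena source. It uses `struct` to reinterpret float32 bits as an integer, which the C original does with a pointer cast. The `max_rel_error` helper and its sample inputs are made up for illustration. The point is simply that the magic number is "just a value": nudge it and the math still runs, but the results get visibly worse.

```python
import math
import struct

def fast_inv_sqrt(x: float, magic: int = 0x5F3759DF) -> float:
    """Approximate 1/sqrt(x) with the float32 bit trick plus one Newton step."""
    # Reinterpret the 32-bit float's bits as an unsigned 32-bit integer
    i = struct.unpack("<I", struct.pack("<f", x))[0]
    # The famous line: halve the bits and subtract from the magic constant
    i = (magic - (i >> 1)) & 0xFFFFFFFF
    # Reinterpret the integer bits back as a float32 initial guess
    y = struct.unpack("<f", struct.pack("<I", i))[0]
    # One Newton-Raphson iteration refines the guess
    return y * (1.5 - 0.5 * x * y * y)

def max_rel_error(magic: int) -> float:
    # Hypothetical sample inputs; compare against the exact 1/sqrt(x)
    xs = [0.01, 0.5, 1.0, 2.0, 10.0, 1234.5]
    return max(abs(fast_inv_sqrt(x, magic) * math.sqrt(x) - 1.0) for x in xs)

print(max_rel_error(0x5F3759DF))               # original constant: error well under 1%
print(max_rel_error(0x5F3759DF + 0x00400000))  # tweaked constant: much larger error
```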

I guess what I'm trying to say is that weights are not binary blobs. Weights are not an obfuscation attempt. Weights are distributed exactly as they are intended to be used, and as the creators use them too. You can change the architecture of a model (see above for details). You can add layers, you can remove layers. You can perform "abliterations", or you can do fine-tuning. Everything is done exactly as the "creators" intended. The only thing you don't have is "how they got those exact numbers". But you don't need that. And it's funny that somehow, for LLMs, that's a bridge too far. It never used to be for any other project.
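To show what "editing the weights" looks like, here is a minimal sketch of the core step usually described for abliteration (directional ablation). Given a unit direction r̂ in the model's hidden space, each matrix that writes into that space is replaced by W - r̂r̂ᵀW, so its output no longer has any component along r̂. The matrix and direction below are made-up toy values. Finding the real direction in a real model is a separate step not shown here.

```python
import math

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def ablate_direction(W, r):
    """Return W - r_hat r_hat^T W, where W's rows index the d-dim output space."""
    r_hat = normalize(r)
    rows, cols = len(W), len(W[0])
    # Component of each column of W along r_hat, i.e. (r_hat^T W)_j
    proj = [sum(r_hat[i] * W[i][j] for i in range(rows)) for j in range(cols)]
    # Subtract that component from every column
    return [[W[i][j] - r_hat[i] * proj[j] for j in range(cols)] for i in range(rows)]

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

W = [[1.0, 2.0], [3.0, 4.0], [0.5, -1.0]]  # hypothetical 3x2 weight matrix
r = [0.0, 1.0, 1.0]                        # hypothetical direction to remove
W2 = ablate_direction(W, r)
out = matvec(W2, [1.0, 1.0])
# Component of the new output along r: should be ~0 up to float rounding
print(sum(a * b for a, b in zip(normalize(r), out)))
```

It is plain arithmetic on the released numbers. No training data or original training code is involved.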