	by piperswe 390 days ago
	But where's the source? I just see a binary blob, what makes it open source?

4 comments

jacob019 390 days ago

The weights are the source. It isn't as though something was compiled into weights. They're trained directly. But I know what you mean, it would be more open to have the training pipeline and source dataset available.

timschmidt 390 days ago

The weights seem much more like a binary to me, the training pipeline the compiler, and the training dataset the source.

jumski 390 days ago

Came here to write this - perfect analogy!

reedciccio 390 days ago

It's a very imperfect analogy, though: these things can't be rebuilt "from scratch" like a program, and the training process doesn't seem to be replicable anyway. Nonetheless, full data disclosure is necessary, according to the result of the years-long consultation led by the Open Source Initiative: https://opensource.org/ai

timschmidt 390 days ago

> the training process doesn't seem to be replicable anyway

The training process is fully deterministic. It's just an algorithm. Feed the same data in and you'll get the same weights out.

If you're speaking about the computational cost, it used to be that way for compilers too. Give it 20 years and you'll be able to train one of today's models on your phone.

kouteiheika 389 days ago

> The training process is fully deterministic. It's just an algorithm. Feed the same data in and you'll get the same weights out.

No it is not. The training process is non-deterministic, and given exactly the same data, the same code and the same seeds you'll get different weights. Even the simplest operations like matrix multiplication will give you slightly different results depending on the hardware you're using (e.g. you'll get different results on CPU, on GPU from vendor #1 and on GPU from vendor #2, and probably on different GPUs from the same vendor, and on different CUDA versions, etc.), but also depending on the dimensions of the matrices you'll get different results (e.g. if you fuse the QKV weights from modern transformers into a single matrix and do a single multiplication instead of multiplying each separately you'll get different results), and some algorithms (e.g. backwards pass of Flash Attention) are explicitly non-deterministic to be faster.
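
A minimal sketch of why operation order alone can change a result, assuming plain Python floats on a single CPU thread rather than any real GPU kernel. The helper name `sum_in_order` and the chosen values are illustrative only. Floating-point addition is not associative, so a parallel reduction that adds the same numbers in a different order can return a different total:

```python
def sum_in_order(values):
    """Add the values strictly left to right, as a sequential reduction would."""
    total = 0.0
    for v in values:
        total += v
    return total

# Same three numbers, two different reduction orders.
values = [1e16, 1.0, -1e16]
reordered = [1e16, -1e16, 1.0]

a = sum_in_order(values)     # the 1.0 is lost when added to 1e16
b = sum_in_order(reordered)  # the large terms cancel first, so the 1.0 survives

print(a, b)  # the two totals differ even though the inputs are the same set
```

In a training run this kind of discrepancy is tiny per operation, but it feeds into every later gradient step.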

jacob019 389 days ago

A lot of quibbling here, wasn't sure where to reply. If you've built any models in PyTorch, then you know. Conceptually it is deterministic, a model trained using deterministic implementations of low level algorithms will produce deterministic results. And when you are optimizing the pipeline, it is common to do just that:

    import random
    import numpy as np
    import torch
    torch.manual_seed(0)
    random.seed(0)
    np.random.seed(0)
    torch.use_deterministic_algorithms(True)

But in practice that is too slow, so we use nondeterministic implementations that play fast and loose with memory management and don't necessarily care about the order in which parallel operations return.

willmarch 390 days ago

I’m pretty sure the initial weights are randomized, meaning no two models will train in the same way twice. The order in which you feed training data to the model would also add an element of randomness. Model training is closer to growing a plant than running a compiler.

desdenova 390 days ago

What makes models non-deterministic isn't the training algorithm, but the initial weights being random.

Training is reproducible only if, besides the pipeline and data, you also start from the same random weights.
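
A toy sketch of that point, assuming a tiny linear model trained with plain SGD in pure Python on one CPU thread. The function `train` and its parameters `init_seed` and `order_seed` are hypothetical names for illustration, not part of any real training pipeline. With the same data and code, the final weights repeat only when both the initial weights and the data order are pinned:

```python
import random

def train(init_seed, order_seed, epochs=5, lr=0.05):
    """Fit y = w*x + b with plain SGD and return the final (w, b)."""
    # Fixed toy dataset: identical on every call.
    data = [(k / 10.0, 3.0 * (k / 10.0) + 1.0) for k in range(20)]

    # Source of randomness 1: the initial weights.
    init_rng = random.Random(init_seed)
    w, b = init_rng.uniform(-1, 1), init_rng.uniform(-1, 1)

    # Source of randomness 2: the order in which examples are visited.
    order_rng = random.Random(order_seed)
    for _ in range(epochs):
        order_rng.shuffle(data)
        for x, y in data:
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    return w, b

same_a = train(init_seed=0, order_seed=0)
same_b = train(init_seed=0, order_seed=0)
other_init = train(init_seed=1, order_seed=0)
other_order = train(init_seed=0, order_seed=1)

print(same_a == same_b)       # both seeds pinned
print(same_a == other_init)   # only the initial weights changed
print(same_a == other_order)  # only the data order changed
```

This sketch says nothing about GPU kernels or multi-device training, where additional sources of nondeterminism can remain even with every seed fixed.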

reedciccio 390 days ago

Can you point to the research that says that the training process of an LLM at least the size of OLMo or Pythia is deterministic?

otabdeveloper4 390 days ago

You can fine-tune their weights and release your own take.

E.g. see all the specialized third-party models out there based on Qwen.

"Open-source" is the wrong word here, what they mean is "you can modify and redistribute these weights".

yetihehe 390 days ago

You can also reverse engineer and modify closed source programs (see mods for games). Weights are like a compiled version of the source data.

otabdeveloper4 389 days ago

Finetuning isn't reverse engineering. Finetuning is a standard supported workflow for these models.

Also, the "redistribute" part is key here.

yetihehe 389 days ago

> Finetuning isn't reverse engineering

Fully agree, it isn't. Reverse engineering isn't necessary for modifying a compiled program's behaviour, so the comparison to finetuning doesn't apply. Finetuning applied to the program domain would be more like adding plugins or patching in some compiled routines. Reverse engineering applied to models would be like extracting source documents from weights.

> Finetuning is a standard supported workflow for these models.

Yes, so is adding mods for some games: just put your files in a designated folder and the game automatically picks them up and makes the required modifications.

> Also, the "redistribute" part is key here.

It is not. Redistributability and being open source are orthogonal. You can have the source for a program and not be able to redistribute the source or the program, or you can redistribute a compiled program but not have its source (freeware).

macrolime 390 days ago

Not legally. That's the difference.

timschmidt 390 days ago

Sure you can. It's often legally protected activity. You're just limited to distributing your modifications without the original work.

macrolime 389 days ago

For some games maybe, but software often has a clause forbidding reverse engineering.

timschmidt 389 days ago

ChatGPT says that such clauses are typically void in the EU, though they may apply in some cases in the US. Even in the US, the triennial DMCA rule-making has granted broader exemptions for good-faith security research every cycle since 2016.

https://chatgpt.com/share/6838c070-705c-8005-9a88-83c9a5550a...

microtonal 390 days ago

There is work to try to reproduce (the original) R1: https://huggingface.co/open-r1

1una 390 days ago

I wouldn't call it a "binary blob". Safetensors is just a simple format for storing tensors safely: https://huggingface.co/docs/safetensors/index
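
To show how simple the layout is, here is a hand-rolled sketch based on the format description in those docs: an 8-byte little-endian header length, a JSON header, then the raw tensor bytes. It does not use the `safetensors` library itself; `pack` and `unpack` are hypothetical helpers, and the sketch assumes a single one-dimensional `F32` tensor:

```python
import json
import struct

def pack(name, values):
    """Build a minimal safetensors-style blob holding one 1-D float32 tensor."""
    data = struct.pack(f"<{len(values)}f", *values)
    header = json.dumps({
        name: {"dtype": "F32", "shape": [len(values)], "data_offsets": [0, len(data)]}
    }).encode("utf-8")
    # 8-byte little-endian header length, then the JSON header, then raw data.
    return struct.pack("<Q", len(header)) + header + data

def unpack(blob):
    """Read the JSON header and decode every F32 tensor it describes."""
    (header_len,) = struct.unpack_from("<Q", blob, 0)
    header = json.loads(blob[8:8 + header_len])
    payload = blob[8 + header_len:]
    tensors = {}
    for name, meta in header.items():
        if name == "__metadata__":  # optional free-form metadata entry
            continue
        begin, end = meta["data_offsets"]  # offsets are relative to the payload
        count = (end - begin) // 4         # 4 bytes per float32 (F32 assumed)
        tensors[name] = list(struct.unpack(f"<{count}f", payload[begin:end]))
    return header, tensors

blob = pack("w", [1.0, 2.5])
header, tensors = unpack(blob)
print(header)
print(tensors)
```

The header is human-readable JSON, so tensor names, shapes, and dtypes can be inspected without executing anything, though the numbers themselves are still opaque without the training data.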
