| HN Mirror


	by mgreg 893 days ago
	I very much appreciate that the authors not only published their code (https://github.com/llm-random/llm-random) but included the dataset they used (available on Huggingface - https://huggingface.co/datasets/c4) as well as the training process and hyperparameters they used so others can replicate and build on their work. The only thing really missing is the weights which would be nice to have on huggingface as well.

swells34 893 days ago

It's very confusing to me that you are praising the authors of a published scientific paper for almost making their work reproducible.

chaxor 893 days ago

If we had proper data version control, wherein the git commit hash was tied directly to the output data hash and hosted on IPFS (and the make system checked IPFS for the cache the way it checks local files), then it would be absolutely reproducible.

And the wonderful thing is, every person that used git clone on this repo and ran it would be serving the NN weights.

But alas, this unfortunately hasn't been done yet.
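
To make the idea concrete, here is a minimal sketch of tying a code revision to the hashes of the outputs it produced. It is not an existing tool: `make_manifest`, `verify`, the commit string, and the file name are all hypothetical, and plain SHA-256 stands in for a content-addressed store such as IPFS.

```python
import hashlib
import json


def sha256_bytes(data: bytes) -> str:
    """Content hash of a blob (stand-in for a content-addressed ID)."""
    return hashlib.sha256(data).hexdigest()


def make_manifest(commit_hash: str, outputs: dict[str, bytes]) -> dict:
    """Record which output hashes a given code revision produced."""
    return {
        "commit": commit_hash,
        "outputs": {name: sha256_bytes(blob) for name, blob in sorted(outputs.items())},
    }


def verify(manifest: dict, outputs: dict[str, bytes]) -> bool:
    """Check that freshly produced outputs match the recorded hashes."""
    actual = {name: sha256_bytes(blob) for name, blob in outputs.items()}
    return manifest["outputs"] == actual


# Hypothetical commit ID and output file, for illustration only.
manifest = make_manifest("0123abc", {"weights.bin": b"\x00\x01\x02"})
print(json.dumps(manifest, indent=2))
```

A bit-for-bit check like this only passes if training itself is deterministic, which is a separate question from whether the recipe reproduces the paper's performance.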

astrange 893 days ago

That's not what confusing means.

Feigned confusion

The weights aren't needed to make it reproducible. The code and training data are needed. Hopefully if you used those, you'd ultimately reach the same result.

tbalsam 893 days ago

Even in the days when this was standard, that was not entirely the case.

There is a whole other world between "released code" and "getting the results as seen in the paper".

Unfortunately. The reproducibility crisis is very much alive and well! :'( Much more to go into but it is a deep rabbit hole, indeedy. :'((((

_ea1k 892 days ago

I guess I'm saying that if there are reproducibility problems without the weights, then there's still a reproducibility problem with them. A paper with weights that magically work, when training on the same data with the same algorithm doesn't, is a paper that isn't reproducible.

IMO, having the weights available sometimes just papers over a deeper issue.

abdullin 892 days ago

Training, especially on large GPU clusters, is inherently non-deterministic, even if all seeds are fixed.

This boils down to framework implementations, timing issues, and the extra cost of trying to ensure determinism (without guarantees).
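
One commonly cited ingredient is that floating-point addition is not associative, so a sum accumulated in a different order can give a different result. The sketch below shows only that arithmetic effect in plain Python. It makes no claim about any particular framework or GPU kernel, and `sequential_sum` is a name introduced here for illustration.

```python
def sequential_sum(values):
    """Add values strictly left to right, as a simple accumulator would."""
    total = 0.0
    for v in values:
        total += v
    return total


# Same three numbers, two accumulation orders.
a = sequential_sum([1e16, 1.0, -1e16])  # the 1.0 is lost when added to 1e16
b = sequential_sum([1e16, -1e16, 1.0])  # the large terms cancel first

print(a, b)  # 0.0 1.0

# A smaller everyday case: grouping changes the last bit.
print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))  # False
```

If the order in which partial results are combined depends on timing, fixing the seeds alone does not pin down the final bits.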

zcw100 892 days ago

Random initialization would keep you from producing the exact same results.

_ea1k 891 days ago

Yes, but there's a difference between exact results and reproducible results. I should get similar performance, otherwise there is an issue.

jakderrida 893 days ago

It's a sad world where our standards are that low. But they are that low for good reasons.

theLiminator 893 days ago

If anything, CS papers are far more reproducible than most papers. Maybe that is sad, but I think most scientists and researchers are trying their best.

mgreg 893 days ago

I understand where you're coming from but what they provided DOES make their work reproducible. You can use the data, source code, and recipe to train the model and get the weights.

It would be nice if they provided the weights so it could be USABLE without the effort or knowledge required.

We (I think) would all like to see more _truly_ open models (not just the source code) that enable collaboration in the community.

kevindamm 893 days ago

Only if they also include the random seed they used for the initial weights; otherwise you may be able to reproduce similar performance but are unlikely to obtain the same weights.
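
A toy illustration of the seed point, using only Python's standard library: `init_weights` is a hypothetical helper, and the Gaussian scale of 0.02 and the seed values are arbitrary choices for the example, not anything taken from the paper.

```python
import random


def init_weights(seed: int, n: int = 4) -> list[float]:
    """Draw n initial weights from a seeded generator (toy stand-in for model init)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 0.02) for _ in range(n)]


same_a = init_weights(42)
same_b = init_weights(42)
other = init_weights(43)

print(same_a == same_b)  # True: same seed, same starting weights
print(same_a == other)   # False: different seed, different starting weights
```

Matching the seed only fixes the starting point. Whether the final weights also match depends on the rest of training being deterministic.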

CGamesPlay 892 days ago

But that's a lot like saying that my recipe for muffins isn't reproducible because it doesn't say exactly which batch of which field my flour comes from. I mean, of course you won't get the same muffins, but if your muffins taste just as good it's still a win.

blovescoffee 892 days ago

If this work is valuable, the random seed shouldn't affect the outcome thaaat much.
