by gigantum 2603 days ago
Like some of the other ML/AI posts that made it to the front page today, this research doesn't give any clear way to reproduce its results either. I looked through the preprint page as well as the full manuscript itself.

Without reproducibility and transparency in the code and data, the impact of this research is ultimately limited. No one else can recreate, iterate on, or refine the results, nor can anyone rigorously evaluate the methodology (beyond guessing from the manuscript).

It's 2019, and many are finally realizing it's time to back up results with code, data, and some kind of specification of the computing environment used. Science is about sharing your work so others in the research community can build on it. Leave the manuscript for the pretty formality.
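As a rough illustration of the "specification of the computing environment" part, here is a minimal sketch that records a seed alongside interpreter, OS and package versions as JSON. The helper name `run_manifest` and the package list are hypothetical choices for this example; a lockfile, container, or VM image captures far more than this does.

```python
import json
import platform
from importlib import metadata


def run_manifest(seed, packages=("numpy", "torch")):
    """Collect a minimal, hypothetical description of the environment a run used."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "seed": seed,
        "python": platform.python_version(),
        "platform": platform.platform(),
        "machine": platform.machine(),
        "packages": versions,
    }


print(json.dumps(run_manifest(seed=42), indent=2))
```

Shipping a file like this next to the code and data at least tells readers which versions to try first.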


>any clear way to reproduce the results.

Given that it's evolved, I'd imagine this is a given? Or, more accurately, you could probably reproduce some kind of emergent behaviour, but it would differ given different randomized parameters.

The bigger issue, I think, is that they don't do any meta-analysis of major changes seen across many of the trials. For example, they don't try to isolate specific mechanisms that formed in most of the trials that almost made it to this stage. They don't really analyze the failure trees in the trial dataset at all.

IMHO this is probably just a case of them trying to stretch this out across a bunch of different papers, and this is just the announcement paper. That's a shitty practice, but the current academic environment encourages puffing good findings up into multiple incomplete papers rather than one well-done paper.

Usually you use an RNG for which you can publish the seed. So, although it’s random, you can reproduce the results.
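A minimal sketch of what "publish the seed" buys you, using a toy evolutionary loop (hypothetical, not the paper's method) that draws all of its randomness from one private `random.Random(seed)`. It runs single-threaded in pure Python on the CPU, so it sidesteps the GPU non-determinism that seeding alone doesn't cover.

```python
import random


def evolve(seed, generations=30, pop_size=10, genome_len=8, mutation_rate=0.1):
    """Toy evolutionary loop maximising the number of 1-bits in a genome."""
    rng = random.Random(seed)  # private generator: publishing `seed` is enough to replay the run
    population = [[rng.randint(0, 1) for _ in range(genome_len)] for _ in range(pop_size)]
    for _ in range(generations):
        parent = max(population, key=sum)  # keep the fittest genome (elitism)
        children = [
            # flip each bit independently with probability mutation_rate
            [bit ^ (rng.random() < mutation_rate) for bit in parent]
            for _ in range(pop_size - 1)
        ]
        population = [parent] + children
    return population


assert evolve(seed=1) == evolve(seed=1)  # same seed: identical run
print(evolve(seed=1) != evolve(seed=2))  # different seed: almost certainly a different run
```

This is also why "it's evolved" isn't an excuse on its own: the run is random, but replayable.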
Glancing through the paper it seems like they use the recent Transformer model. Does whatever underlying stack they use expose something to share RNG seeds and the exact hardware optimizations your environment applies during training? Otherwise "publishing the seed" sounds nice but might not be as trivial as the phrase suggests.
reproducibility should be something that's baked into an experiment's design.

so, if their experiment was designed such that reproduction is inherently difficult, they should have designed it in a better way, and they should've used a toolset that wouldn't run into that problem.

a non-reproducible experiment isn't necessarily completely without value, but it's a thing that everyone should look askance at till it proves its worth.

(apologies if my comments don't apply to this experiment and if it is reproducible -- i didn't have time to read through the OP, but i thought this reply was still a worthwhile response to its specific parent comment)

No, that's absolutely a fair and true point; my comment was aimed more at the RNG aspect. I haven't looked into this specific one either, but normally people would hopefully not publish their best lucky run if the system can't reproduce it, or at least similar results.

That being said the paper in question doesn't seem to reference open source code anyway so I guess my point was kind of moot, apologies.

For the most part, yes.

There are specific CUDA operations which are not guaranteed to be reproducible, though, as well as some cuDNN operations which can't be made deterministic without sacrificing performance, and this does cause real problems.

See https://pytorch.org/docs/stable/notes/randomness.html for some reasonable docs on this.
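A minimal sketch of the settings those notes describe, assuming a recent PyTorch (`torch.use_deterministic_algorithms` exists from 1.8 on). The function name `seed_everything` is hypothetical. `CUBLAS_WORKSPACE_CONFIG` only takes effect if it is set before CUDA is initialised, and with deterministic algorithms enabled, ops that have no deterministic implementation raise an error instead of silently varying. None of this helps with ordering across multiple machines or hardware faults.

```python
import os
import random


def seed_everything(seed: int) -> bool:
    """Seed common RNGs and request deterministic kernels.

    Returns True if PyTorch was found and configured, False otherwise.
    """
    # Must be in the environment before CUDA/cuBLAS is initialised to take effect.
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"
    random.seed(seed)  # Python's own global generator
    try:
        import numpy as np
        np.random.seed(seed)  # legacy global NumPy generator
    except ImportError:
        pass
    try:
        import torch
    except ImportError:
        return False
    torch.manual_seed(seed)                    # seeds the CPU and all CUDA devices
    torch.backends.cudnn.benchmark = False     # don't auto-select (possibly different) conv algorithms
    torch.backends.cudnn.deterministic = True  # use only deterministic cuDNN conv algorithms
    torch.use_deterministic_algorithms(True)   # raise on ops with no deterministic version
    return True
```

Call it once at the very top of the training script, before any model or CUDA tensor is created.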

There are many CS conferences where you can/should submit a VM image to reproduce the results. See, e.g.: http://cavconference.org/2018/artifact-submission-and-evalua...
You want to be able to set the seed, if only because you want to be able to debug your program. Pseudo-random numbers are sufficient for these models and are independent of any hardware settings. You should not share your random source between concurrent threads, though, but that's good practice anyway.
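A minimal sketch of the "don't share your random source between threads" advice: each worker owns a generator derived from a base seed and its worker id. The string seed scheme `f"{base_seed}-{worker_id}"` is a hypothetical choice; Python seeds `random.Random` from a string via a hash of its bytes, so it is stable across runs.

```python
import random
import threading


def worker(worker_id, base_seed, results):
    # Each thread owns its own generator, derived from the base seed and its id,
    # so the order in which the OS schedules threads can't change its numbers.
    rng = random.Random(f"{base_seed}-{worker_id}")
    results[worker_id] = [rng.random() for _ in range(3)]


def run(base_seed, n_workers=4):
    results = {}
    threads = [
        threading.Thread(target=worker, args=(i, base_seed, results))
        for i in range(n_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


assert run(base_seed=7) == run(base_seed=7)
```

Had all workers drawn from one shared generator instead, which numbers each worker got would depend on thread scheduling, and the run would no longer be replayable from the seed alone.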
Most machine learning accelerators have a few non-deterministic operations. The chances that you could run trillions of floating point operations through a GPU and get a bit-for-bit identical result is low.
Really? I'm not an ML guy, so in simple terms, what are these non-deterministic ops? Or are you saying GPUs can be expected to be, basically, faulty?
Both.

Some operations split and join data in non-deterministic ways (in particular, the order of operations varies, which leads to different floating-point rounding). If you shard across multiple machines, for example, the weight accumulation order will depend on network latency.
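Why the order matters: floating-point addition isn't associative, so summing the same numbers in a different order can give a different result. A minimal sketch in plain Python; the three-value list stands in for partial results from hypothetical shards. `math.fsum` is shown as one CPU-side way to get an order-independent, correctly rounded sum; it is not what GPU libraries do.

```python
import itertools
import math

values = [0.1, 0.2, 0.3]

# Sum the same three numbers in every possible order, as if partial results
# from different shards arrived in a different order on each run.
naive = {sum(p) for p in itertools.permutations(values)}
exact = {math.fsum(p) for p in itertools.permutations(values)}

print(sorted(naive))  # more than one distinct result
print(exact)          # a single correctly rounded result
```

Across trillions of operations, these last-bit differences can compound into visibly different trained models.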

Also, GPUs aren't anywhere near as reliable as CPUs when it comes to being able to run for hours without any random bit flips/errors.
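To give a sense of scale, a single flipped bit in a float32 can be harmless or catastrophic depending on which bit it hits. A minimal sketch using `struct` to flip one bit of a value's IEEE-754 encoding; `flip_bit` is a hypothetical helper for illustration.

```python
import struct


def flip_bit(x, bit):
    """Return float32 x with one bit (0 = lowest mantissa bit, 31 = sign) flipped."""
    (raw,) = struct.unpack("<I", struct.pack("<f", x))
    (flipped,) = struct.unpack("<f", struct.pack("<I", raw ^ (1 << bit)))
    return flipped


print(flip_bit(1.0, 0))   # lowest mantissa bit: a tiny change
print(flip_bit(1.0, 30))  # top exponent bit: 1.0 becomes infinity
print(flip_bit(1.0, 31))  # sign bit: 1.0 becomes -1.0
```

A mantissa flip may go unnoticed, while an exponent flip in a weight or gradient can derail a run.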

Is it not possible to use the same seed and random number generator to reproduce the results accurately?
You've got to be careful. An RNG is used to initialise the layers, but also for mini-batch selection, and these are usually different RNGs.
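A minimal sketch of why both seeds matter: one generator for initialisation and another for mini-batch selection, so each can be fixed (and must be published) separately. The function `make_run` and its sizes are hypothetical; real frameworks manage these streams in their own ways.

```python
import random


def make_run(init_seed, batch_seed, n_weights=4, n_examples=10, batch_size=3):
    init_rng = random.Random(init_seed)    # used only for weight initialisation
    batch_rng = random.Random(batch_seed)  # used only for mini-batch selection
    weights = [init_rng.gauss(0.0, 0.1) for _ in range(n_weights)]
    order = list(range(n_examples))
    batch_rng.shuffle(order)
    batches = [order[i:i + batch_size] for i in range(0, n_examples, batch_size)]
    return weights, batches


w1, b1 = make_run(init_seed=0, batch_seed=0)
w2, b2 = make_run(init_seed=0, batch_seed=1)
assert w1 == w2  # same init seed: identical initial weights
print(b1 != b2)  # different batch seed: a different data order
```

Publishing only the initialisation seed would reproduce the starting weights but not the order in which the data was seen.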