Hacker News
by currymj 2363 days ago
I would argue most machine learning papers that use public datasets have code available and are often also reproduced independently (sometimes just because of somebody's need to port between PyTorch/TensorFlow). Reproducibility is still a big problem in reinforcement learning, however.

People are definitely thinking carefully about issues of noise and quantization error. Low-precision or quantized neural networks are increasingly popular at both train and test time. And people deliberately introduce noise into neural networks for various reasons (dropout, robustness certificates) and then have to think about the effect on performance. Typically things are quite reproducible in these situations btw, for a given noise distribution.

Re: canonicalization, the theoretical work on "neural tangent kernels" might be relevant.


>I would argue most machine learning papers that use public datasets have code available and are often also reproduced independently

lol have you ever tried? i have several github issues open on published models whose results i couldn't recreate, and the responses are things like "i don't remember the parameters i used and we've moved on".

i’ve only run into stuff like that trying to get RL agents to train. Maybe GANs and other “hard to train” models are bad too idk. But generally things do actually seem to be reproducible.

>Maybe GANs and other “hard to train” models

yes GANs are definitely one of the places i've run into this but also with almost bog-standard resnets i've had issues.