by MrEldritch 2884 days ago
I know this is a big issue in AI/ML right now. DeepMind's papers are notoriously hard to reproduce, because they lay out the general terms of the architecture but not the specific implementation details: things like filter length, stride, number of layers, number of hidden units, feature selection, and all the little tricks of initialization or normalization or a zillion other subtleties.

The trouble is that those "specific implementation details" are typically non-obvious and absolutely crucial to getting the described system to work at all. For instance, as far as I know, nobody's managed to implement a WaveNet that sounds anything like as good as Google's samples. Neural Turing Machines - published three years ago - were so finicky that someone figuring out how to implement the damn thing and have it actually work as described was enough to warrant a paper of its own (Implementing Neural Turing Machines, https://arxiv.org/pdf/1807.08518.pdf). Not to mention how hard it is to iterate on failed replications when you aren't blessed with ten thousand Nvidia Teslas and custom tensor ASICs and have to wait eternities for models to train. At this point, I think most of the community just kind of looks at their papers, sighs in jealousy, and moves on.