Hacker News
by umanwizard 2878 days ago
I'm not in academia so maybe I'm being flippant, but what is even the point of a paper whose results can't be reproduced? Are you really advancing knowledge in any meaningful sense if someone can't repeat what you did?
4 comments

Exactly. In one recent case, someone published a paper about an interesting graph centrality metric that I was interested in trying out. Unfortunately, their description in the paper was far too vague to be useful - "implemented as a simple extension of Brandes' algorithm". In attempting to reproduce it, that meant I needed to go read up on the algorithm they extended, and then try to figure out how they actually extended it. In the end, I couldn't actually reproduce the work, and never heard back about the code that the authors used in their published work. That severely degrades the utility of the paper. Yes - the paper does contain some knowledge that they shared with the world, but it was difficult to build upon and replicate since they failed to describe what amounts to the experimental apparatus and setup that was used to obtain the results they published. Unfortunately, this is relatively common in CS (at least, the corners of CS where I work).
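For reference, the baseline named in that quote, Brandes' algorithm for betweenness centrality, is short enough to sketch. What follows is only the standard unweighted version, not the paper's extension, which is the part that could not be reconstructed. The function name, the adjacency-dict input, and the `undirected` flag are illustrative choices, not anything taken from the paper.

```python
from collections import deque

def brandes_betweenness(adj, undirected=True):
    """Betweenness centrality via Brandes' algorithm on an unweighted graph.

    adj maps each node to an iterable of its neighbours (hypothetical input format).
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Phase 1: BFS from s, counting shortest paths (sigma) and recording predecessors.
        order = []                       # nodes in order of non-decreasing distance
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:          # first time w is reached
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:   # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Phase 2: accumulate dependencies, farthest nodes first.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    if undirected:
        # Each unordered pair was counted once from each endpoint.
        bc = {v: c / 2.0 for v, c in bc.items()}
    return bc

path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
scores = brandes_betweenness(path)
```

On the three-node path above, only the middle node lies between another pair, so it is the only one with a non-zero score. An "extension" could change the BFS, the accumulation step, or the weighting, which is why a one-line description leaves so much open.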
I know this is a big issue in AI/ML right now. DeepMind's papers are notoriously hard to reproduce, because they lay out the general terms of the architecture but not the specific implementation details: things like filter length, stride, number of layers, number of hidden units, feature selection, and all the little tricks of initialization or normalization or a zillion other subtleties.

The trouble being that those "specific implementation details" are typically non-obvious and absolutely crucial to getting the system described to work at all. For instance, as far as I know, nobody's managed to implement a WaveNet that sounds anything like as good as Google's samples. Neural Turing Machines - published three years ago - were so finicky that someone actually figuring out how to implement the damn thing and have it actually work as described was enough to warrant a paper of its own (Implementing Neural Turing Machines, https://arxiv.org/pdf/1807.08518.pdf). Not to mention how hard it is to iterate on failed replications when you aren't blessed with ten thousand Nvidia Teslas and custom tensor ASICs and have to wait eternities for models to train. At this point, I think most of the community just kind of looks at their papers, sighs in jealousy, and moves on.

You must make it possible for other researchers to reproduce your work, but that does not mean you have to give it to them for free.

I work at a university, but in close collaboration with industrial companies. We use models that we explain in detail in our papers, so that other researchers can write their own implementations and maybe confirm or disprove our results. But we do not make the code available.

This is what I see; that does not mean I think it is the best approach. I would really like to release all the code I write. It would make research advance faster, and I do not think that it would harm the company that pays for my work in any way.

But they pay, and they have strict policies. At least they allow me to share most of my code with other researchers on a personal basis and to write papers about it. I am quite sure that if I always had to share the code in order to publish papers, they would simply hire me or someone else directly to do the work, and there would be no papers at all.

I think a lot of this depends on the field and context. For example, in physics there are a lot of commercially available simulation tools that help with the analysis of an experiment. You of course describe the approach and setup of the simulation, but you cannot publish or reference the source code since you are just a licensee.
Psychology is one of the worst offenders. A lot of junk science in that field. Some psych papers read more like advocacy than actual science.

https://www.nature.com/news/over-half-of-psychology-studies-...