by Fede_V 4183 days ago
Try to reproduce the analysis published in a paper when all you have is a MATLAB script with one-letter variable names and zero comments :)
3 comments

If you're running someone else's code, imo that's not reproduction in the first place, just like re-running an experiment using the original experimenter's preparations and lab apparatus is not what's usually meant by "reproducing" an experiment. Too much undocumented stuff can creep in if you don't independently reproduce, with independent apparatus, preparations of samples, etc. (I don't think having someone's code is useless, and it can be especially useful for elaborating on the original experiment, but I would purposely avoid looking at it if I were aiming for an independent reproduction.)
You are always running someone else's code. It starts the moment you boot up your machine.
Not if you bootstrap from the silicon up.
That's if you even have access to the source code, a detailed algorithm, or even a MATLAB script. More often it's just a citation or a plain old equation.

Oftentimes, and especially from what I've seen in computer vision papers, the authors merely state which algorithm they used and how they combined it with their novel method. And that algorithm is in another paper, by the way, probably by the same author. And what's described there is definitely not the implementation you're working with, if you even have one.

It's almost as if they need a combined repository. Each paper that presents a novel algorithm, or an implementation of an existing one, would be a "changeset" or "branch", and the citations to algorithms used in a paper would be changeset hashes or branch names. Hey, it's the first thing that popped into my mind to solve this horrendous problem.
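
A rough sketch of what "citations as changeset hashes" could look like, assuming each paper's code is identified by a SHA-256 hash over its source plus the ids of the changesets it builds on. The function name `changeset_id` and the toy source strings are made up for illustration. It is similar in spirit to how version control systems chain commits, not a description of any existing citation system.

```python
import hashlib
from typing import Iterable


def changeset_id(source: str, parent_ids: Iterable[str] = ()) -> str:
    """Hypothetical id for a paper's code: a hash of the source text
    together with the ids of the changesets (papers) it builds on."""
    h = hashlib.sha256()
    # Sort parents so the id does not depend on citation order.
    for pid in sorted(parent_ids):
        h.update(pid.encode("utf-8"))
        h.update(b"\n")
    h.update(source.encode("utf-8"))
    return h.hexdigest()


# Toy example: paper B cites the exact implementation published with paper A.
paper_a = changeset_id("def blur(img): ...")
paper_b = changeset_id("def detect(img): return blur(img)", [paper_a])

# If paper A's code is later changed, it gets a new id, so paper B's
# citation still points unambiguously at the version it actually used.
paper_a_fixed = changeset_id("def blur(img): return img")
```

The point of hashing the parents in as well is that a citation pins the whole chain: the same detection code built on a different version of the blur code would get a different id.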

I certainly agree with this. The computer vision field is awash with papers proposing a 'new' algorithm which is then poorly compared to some select group of existing techniques under criteria chosen by the author. A paper is a very poor substitute for the code itself, and really it should be mandatory for code to be submitted with the paper, especially in a field such as computer vision where the entire experimental apparatus could be packed into a zip file. That way any other group could take the code and independently evaluate the technique without reimplementation. Indeed, my own experience is that often the maths described in the paper is not necessarily responsible for all the results! As you say, this could even become the start of collaborative improvement.

Unfortunately my experience is that too many academic groups believe that their source code is the route to untold riches.

Better than nothing. (Been there, done that).