by izacus 1951 days ago
I spent a lot of time in my career implementing and comparing different algorithms based on published papers. This might surprise you but:

- Most of them don't publish any code at all, or publish only snippets.

- Most of the algorithms are incomplete (think "we add constant M to the equation here, tuned by an expert"). The chosen constants aren't documented, and they're tweaked before publication to maximize results on the given dataset.

- Most results are only good for the chosen dataset and depend on very finely tuned constants (which is why those constants aren't published). As soon as you apply them to a slightly different dataset, they fall apart.

- Even if you get code for a given paper, it's usually a disastrous mess quality-wise and runs only on the researcher's own computer, with a specific version of Windows and a weird patched version of Python they found somewhere on the internet.

There ARE better written papers out there. But most of them are just made to publish "something" and get the publishing metrics up. That means minimum effort for actual research and maximum effort to tweak and tune results to make them look good.


It's tragic that academics have no metric for quality in published work.

The only metrics are journal status and citation count, which is partly why we're in this mess.

There should be some form of public post-review attached to all public work, with explicit requirements for rigour and reproducibility.

Some domains - like math - do at least attempt this, more or less. But CS doesn't seem to.

Aside from motivations - now skewed towards business outcomes - the underlying problem is that paper publishing, as the gold standard for disseminating new research, is a 17th-century process and badly needs an update.