by ondrasej 5804 days ago
There are some well-known public data sets used for this purpose, such as those in the UCI Machine Learning Repository. Unfortunately, not everyone uses them. And even when they do, it is often impossible to reproduce the results, either because the pre-processing of the data is not described well enough in the paper, or because the authors add random components (such as costs) to the data without properly describing the distributions they were drawn from.

Publishing scripts for the complete workflow, starting with the raw data and ending by printing the table of results, would be best. But I've seen academics working in a way that is completely orthogonal to this: copying and pasting data into Excel or Matlab (or even re-typing it) and doing the analysis by hand in the GUI... I have no doubt they would be able to learn how to write such a script, but I'm quite sure they would put up heavy resistance to doing so.
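
As a rough illustration of what such a script could look like, here is a minimal Python sketch. Everything in it is made up for the example: the inline `RAW_CSV` data, the column names, the `SEED` value, and the choice of Uniform(0, 1) for the random costs. The point is only that every pre-processing step and the distribution of the random component are written down in one place, so a rerun gives the same table.

```python
import csv
import io
import random
import statistics

# Hypothetical stand-in for the raw data; a real script would read the
# published raw file instead of an inline string.
RAW_CSV = """feature,label
1.0,0
2.0,0
3.0,1
,1
5.0,1
"""

SEED = 42  # assumed value; fixed so the random costs are reproducible


def load_raw(text):
    """Parse the raw CSV into a list of dict rows, without changing them."""
    return list(csv.DictReader(io.StringIO(text)))


def preprocess(rows):
    """Drop rows with a missing feature and convert fields to numbers."""
    return [
        {"feature": float(r["feature"]), "label": int(r["label"])}
        for r in rows
        if r["feature"] != ""
    ]


def add_costs(rows, seed=SEED):
    """Attach a random cost to each row, drawn from Uniform(0, 1)."""
    rng = random.Random(seed)
    return [dict(r, cost=rng.uniform(0.0, 1.0)) for r in rows]


def summarize(rows):
    """Return (label, count, mean feature, mean cost) for each label."""
    table = []
    for label in sorted({r["label"] for r in rows}):
        group = [r for r in rows if r["label"] == label]
        table.append((
            label,
            len(group),
            statistics.mean(r["feature"] for r in group),
            statistics.mean(r["cost"] for r in group),
        ))
    return table


def main():
    # The whole workflow: raw data -> pre-processing -> costs -> table.
    rows = add_costs(preprocess(load_raw(RAW_CSV)))
    print(f"{'label':>5} {'n':>3} {'mean_feature':>12} {'mean_cost':>9}")
    for label, n, mean_feature, mean_cost in summarize(rows):
        print(f"{label:>5} {n:>3} {mean_feature:>12.2f} {mean_cost:>9.3f}")


if __name__ == "__main__":
    main()
```

Running the file prints the results table directly from the raw data, and because the seed and the cost distribution are part of the script, nobody has to guess them from the paper.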