by pca006132 1424 days ago
While independent verification and reproduction are hard, I wonder if there is any requirement for researchers to at least publish their data set for statistical analysis and further research.

Also, I found it interesting that even though computer science research is usually easier to reproduce, a lot of journals and conferences do not mandate artifact evaluation; it is just considered a nice-to-have for submission. If we had mandatory artifact evaluation, even for artifacts that are not reusable and can only repeat the experiments in the paper, it would be much easier to verify the claims in papers and compare different approaches.


> I wonder if there is any requirement for researchers to at least publish their data set for statistical analysis and further research.

Not generally, though the tide is slowly turning in the right direction. Unfortunately many laws/policies pushing for openness and transparency in research are sidestepped with the classic "data available upon request," a.k.a. "I promise I'll share the Excel files if you email me" (they will not).

I don't understand why they can use this as an excuse. If they can share the data upon request, why can't they just publish it as well? Is that related to some legal/privacy issue?
> Is that related to some legal/privacy issue?

Possibly in some medical or social science fields, I don't know. I know there is no such issue in chemistry and materials science. There may also be some complications with industry collaborations, but that's kinda a different situation. For people whose career development is not strongly tied to the reproducibility of their work (a.k.a. everybody), it's just another step in the overly complex process of publishing in for-profit journals. Funding agencies generally aren't going to punish people for using this excuse, and the watchdogs/groups concerned with reproducibility have no teeth.

Not an excuse, but journals don't make it easy to share files, as hard as that is to believe. Some will only take PDFs for supplemental information, and many have garbage UIs, stupidly small file size limits, etc. Just uploading to a repo (or a tagged release) on GitHub is common these days because there is much less friction.