Hacker News
by roel_v 1956 days ago
Are you a scientist? Because it really doesn't. For example, in most papers that I think harder than normal about, I don't care so much about 'the data' as I do about a description of what this 'data' is actually meant to represent, and how it was collected and processed. (I'm talking here about things that are a bit more complicated than e.g. rainfall measurements or any other such lab-like, STEM topics.) E.g., I do quite a bit of population modeling. You would think that 'population' is relatively easy to quantify, but it really isn't, and I can talk for hours about how cavalierly people throw their 'population data' numbers into models and draw all sorts of conclusions based on objectively wrong interpretations of what this 'data' is.

If the vast majority of papers can't even get that right, I don't care as much about the HN idea of what 'reproducibility' is - i.e. check if 'git clone <xyz> && run_model.sh && run_tests.sh' says 'All OK!' at the end.

1 comment

I agree in part here, but think it doesn’t apply to all papers. Those wrong interpretations you mention are generally because statistical inference is hard, and it’s an unfortunate reality that a lot of scientists are bad statisticians.

There’s an article on the front page of HN about a Nature retraction that came from someone asking the authors of a paper for their data. That tells me that the data can be useful for fixing suspicious results, but also that simply asking for it can be sufficient. I wonder how often authors say no to data requests. My guess is only in the fishiest cases and in old cases. Publishing code and data isn’t high up the priority list, but in a lot of studies, it’s simple enough to be good ROI.