HN Mirror



	by lusus_naturae 890 days ago
	> Anyone who's advised students or asked even presenting researchers such questions know that often people will literally not know what happened to all their data.

	I am sorry that’s been your experience, maybe it varies by field and quality of research? Most people I’ve questioned have provided reasonable answers to their findings. I don’t understand why anything needs to be assumed in bad faith or shoddily done. It’s all a bit Dunning-Kruger to me where everyone assumes that everyone else is doing shoddy or bad work.

2 comments

pocketsand 890 days ago

Things are complicated.

To be fair: Everyone I've worked closely with in research has gone above and beyond not to cut corners and produce high quality data and research.

What I have in mind here is a situation where people are actually quite careful but can still end up in a place where they don't know what happened because they don't have good systems for creating datasets and storing code.

For example, graduate students are not always taught to work in a reproducible way. It's definitely gotten better from what I can see, but it used to be normal for people to get source data and work it into its final form in many separate steps that were not always reproducible. E.g., data comes in from a secondary source or other provider. It gets cleaned. That file gets saved as something like "clean data 011234.csv".

More work is done, it gets saved again.

Time passes, things are revisited, and a handful of files exist that, with some care, could likely lead from point A to point B. But the exact process, to say nothing of the dozens, sometimes hundreds, of small data preparation decisions, gets lost to memory.

Code doesn't go in version control. People get new computers. USBs get lost. Universities migrate to new data systems and so on.

All the while, these students and researchers were very careful while doing the work. They were just never trained to use good version control and pipeline processes. They basically do what they do with the papers they write: save and back up while working through the paper, then move on when it's done.

This is made worse when data is proprietary or not legally shareable.

So people aren't necessarily being shoddy or doing bad work, they're just not using good systems.
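
To make "good systems" a bit more concrete, here is a minimal sketch of a scripted cleaning step that records what it did. Everything in it is hypothetical and for illustration only: the function names (`run_pipeline`, `drop_blank_rows`, `strip_whitespace`), the two cleaning rules, and the manifest layout are assumptions, not anyone's actual workflow. It also assumes a well-formed rectangular CSV and uses only the Python standard library.

```python
import csv
import hashlib
import json
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def drop_blank_rows(rows):
    """Hypothetical cleaning rule: drop rows where every field is empty."""
    return [r for r in rows if any((v or "").strip() for v in r.values())]


def strip_whitespace(rows):
    """Hypothetical cleaning rule: trim surrounding whitespace in each field."""
    return [{k: (v or "").strip() for k, v in r.items()} for r in rows]


# The ordered list of steps is the record of the "small decisions".
STEPS = [drop_blank_rows, strip_whitespace]


def run_pipeline(raw_path, out_path, steps=STEPS):
    """Apply each step in order, write the cleaned CSV, and write a manifest
    next to it with file hashes and per-step row counts."""
    raw_path, out_path = Path(raw_path), Path(out_path)
    with raw_path.open(newline="") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames
        rows = list(reader)

    log = []
    for step in steps:
        rows_in = len(rows)
        rows = step(rows)
        log.append({"step": step.__name__, "rows_in": rows_in, "rows_out": len(rows)})

    with out_path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)

    manifest = {
        "input": raw_path.name,
        "input_sha256": sha256_of(raw_path),
        "output": out_path.name,
        "output_sha256": sha256_of(out_path),
        "steps": log,
    }
    manifest_path = out_path.with_name(out_path.stem + ".manifest.json")
    manifest_path.write_text(json.dumps(manifest, indent=2))
    return manifest
```

The idea is that the raw file is never edited by hand, the script lives in version control, and rerunning `run_pipeline("raw.csv", "clean.csv")` on the same input should give the same output hash. That does not capture every judgment call, but it leaves a trail from point A to point B that does not depend on memory.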


lusus_naturae 890 days ago

> So people aren't necessarily being shoddy or doing bad work, they're just not using good systems.

Agreed. I think there isn’t an incentive to do this because reproducibility takes a back seat to so many other concerns. Unless PIs are told that their publication chances depend on reproducibility, this isn’t going to change.


robocat 890 days ago

> It’s all a bit Dunning-Kruger to me

Fantastic article on the reproducibility of the Dunning-Kruger effect: https://replicationindex.com/2020/09/13/the-dunning-kruger-e...

Also see: "The Dunning-Kruger Effect is Autocorrelation" https://economicsfromthetopdown.com/2022/04/08/the-dunning-k...

Lovely quote:

  “These responses to our work have also furnished us moments of delicious irony, in that each critique makes the basic claim that our account of the data displays an incompetence that we somehow were ignorant of.” (Dunning, 2011, p. 247).

Of course Dunning-Kruger is self-referential. Any mention of Dunning-Kruger automatically makes you a victim on the wrong side of the graph.


lusus_naturae 890 days ago

I think the term is still useful for giving words to a phenomenon people experience. I guess a less charitable and more presumptuous way to describe such behavior would be to call the person displaying it narcissistic.
