by Odenwaelder 3541 days ago
It's not only about fake data; it's also, maybe even most importantly, about low-quality data. Publications with low data quality decrease the signal-to-noise ratio, leading to slower scientific progress. Purely from a scientific standpoint: if you identify fake data, you just take them out of your model training set. But identifying low-quality data is much harder.

Agreed. Poor-quality data is worse.

A couple of years ago, as this story started to break, I would mention it to scientist friends. "Don't worry," they'd say, "we always have the meta-analysis to fall back on."

I don't hear that anymore.

ADD: A critical issue here, which you touch on tangentially, is the mix of motivation and milieu. If I'm making stuff up -- bad data -- other people can identify it and delete it. But if my goal is to appear to be doing good science instead of actually doing it, then it can become extremely problematic both to identify and remedy what I've done.

A lot of "normal" science is done around the margins, with not-so-incredible hypotheses and pedestrian-looking data. Bad science with really poor data fits seamlessly into that model without having any distinguishing characteristics.

It's reminiscent of the role played by the Sophons in Cixin Liu's The Three-Body Problem, except in this case it's self-inflicted.

https://en.wikipedia.org/wiki/The_Three-Body_Problem#First_C...