by Odenwaelder
3541 days ago
It's not only about fake data; it's also, maybe even most importantly, about low-quality data. Publications with low-quality data decrease the signal-to-noise ratio, which slows scientific progress. Purely from a scientific standpoint: if you identify fake data, you just take them out of your model training set. But identifying low-quality data is much harder.
A couple of years ago, as this story started to break, I would mention it to scientist friends. "Don't worry," they'd say, "We always have the meta analysis to fall back on."
I don't hear that anymore.
ADD: A critical issue here, which you touch on tangentially, is the mix of motivation and milieu. If I'm making stuff up (bad data), other people can identify it and delete it. But if my goal is to appear to be doing good science instead of actually doing it, then what I've done can become extremely difficult both to identify and to remedy.
A lot of "normal" science is done around the margins, with not-so-incredible hypotheses and pedestrian-looking data. Bad science with really poor data fits seamlessly into that model without having any distinguishing characteristics.