That is correct. It is true even when reviewing papers. As a reviewer, you do not question, for instance, whether the authors did the experiments exactly as stated, or whether they analyzed the results 20 different ways until they found the one that looked best. You take it on trust that they did these things correctly, and focus on whether their conclusions are justified by their data. If reviewers take so much on trust, how much more so readers?
There are very few actual standards of technical proof in play here, particularly when a paper says something that isn't novel. If the results match the "prior probabilities" from the literature, the paper will be believed without much question. If they don't, it will get more scrutiny. People quickly learn that it is easier to publish when your result fits the status quo.
Take a look at slide 10 of [1]. There are numerous examples like this, where a physical constant was first measured at one value and then trended gradually upward over a long period toward today's "true" value. If the experiments were really independent, they would generally scatter randomly on both sides of the true value before converging on it. The fact that they did not suggests investigators were using methods that "smooth" the difference between their data and prior findings. And thus we get self-perpetuating cycles of groupthink. That, in turn, is why supposedly independent experiments cannot so easily be taken as independent points of evidence for an overall hypothesis.
[1] https://www.pas.rochester.edu/~sybenzvi/courses/phy403/2015s...
The maximally pessimistic view, which you and Feynman seem to be espousing, is that people explicitly "put their thumb on the scale" so that they get the right number. That's clearly bad.
The PDF you linked presents it as a more emergent phenomenon, driven by how people usually work. It's possibly an argument for working more slowly and carefully, or for using pre-registration, but it seems ethically neutral.
Finally, you could think about this as a form of Bayesian updating, in which each experiment nudges our previous best estimate of the value. Obviously, it would be better to do this formally, but it does seem more rational than completely discarding the past.
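To make "doing this formally" concrete, here is a minimal sketch of one possible version: a Gaussian prior combined with Gaussian measurements of known variance, where each update is a precision-weighted average. The `Estimate` class, the `update` function, and all the numbers are hypothetical, chosen only for illustration; nothing here is taken from the linked slides.

```python
from dataclasses import dataclass


@dataclass
class Estimate:
    """A Gaussian belief about the constant: best value and its variance."""
    mean: float
    variance: float


def update(prior: Estimate, measurement: float, measurement_variance: float) -> Estimate:
    """Combine a Gaussian prior with one Gaussian measurement.

    The posterior mean is the precision-weighted average of the prior mean
    and the new measurement; the posterior precision is the sum of the two.
    """
    prior_precision = 1.0 / prior.variance
    data_precision = 1.0 / measurement_variance
    posterior_variance = 1.0 / (prior_precision + data_precision)
    posterior_mean = posterior_variance * (
        prior_precision * prior.mean + data_precision * measurement
    )
    return Estimate(posterior_mean, posterior_variance)


# Hypothetical starting belief and two hypothetical later experiments,
# each given as (measured value, measurement variance).
estimate = Estimate(mean=10.0, variance=4.0)
for value, variance in [(12.0, 4.0), (13.0, 2.0)]:
    estimate = update(estimate, value, variance)
    print(f"estimate = {estimate.mean:.3f} +/- {estimate.variance ** 0.5:.3f}")
```

With these made-up inputs the combined estimate moves from 10 to 11 to 12 while its uncertainty shrinks. The important difference from the informal version is where the weighting happens: each experiment still reports its own raw value (12, then 13), and the prior only enters when the results are combined, so the individual measurements stay independent pieces of evidence.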