by bumby
533 days ago
It seems you are conflating concepts, maybe because you take it personally, which you shouldn’t. The process can be broken, but that doesn’t mean the academic is bad, just that they are part of a broken process. Likewise, if a scrum is a broken process, it will lead to bad results. If it isn’t “done properly” then we seem to be saying the same thing: the process isn’t working. As I and others have said, there are some misaligned incentives which can lead to a broken process. Just because it sometimes works doesn’t mean it’s a good process, any more than a broken clock is a good clock because it is correct twice a day. It varies by discipline, but there seem to be quite a few domains where there are actually more bad publications than good. That signals a bad process. As others have talked about here, sometimes it becomes impossible to replicate the results. Is it because of some error in the replication process, the data, the practitioner, or is the original a sham? It's hard to deduce when there's a lot you can't chase down. I also think you are applying an overly superficial rationalization as to why sharing code would amplify the replication issue. This is only true if people mindlessly re-run the code. The point of sharing it is so the code can be interrogated to see if there are quality issues. Your same argument could be made for sharing data; if people just blindly accept the data the replication issue would amplify. Yet we know that sharing the data is what led to uncovering some of the biggest issues in replication, and I don’t see many people defending hiding data as a contradiction in the publication process. I suspect it’s for the reasons others have already alluded to in this thread.
|
Also, let's not mix up "peer review" or "code sharing" with "bad publication" or "replication crisis".
I know people outside of science don't realise that, but publishing is only a very small element of the full scientific process. Scientists are talking together, exchanging all the time, at conferences, at workshops, ... This idea that a bad publication is fooling the domain experts does not correspond to reality. I can easily find a research paper mill and publish my made-up paper, but this would be 100% ignored by domain experts. Maybe one or two will have a look at the article, just in case, but it is totally wild to think that domain experts just randomly give a lot of credit to random unknown people rather than working with the groups of peers that they know well enough to know they are reliable. So, the percentage of "bad papers" is not a good metric: the overall percentage of bad papers is not at all representative of the percentage of bad papers that actually make it to the domain experts.
You seem not to understand the "replication crisis". The replication crisis does not happen because the replicators are bad or the initial authors are cheating. There are a lot of causes: science happens at the technological edge, and that edge is getting trickier to reach; the number of publications has increased a lot; there are more and more economic interests trying to bias the system; and there is the stupid "publish or perish" + "publish only the good result" culture that everyone in the academic sector agrees is stupid but that exists because of non-academic people. If you publish a scientifically interesting result that says "we have explored this way but found nothing", you get a lot of pressure from the non-academic people who are stupid enough to say that you have wasted money.
You seem to be saying "I saw a broken clock once, so it means that all clocks are broken, and if you pretend it is not the case, it is just because a broken clock is still correct twice a day".
> This is only true if people mindlessly re-run the code. The point of sharing it is so the code can be interrogated to see if there are quality issues.
"Mindlessly re-running the code" is one extreme. "Reviewing the code perfectly" is another one. Then there are all the scenarios in the middle, from "reviewing almost perfectly" to "reviewing superficially but having a false feeling of security". Something very interesting to mention is that, in good practice, code review is part of software development, and yet it does not mean that software has 0 bugs. Sure, it helps, and sharing the code will help too (I've said that already), but the question is "does it help more than the problem it may create?". That's my point in this discussion: too many people here just don't understand that sharing the code creates biases.
> Yet we know that sharing the data is what led to uncovering some of the biggest issues in replication,
What? What are your examples of the "replication crisis" where the problem was "uncovered" by sharing the data? Are you mixing up "replication crisis" and "fraud"? Even for "fraud", sharing the data is not really the solution: people who are caught were just being reckless, and they could easily have faked their data in more subtle ways. On top of that, rerunning on the same data does not help if the conclusion is incorrect because of a statistical fluctuation in the data (at the 95% confidence level, 5% of the papers can be wrong while they have 0 bugs: the data is indeed telling them that the most sensible conclusion is the one they have reached, and yet these conclusions are incorrect). On the other hand, rerunning on independent data ALWAYS exposes a fraudster.
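To make the statistical-fluctuation point concrete, here is a minimal sketch, not taken from any real study. It assumes a toy setup in which there is no true effect at all: every "study" draws normally distributed data with mean 0 and known standard deviation 1, and runs a two-sided z-test at the 5% level. The names `simulate`, `is_significant` and `draw_sample`, and all parameter values, are hypothetical. It only illustrates the fluctuation argument, not fraud.

```python
import random
import statistics
from math import sqrt

Z_CRIT = 1.96  # two-sided 5% threshold for a standard normal test statistic


def is_significant(sample):
    """Bug-free analysis: z-test of 'mean == 0', assuming known sigma = 1."""
    z = statistics.fmean(sample) * sqrt(len(sample))
    return abs(z) > Z_CRIT


def draw_sample(rng, n):
    """Toy data with NO true effect (assumption of this sketch)."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def simulate(num_studies=20000, n=30, seed=0):
    rng = random.Random(seed)
    false_positives = 0
    same_data_confirms = 0
    independent_confirms = 0
    for _ in range(num_studies):
        data = draw_sample(rng, n)
        if not is_significant(data):
            continue
        # A "wrong paper": correct code, honest data, wrong conclusion.
        false_positives += 1
        # Re-running the shared code on the shared data is deterministic.
        same_data_confirms += is_significant(data)
        # A real replication collects fresh, independent data.
        independent_confirms += is_significant(draw_sample(rng, n))
    return {
        "false_positive_rate": false_positives / num_studies,
        "same_data_confirmation_rate": same_data_confirms / false_positives,
        "independent_confirmation_rate": independent_confirms / false_positives,
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name}: {value:.3f}")
```

Under these assumptions, roughly one study in twenty reports an effect that does not exist. Re-running the same analysis on the same data confirms every one of those wrong conclusions, because nothing random is left once the data is fixed, while a rerun on independent data confirms only a small fraction of them.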
> and I don’t see many people defending hiding data as a contradiction in the publication process.
What do you mean? At CERN, sharing the data of your newly published paper with another collaboration is strictly forbidden. Only specific samples are allowed to be shared, after a lengthy approval procedure. But the point is that a paper should provide enough information that you don't need the data to discover if the methodology is sound or not.