Well... there's computer science, and then there's computer science.
All (most of?) the stuff done on P vs NP is good science that will stand up.
Studies on which language features make life better for developers are social science, because they involve those pesky humans. That stuff is likely to suffer from a reproducibility crisis.
Reproducibility in the social sciences is usually a function of a) the insurmountable cost of recruiting participants, b) the complexity of the questions, and c) the lack of standardization of humans. Sure, social scientists would like to have N = 1 billion, but they'd be lucky to get funding for 1,000.
You can't win. If your sample isn't diverse enough, it isn't representative. If it is representative, it's likely too small to get a reliable effect due to the number of confounds.
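
To put rough numbers on that, here is a back-of-the-envelope sketch using the textbook normal-approximation sample size formula for comparing two group means. Everything in it is an assumption for illustration: the function names `n_per_group` and `max_strata` are made up, the effect sizes are Cohen's d values chosen arbitrarily, and real studies would need a proper power analysis.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate participants needed per group for a two-sided,
    two-sample comparison of means (normal approximation).

    effect_size is a standardized difference (Cohen's d).
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = z.inv_cdf(power)          # quantile for the desired power
    return ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)


def max_strata(total_n, effect_size, alpha=0.05, power=0.80):
    """How many subgroups (hypothetical confound strata) a fixed budget of
    total_n participants can cover, if each stratum needs its own
    adequately powered treatment/control comparison."""
    per_stratum = 2 * n_per_group(effect_size, alpha, power)
    return total_n // per_stratum


if __name__ == "__main__":
    for d in (0.2, 0.5):
        print(d, n_per_group(d), max_strata(1000, d))
```

Under these assumptions, a small effect (d = 0.2) needs about 393 people per group, so 1,000 participants cover a single comparison with no room to split by any confound. Even a medium effect (d = 0.5) only leaves room for a handful of subgroups.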