Hacker News
by tpeo 2908 days ago
If the statistical significance of your results is algorithm-dependent, shouldn't they be regarded as suspect? Perhaps it's just a failure of imagination on my part, but I find it odd that switching software packages could move estimates far enough to push them out of statistical significance, unless they were only marginally significant in the first place.
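One concrete way a package switch can move a result, sketched under simplifying assumptions: packages disagree on defaults such as whether a standard deviation divides by n or n - 1 (NumPy's `std` divides by n by default, R's `sd` uses n - 1). The hypothetical helper `one_sample_z` below computes the same test statistic both ways with Python's `statistics` module on made-up data, and uses a normal approximation for the p-value to stay stdlib-only.

```python
import math
from statistics import NormalDist, pstdev, stdev


def one_sample_z(data, ddof):
    """Statistic for H0: mean == 0, computed as mean / (sd / sqrt(n)).

    ddof=1 divides the sum of squares by n - 1 (statistics.stdev);
    ddof=0 divides by n (statistics.pstdev).
    """
    if ddof not in (0, 1):
        raise ValueError("ddof must be 0 or 1")
    n = len(data)
    mean = sum(data) / n
    sd = stdev(data) if ddof == 1 else pstdev(data)
    return mean / (sd / math.sqrt(n))


def two_sided_p(z):
    # Normal approximation; with n this small a t reference would be more defensible.
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Made-up sample: ten observations, mean 0.64, deviations of exactly +/-1.
data = [-0.36, 1.64] * 5
for ddof in (1, 0):
    z = one_sample_z(data, ddof)
    print(f"ddof={ddof}: z={z:.3f}, p={two_sided_p(z):.4f}")
```

The two statistics differ only by the factor sqrt(n / (n - 1)), about 5% at n = 10, so on this data the p-values land on either side of 0.05. That matches the intuition above: the switch only matters because the result was marginal to begin with.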

I could imagine algorithmic differences introducing bias into the error margins. Both versions might be accurate approximations of the answer, but one might lean toward one end of the error range and the other toward the opposite end.
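A small illustration of two methods that are both accurate but err in opposite directions, assuming a borderline z-statistic of exactly 1.96: the upper-tail probability of a standard normal can be approximated with a left or a right Riemann sum. Because the density is decreasing beyond 1.96, the left rule overestimates the tail and the right rule underestimates it. The function name, upper cutoff and step count are illustrative choices, not anything from a real package.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def upper_tail_riemann(z, rule, upper=10.0, steps=804):
    """Approximate P(Z > z) for standard normal Z with a Riemann sum on [z, upper].

    rule="left" samples the density at the left end of each slice and
    rule="right" at the right end; probability mass beyond `upper` is ignored.
    """
    if rule not in ("left", "right"):
        raise ValueError("rule must be 'left' or 'right'")
    h = (upper - z) / steps
    offset = 0 if rule == "left" else 1
    return h * sum(STD_NORMAL.pdf(z + (i + offset) * h) for i in range(steps))


z = 1.96
exact = 2 * (1 - STD_NORMAL.cdf(z))          # two-sided p-value
left = 2 * upper_tail_riemann(z, "left")     # leans high
right = 2 * upper_tail_riemann(z, "right")   # leans low
print(f"exact={exact:.5f}  left={left:.5f}  right={right:.5f}")
```

With a step of about 0.01, both approximations should be within roughly 0.001 of the exact two-sided p-value, yet they sit on opposite sides of 0.05, so a hard cutoff would call the same statistic significant under one rule and not under the other.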

It's like when fixing a bug in library code breaks application code. Usually that's because the library had some undefined behavior (not part of its contract) that the application relied on, knowingly or unknowingly, and the updated version produces a different undefined behavior.
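A toy sketch of that failure mode, with hypothetical names throughout: two versions of a library function share the contract "return an item with the maximum score" but resolve ties differently, and a caller that silently relied on the first version's tie-breaking gets a different answer after the upgrade.

```python
from typing import List, Tuple

Item = Tuple[str, float]


def pick_best_v1(items: List[Item]) -> Item:
    """Contract: return an item whose score is maximal.

    Implementation detail (not part of the contract): Python's max()
    returns the first maximal item it encounters, so ties go to the earliest item.
    """
    return max(items, key=lambda item: item[1])


def pick_best_v2(items: List[Item]) -> Item:
    """Same contract, but scanning from the end, so ties go to the latest item."""
    return max(reversed(items), key=lambda item: item[1])


def caller(items: List[Item], pick) -> str:
    # Silently assumes ties resolve to the earliest item, which only v1 happens to do.
    return pick(items)[0]


items = [("a", 2.0), ("b", 3.0), ("c", 3.0)]
print(caller(items, pick_best_v1), caller(items, pick_best_v2))  # prints "b c"
```

Both versions honor the stated contract; only the caller's unstated assumption breaks.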

> If the statistical significance of your results is algorithm-dependent, shouldn't they be regarded as suspect?

It's a fair question.

Here's a possible recipe to get such variations in estimates: (1) an estimator that does not really match the distribution of the dependent variable; (2) small sample sizes with insufficiently well-handled influential observations; (3) (robust) standard error corrections leading to disproportionate confidence intervals; and (4) limited work on diagnostics, which is another way to make all of the previous points.
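Points (2) and (3) can be seen together in a small simple-regression sketch. The hypothetical helper below, in pure Python, computes the OLS slope with the classical standard error and three common heteroskedasticity-consistent variants, HC0, HC1 and HC3. Packages differ in which variant they use by default (Stata's `robust` option, for instance, gives HC1), and on a small sample with one high-leverage point the choice can move the standard error a lot. The data are made up.

```python
import math
from typing import Dict, Sequence


def ols_slope_ses(x: Sequence[float], y: Sequence[float]) -> Dict[str, float]:
    """Fit y = a + b*x by OLS; return b and several standard errors for b."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need paired data with at least 3 observations")
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    a = y_bar - b * x_bar
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    # Classical (homoskedastic) variance: s^2 / Sxx, with n - 2 degrees of freedom.
    var_classical = sum(e * e for e in resid) / (n - 2) / sxx
    # HC0 (White): sum((x_i - x_bar)^2 * e_i^2) / Sxx^2.
    hc0 = sum((xi - x_bar) ** 2 * e * e for xi, e in zip(x, resid)) / sxx ** 2
    # HC1: HC0 rescaled by n / (n - k), with k = 2 estimated parameters.
    hc1 = hc0 * n / (n - 2)
    # HC3: each squared residual inflated by 1 / (1 - h_i)^2, h_i = leverage.
    lev = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    hc3 = sum(
        (xi - x_bar) ** 2 * e * e / (1 - h) ** 2
        for xi, e, h in zip(x, resid, lev)
    ) / sxx ** 2
    return {
        "slope": b,
        "se_classical": math.sqrt(var_classical),
        "se_hc0": math.sqrt(hc0),
        "se_hc1": math.sqrt(hc1),
        "se_hc3": math.sqrt(hc3),
    }


x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]  # the last point has high leverage
y = [1.2, 1.9, 3.4, 3.8, 5.3, 5.7, 7.4, 7.9, 9.1, 12.0]

full = ols_slope_ses(x, y)
trimmed = ols_slope_ses(x[:-1], y[:-1])  # same fit without the influential point
for name in ("se_classical", "se_hc0", "se_hc1", "se_hc3"):
    print(f"{name}: t = {full['slope'] / full[name]:.2f}")
print(f"slope with last point: {full['slope']:.3f}, without: {trimmed['slope']:.3f}")
```

The HC3 t-statistic should fall well below the others here, because HC3 inflates the last observation's squared residual by 1 / (1 - h)^2 with a leverage h of roughly 0.91; dropping that single point also roughly triples the slope.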

The points above can be used to 'take down' many papers published in journals like the one where the incident happened, but you can also take a more charitable view and rescue most of those papers by arguing, for instance, that statistical significance should not drive (let alone solely drive) the identification of the data-generating process.

My conclusion, therefore: yes, algorithmic variation makes those results suspect, but only along a single dimension, and one that should probably not be the most important in assessing them in the first place.

Edited: syntax.