by ipsa
2426 days ago
Many statistical assumptions are regularly broken, either for pragmatic reasons (it just works better) or because the world is not static (and so the IID assumption is broken). There is an entire subfield of learning on imbalanced datasets, which includes resampling, subsampling, oversampling, and algorithms like SMOTE. It is common to use these techniques to get better performance, including on unseen out-of-distribution data. Fraud, CTR, and medical diagnosis models are regularly rebalanced for purposes other than trying to break assumptions or fool oneself into a seemingly higher accuracy. Plus, the signal does not disappear when training only on the original, imbalanced data. These systems do not work by the grace of a rebalancing trick alone, but they may work better (as is usually the case with neural nets, which do not even come with convergence guarantees: something only a statistician would worry about).

You can swap the negative and positive classes and my point remains: if the authors wanted to fraudulently hack the accuracy score, this is far easier with imbalanced data. The AUC metric is robust to class imbalance anyway: the ranking won't change on unseen data with a different class ratio, and you can just adjust the threshold to match it.

I'd say an academic source is necessary in this case, because you implicitly accuse these scientists of doing shoddy, hyped-up work, with fudging tricks to appear more accurate. I need more than popular media sources or previous HN discussions to accept that this paper was "widely discredited". Your Yuri Geller example is a red herring: one is a stage magician, the other is peer-reviewed science. But to oblige: https://scholar.google.com/scholar?q="yuri+geller"
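A minimal sketch of the two metric claims above, using made-up data: a classifier that always predicts the majority class looks accurate on imbalanced data, while AUC (computed here with the Mann-Whitney pairwise formulation) is unchanged when the minority class is oversampled. The Gaussian scores, the 5% positive rate, and the 19x oversampling factor are hypothetical choices for illustration, not taken from any paper discussed here.

```python
import random


def accuracy(labels, preds):
    """Fraction of predictions that match the labels."""
    return sum(y == p for y, p in zip(labels, preds)) / len(labels)


def roc_auc(labels, scores):
    """ROC AUC as the Mann-Whitney statistic: the probability that a random
    positive is scored above a random negative (ties count as half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def always_negative(labels):
    """A useless 'model' that always predicts the majority (negative) class."""
    return [0] * len(labels)


rng = random.Random(0)
# Hypothetical toy data: 190 negatives, 10 positives (5% positive rate).
# Scores from a made-up model: positives centred on 1.0, negatives on 0.0.
labels = [0] * 190 + [1] * 10
scores = [rng.gauss(1.0 if y else 0.0, 1.0) for y in labels]

# Rebalance by oversampling: repeat each positive 19 times -> 190 vs 190.
bal_labels = labels[:190] + [1] * (10 * 19)
bal_scores = scores[:190] + [s for s in scores[190:] for _ in range(19)]

acc_imbalanced = accuracy(labels, always_negative(labels))
acc_balanced = accuracy(bal_labels, always_negative(bal_labels))
auc_imbalanced = roc_auc(labels, scores)
auc_balanced = roc_auc(bal_labels, bal_scores)

print(f"majority-class accuracy: imbalanced={acc_imbalanced:.2f}, balanced={acc_balanced:.2f}")
print(f"AUC: imbalanced={auc_imbalanced:.4f}, balanced={auc_balanced:.4f}")
```

The majority-class baseline scores 0.95 accuracy on the imbalanced set but only 0.50 after rebalancing, while the two AUC values are identical: duplicating positives scales both the pairwise "wins" and the number of pairs by the same factor.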
In particular, regarding the gaydar paper: the authors cook up their data to get good results and then use those results to claim that they have found evidence for an actual natural phenomenon (hormones influencing haircuts, etc.). That's just... pseudoscience.
Is your Google Scholar link meant as humour?