by rahulnair23
1804 days ago
From the full paper [1]:

> All models were trained on 70% of the data and tested on the remaining 30% of the data. Note that each month record for each participant was considered an independent data point.

I'm almost certain that this is a data leak. Records from the same patient will end up in both the training and test sets, just from different months, and that will produce fantastic accuracy. The correct way to do this is to split (or cross-validate) across the population, so that each patient falls entirely on one side. It seems unlikely, from this description, that the authors have done so.

[1] https://alzres.biomedcentral.com/articles/10.1186/s13195-021...
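For anyone who wants to see what splitting across the population looks like, here is a minimal sketch in plain Python. It is not the authors' pipeline: the `participant_id` and `month` field names, the toy data, and the `split_by_participant` helper are all made up for illustration. The only point is that the 70/30 cut is made over participants, not over monthly records.

```python
import random
from collections import defaultdict


def split_by_participant(records, test_fraction=0.3, seed=0):
    """Split records so each participant lands entirely in train or in test.

    `records` is a list of dicts with a hypothetical "participant_id" key.
    """
    # Bucket the monthly records by the participant they belong to.
    by_participant = defaultdict(list)
    for rec in records:
        by_participant[rec["participant_id"]].append(rec)

    # Shuffle the participant IDs (not the records) and cut at the fraction.
    ids = sorted(by_participant)
    random.Random(seed).shuffle(ids)
    n_test = max(1, round(len(ids) * test_fraction))
    test_ids = set(ids[:n_test])

    train = [r for pid in ids if pid not in test_ids for r in by_participant[pid]]
    test = [r for pid in ids if pid in test_ids for r in by_participant[pid]]
    return train, test


# Hypothetical toy data: 10 participants with 12 monthly records each.
records = [
    {"participant_id": p, "month": m}
    for p in range(10)
    for m in range(1, 13)
]

train, test = split_by_participant(records)
train_ids = {r["participant_id"] for r in train}
test_ids = {r["participant_id"] for r in test}

# No participant should appear on both sides of the split.
print(len(train), len(test), train_ids & test_ids)
```

If you are already using scikit-learn, `GroupShuffleSplit` and `GroupKFold` do the same job when you pass the participant ID as `groups`.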
It's a very common error, and it should be easy to catch, but... I've even seen a study that treated individual slices of an MRI as independent, which is laughably wrong.
I think part of the problem is that the "analysts" are increasingly uninvolved in the data collection and just treat the data as a tuple of (X, y). If you thought about what the rows actually mean, even for a second ("Oh, Mr. Smith is always an awful driver"), the problem is obvious.