by jaibot 3154 days ago
It's training data. There are 17 suicidal and 17 non-suicidal scans, for a total of 34 scans. They trained 34 models, leaving one scan out each time. Of those 34 models, 31 correctly predicted the left-out scan.

IANAStatistician, but this seems like a trash result.


Cross validation is ok if you do it once, but they repeatedly did it and chose the features based on the results. You can't keep adjusting your model/features based on cross validation performance without overfitting to the training data.
In this case nested cross-validation would have been the proper way to do this. Run your entire model selection process (scaling - feature selection w/ CV - model selection - hyperparameter tuning w/ CV) on each of the folds in the outer CV loop. That will tell you how good your process is at building a model that generalizes.
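To make the nested setup concrete, here is a rough sketch in plain Python. Everything in it is my own stand-in, not anything from the paper: the k-NN classifier, the candidate values of k, the fold counts, and the pure-noise data. The only point is the shape of the loops: the inner CV that picks k only ever sees the outer training split.

```python
import random

def knn_predict(train, x, k):
    """Majority vote among the k nearest training points (squared Euclidean)."""
    nearest = sorted(train, key=lambda t: sum((a - b) ** 2 for a, b in zip(t[0], x)))[:k]
    votes = [label for _, label in nearest]
    return max(sorted(set(votes)), key=votes.count)

def folds(n, k):
    """Split indices 0..n-1 into k interleaved folds."""
    return [list(range(i, n, k)) for i in range(k)]

def cv_accuracy(data, k_neighbors, n_folds):
    """Plain k-fold CV accuracy for one fixed hyperparameter value."""
    hits = 0
    for test_idx in folds(len(data), n_folds):
        held = set(test_idx)
        train = [d for i, d in enumerate(data) if i not in held]
        hits += sum(knn_predict(train, data[i][0], k_neighbors) == data[i][1]
                    for i in test_idx)
    return hits / len(data)

def select_k(train, candidates, n_folds):
    """Stand-in for the whole selection process: pick k by inner CV on train only."""
    return max(candidates, key=lambda k: cv_accuracy(train, k, n_folds))

def nested_cv(data, candidates=(1, 3, 5), outer_folds=5, inner_folds=4):
    """Outer loop scores the *process*; selection is rerun inside every outer fold."""
    hits = 0
    for test_idx in folds(len(data), outer_folds):
        held = set(test_idx)
        train = [d for i, d in enumerate(data) if i not in held]
        best_k = select_k(train, candidates, inner_folds)  # never sees test_idx
        hits += sum(knn_predict(train, data[i][0], best_k) == data[i][1]
                    for i in test_idx)
    return hits / len(data)

# Hypothetical data: 34 samples of pure noise with balanced labels.
rng = random.Random(0)
data = [([rng.gauss(0, 1) for _ in range(5)], i % 2) for i in range(34)]
rng.shuffle(data)
nested_score = nested_cv(data)
print(f"nested CV accuracy on noise: {nested_score:.2f}")
```

The number it prints is an estimate of how well the selection procedure generalizes, not of any single fitted model.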
How did they adjust the model/features based on CV performance? It looks to me like they did LOOCV.
Read the second paragraph I quoted above:

"The features used by the classifier to characterize a participant consisted of a vector of activation levels for several (discriminating) concepts in a set of (discriminating) brain locations. To determine how many and which concepts were most discriminating between ideators and controls, a reiterative procedure analogous to stepwise regression was used, first finding the single most discriminating concept and then the second most discriminating concept, reiterating until the next step reduced the accuracy. A similar procedure was used to determine the most discriminating locations (clusters)."

The features were chosen using the same data as used to assess predictive skill.

That quote does not support your summary, unless you are basing it on information that is not explicitly mentioned. (I.e. they didn't say that they were only using training data to select features, but if they are at all competent, they did.)
See the last part of this post: https://news.ycombinator.com/item?id=15598117

Can you provide pseudocode consistent with what they described (in the post you are responding to) that wouldn't lead to leakage? I can't see it.

Select a training set, leaving out one sample for validation. For all features, train a classifier on the training set using that feature. Keep the one that gives the highest discrimination score on the training set. Repeat with more features. Then evaluate the final classifier on the validation sample, which has so far not been seen in any of the steps. The result provides an estimate of the risk on unseen data from the same distribution.

To get the estimation variance down, you can repeat this for all possible choices of validation sample. That means, you start the feature selection process on the new training set over from scratch and obtain another risk estimate. If they kept the features selected earlier, that estimate would be "contaminated" and not independent, but if they correctly start over, the procedure is valid.
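Since pseudocode was asked for, here is roughly what I mean as runnable Python. This is a sketch under my own assumptions, not the paper's method: a nearest-centroid classifier, training-set accuracy as the "discrimination score", a stopping rule of "stop when the score no longer improves", and made-up noise data. The part that matters is that `forward_select` is called from scratch inside every fold and never sees the left-out sample.

```python
import random

def centroid(rows, cols):
    """Mean of the selected feature columns over the given rows."""
    return [sum(r[c] for r in rows) / len(rows) for c in cols]

def predict(train_X, train_y, x, cols):
    """Nearest-centroid prediction using only the selected feature columns."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [r for r, lab in zip(train_X, train_y) if lab == label]
        dist = sum((x[j] - m) ** 2 for j, m in zip(cols, centroid(rows, cols)))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def train_accuracy(X, y, cols):
    """Discrimination score: accuracy on the training set itself."""
    return sum(predict(X, y, x, cols) == lab for x, lab in zip(X, y)) / len(y)

def forward_select(X, y, max_features):
    """Greedy forward selection, scored on the training set only."""
    chosen, best = [], -1.0
    remaining = list(range(len(X[0])))
    while remaining and len(chosen) < max_features:
        score, col = max((train_accuracy(X, y, chosen + [c]), c) for c in remaining)
        if score <= best:  # assumed stopping rule: no further improvement
            break
        chosen.append(col)
        remaining.remove(col)
        best = score
    return chosen

def loocv(X, y, max_features=3):
    """Leave-one-out CV with feature selection restarted in every fold."""
    hits = 0
    for i in range(len(X)):
        train_X, train_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        cols = forward_select(train_X, train_y, max_features)  # X[i] not involved
        hits += predict(train_X, train_y, X[i], cols) == y[i]
    return hits / len(X)

# Hypothetical data: 17 + 17 samples, 20 features of pure noise.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(20)] for _ in range(34)]
y = [0] * 17 + [1] * 17
loocv_score = loocv(X, y)
print(f"LOOCV accuracy with per-fold selection: {loocv_score:.2f}")
```

On noise like this one would expect the score to land near chance, though with 34 samples a single run can stray quite a bit. The leaky variant would call `forward_select(X, y, ...)` once on all 34 samples and reuse those columns in every fold.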

How so? Isn't there a 50% chance of getting it right by pure chance, but they got it right 91% of the time instead?
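For what it's worth, the arithmetic behind that question is easy to check. The sketch below treats each of the 34 left-out predictions as an independent fair coin flip, which is my simplifying assumption (LOOCV folds share almost all their training data, and the calculation means nothing if features were selected using the left-out scans).

```python
from math import comb

def binomial_tail(n: int, k: int, p: float = 0.5) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

n_scans, n_correct = 34, 31
accuracy = n_correct / n_scans                 # about 0.91
p_value = binomial_tail(n_scans, n_correct)    # roughly 3.8e-07

print(f"accuracy = {accuracy:.3f}")
print(f"P(>= {n_correct} of {n_scans} correct by chance) = {p_value:.2e}")
```

So 31 of 34 is far from chance if the estimate is clean; the disagreement in this thread is about whether it is clean.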