by jjoonathan 3154 days ago
How did they adjust the model/features based on CV performance? It looks to me like they did LOOCV.

Read the second paragraph I quoted above:

"The features used by the classifier to characterize a participant consisted of a vector of activation levels for several (discriminating) concepts in a set of (discriminating) brain locations. To determine how many and which concepts were most discriminating between ideators and controls, a reiterative procedure analogous to stepwise regression was used, first finding the single most discriminating concept and then the second most discriminating concept, reiterating until the next step reduced the accuracy. A similar procedure was used to determine the most discriminating locations (clusters)."

The features were chosen using the same data as used to assess predictive skill.

That quote does not support your summary, unless you are basing it on information that is not explicitly stated. (I.e., they didn't say that they used only the training data to select features, but if they are at all competent, they did.)
See the last part of this post: https://news.ycombinator.com/item?id=15598117

Can you provide pseudocode consistent with what they described (in the post you are responding to) that wouldn't lead to leakage? I can't see it.

Select a training set, leaving out one sample for validation. For all features, train a classifier on the training set using that feature. Keep the one that gives the highest discrimination score on the training set. Repeat with more features. Then evaluate the final classifier on the validation sample, which has so far not been seen in any of the steps. The result provides an estimate of the risk on unseen data from the same distribution.

To get the estimation variance down, you can repeat this for all possible choices of validation sample. That means, you start the feature selection process on the new training set over from scratch and obtain another risk estimate. If they kept the features selected earlier, that estimate would be "contaminated" and not independent, but if they correctly start over, the procedure is valid.
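
A minimal Python sketch of the procedure described above, for concreteness. Everything in it is an assumption made for illustration, not the paper's method: the nearest-centroid classifier, the function names, the stopping rule (stop once an added feature no longer improves the training score), and the pure-noise toy data. The `select_inside_fold` flag switches between restarting feature selection in every fold and the leaky variant that selects features once on all samples.

```python
import random

def fit_centroids(X, y, feats):
    """Nearest-centroid 'model': per-class mean of the selected features."""
    model = {}
    for label in sorted(set(y)):
        rows = [r for r, lab in zip(X, y) if lab == label]
        model[label] = [sum(r[f] for r in rows) / len(rows) for f in feats]
    return model

def predict(model, feats, x):
    """Return the label whose centroid is closest to x on the selected features."""
    def dist(label):
        return sum((x[f] - c) ** 2 for f, c in zip(feats, model[label]))
    return min(model, key=dist)

def train_accuracy(X, y, feats):
    """Accuracy of the classifier on the same samples it was fitted on."""
    model = fit_centroids(X, y, feats)
    return sum(predict(model, feats, x) == lab for x, lab in zip(X, y)) / len(y)

def forward_select(X, y):
    """Greedy forward selection scored on (X, y) only.
    Assumed stopping rule: stop once adding a feature no longer improves the score."""
    chosen, best = [], -1.0
    remaining = list(range(len(X[0])))
    while remaining:
        score, f = max((train_accuracy(X, y, chosen + [f]), f) for f in remaining)
        if score <= best:
            break
        chosen.append(f)
        remaining.remove(f)
        best = score
    return chosen

def loocv(X, y, select_inside_fold):
    """LOOCV accuracy. If select_inside_fold is False, features are picked once
    on all samples (the leaky variant); otherwise selection restarts per fold."""
    global_feats = None if select_inside_fold else forward_select(X, y)
    hits = 0
    for i in range(len(X)):
        train_X, train_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        feats = forward_select(train_X, train_y) if select_inside_fold else global_feats
        model = fit_centroids(train_X, train_y, feats)
        hits += predict(model, feats, X[i]) == y[i]  # held-out sample, unseen so far
    return hits / len(X)

# Pure-noise toy data: 34 samples, 10 features, labels unrelated to the features.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(10)] for _ in range(34)]
y = [i % 2 for i in range(34)]
print("selection inside each fold:", loocv(X, y, True))
print("selection on all data first:", loocv(X, y, False))
```

On noise data like this, the honest estimate should sit somewhere around chance, while selecting on all samples first tends to look better than it should; the exact numbers depend on the seed and the classifier.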

My understanding is you are saying create N (N=34 in this case) different parallel models that use different features/etc. Then take the average (or whatever summary stat) of the accuracies to get the predictive skill.

When we want to use these models, we run new/test data through all N=34 models in parallel and calculate a prediction from each. Then these predictions somehow need to be combined (once again by an average, etc.). This is an average of the predictions, not of the accuracies/whatever.

Where was the step combining these predictions present during the training? It seems your scheme necessarily calculates an accuracy based on a different process than needs to be applied to new data.

No, when you want to classify a new sample, you take a model trained on the complete labeled data you have and use the prediction of that. The validation procedure using those 34 models trained on subsets of the data is just to tell you how accurate you should expect the result to be. Afterwards, you can throw those models away.

Of course you could build an ensemble model, but if you want to know the expected accuracy of doing that, you need to include the ensemble-building into your validation procedure. (Or use some theorem that lets you estimate the ensemble performance from that of individual models, if that is possible.)

>"when you want to classify a new sample, you take a model trained on the complete labeled data you have and use the prediction of that."

Using which set of features? You have 34 different models with different features...

You run the whole training process on the complete data. Including feature selection.
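
To make that concrete, here is a rough sketch in which cross-validation scores the whole training procedure rather than any one model. The `train_pipeline` function is a hypothetical stand-in for "feature selection plus classifier fitting" (it picks the single feature with the largest gap between class means and thresholds it at the midpoint), and the binary 0/1 labels and toy data are assumptions for illustration only.

```python
def train_pipeline(X, y):
    """Whole training procedure: select one feature, then fit a threshold on it.
    Assumes binary labels 0 and 1. Returns a function that classifies a sample."""
    def class_mean(f, label):
        vals = [r[f] for r, lab in zip(X, y) if lab == label]
        return sum(vals) / len(vals)

    # Feature selection: the feature with the largest gap between class means.
    best_f = max(range(len(X[0])),
                 key=lambda f: abs(class_mean(f, 1) - class_mean(f, 0)))
    m0, m1 = class_mean(best_f, 0), class_mean(best_f, 1)
    threshold = (m0 + m1) / 2
    hi_label = 1 if m1 > m0 else 0  # label assigned above the threshold

    def classify(x):
        return hi_label if x[best_f] > threshold else 1 - hi_label
    return classify

def loocv_estimate(X, y):
    """Rerun the whole pipeline on each leave-one-out training set."""
    hits = 0
    for i in range(len(X)):
        clf = train_pipeline(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        hits += clf(X[i]) == y[i]
    return hits / len(X)

X = [[0.2, 0.5], [0.1, 0.4], [0.3, 0.6], [0.9, 0.5], [1.1, 0.6], [1.0, 0.4]]
y = [0, 0, 0, 1, 1, 1]

expected_accuracy = loocv_estimate(X, y)  # fold models are discarded afterwards
final_model = train_pipeline(X, y)        # selection + fit on all labeled data
print(expected_accuracy, final_model([0.95, 0.5]))
```

The per-fold models may well pick different features; that is fine, because the accuracy estimate describes the procedure, and the deployed model is simply that same procedure run once more on everything.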