by astrosi 3952 days ago
Likewise with the "predictions" of the author's likes/dislikes. Testing how the model performs on an independent dataset (or at least with cross-validation [1]) would be much more interesting.

[1] https://en.wikipedia.org/wiki/Cross-validation_(statistics)
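
For anyone unfamiliar with the idea, here is a minimal sketch of k-fold cross-validation in plain Python. It is not the blog author's code: the toy threshold classifier, the synthetic data, and the names `k_fold_indices`, `fit_threshold` and `cross_val_accuracy` are all assumptions made up for illustration. The point is only that every item is scored by a model that never saw it during fitting.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and split the indices into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_threshold(xs, ys):
    """Toy 1-D classifier: cut halfway between the two class means.

    Assumes the "like" class (label 1) has the larger mean.
    """
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def cross_val_accuracy(xs, ys, k=5):
    """Average accuracy on the held-out fold, over k folds."""
    scores = []
    for held_out in k_fold_indices(len(xs), k):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        # Fit only on the training indices, never on the held-out fold.
        t = fit_threshold([xs[i] for i in train], [ys[i] for i in train])
        correct = sum((xs[i] > t) == (ys[i] == 1) for i in held_out)
        scores.append(correct / len(held_out))
    return sum(scores) / k

# Synthetic stand-in data: one feature, "like" when the feature is 20 or more.
xs = list(range(40))
ys = [1 if x >= 20 else 0 for x in xs]
acc = cross_val_accuracy(xs, ys, k=5)
print(f"mean held-out accuracy: {acc:.2f}")
```

A single train/test split is the k = 1 special case of the same idea; more folds just give a less noisy estimate.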


The other thing I wondered about the predictions: she apparently rated all of the dresses, and the top/bottom predictions matched her ratings. Fair enough. But what about the residuals, the misclassified ones, where the logistic regression predicts a high or low score and her rating was actually the opposite? Those might be interesting to look at.
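
A rough sketch of what looking at those cases could mean: given predicted probabilities and actual ratings, list the misclassified items with the largest residual first. The function name `most_confident_errors`, the 0.5 cutoff, and the numbers are hypothetical, not taken from the post.

```python
def most_confident_errors(probs, labels, top=3):
    """Return (index, prob, label) for misclassified items, worst first.

    probs  -- predicted probability of "like" for each item
    labels -- actual rating, 1 = like, 0 = dislike
    """
    errors = []
    for i, (p, y) in enumerate(zip(probs, labels)):
        predicted = 1 if p >= 0.5 else 0
        if predicted != y:
            # Residual: distance between the predicted probability and the truth.
            errors.append((abs(y - p), i, p, y))
    errors.sort(reverse=True)
    return [(i, p, y) for _, i, p, y in errors[:top]]

# Made-up predictions and ratings for six items.
probs = [0.95, 0.10, 0.80, 0.40, 0.05, 0.60]
labels = [1, 1, 0, 0, 1, 1]
worst = most_confident_errors(probs, labels)
print(worst)
```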
That's there. Search for:

> The misclassifications are interesting too

One problem seems to be that it concluded she'd dislike anything the exact opposite color from her favorite shade of red. A common flaw in linear models.
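
One way to see why, as a sketch under assumptions: if the features are mean-centered and the model is a logistic regression with zero bias, then flipping a feature vector through the mean flips the sign of the linear score, so p(like | -x) = 1 - p(like | x). The weights and colors below are hypothetical, not the ones from her model.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_like(w, b, x):
    """Logistic-regression probability of "like" for feature vector x."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

# Hypothetical weights over mean-centered (R, G, B) features, zero bias.
w = [2.0, -1.0, -1.0]
b = 0.0

red = [0.5, -0.5, -0.5]        # a favorite red, relative to the mean color
opposite = [-v for v in red]   # the same point mirrored through the mean

p_red = predict_like(w, b, red)
p_opposite = predict_like(w, b, opposite)
print(p_red, p_opposite)       # the two probabilities sum to 1
```

The model has no way to say "I love this red and am indifferent to its opposite"; a strong preference in one direction forces an equally strong dislike in the other.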

The blog post seems to be getting modified at this moment. When I first saw it, it didn't have anything about the misclassifications, but that has been added now.
I've only glanced at her code, but it looks like [1] the predictions are from held-out data.

EDIT: All of the data was used in forming the PCA basis, but that isn't (necessarily) an error, depending on the use-case. And the logistic regression model was evaluated on held-out data.

[1] https://github.com/graceavery/Eigenstyle/blob/master/visuals...