Hacker News
by underanalyzer 1729 days ago
I have nothing to do with machine learning, but it seems like the hashing approach would only work if you are "training" on the evaluation set instead of a separate training set. AFAIK, in ImageNet-like challenges the set of labeled training images does not contain any of the evaluation images, so there wouldn't be any hashes matching any of the evaluation data.
1 comment

Yes, you're right. The model should never see the test/evaluation dataset during training, so it would be impossible to "memorize" the test cases. A hash lookup would get near-perfect accuracy on the training data, but not on the test set. I think the closest analogue would be models that produce conceptual embeddings somewhere inside them. Those are kind of like hashes, except that similar inputs get similar embeddings, whereas a cryptographic hash scatters similar inputs to unrelated values. Many classification neural networks operate roughly like that: the initial layers produce a representation of the data, and the final layer performs the actual classification.
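To make that concrete, here is a rough toy sketch (my own illustration, not anything from the article). The 2-D points, the `x + y > 0` labeling rule, and all the function names are made up for the example, and raw coordinates stand in for a learned embedding. A hash table memorizes the training points, while a nearest-neighbor lookup relies on similar inputs sitting close together.

```python
import hashlib
import random


def label(point):
    # Hypothetical ground truth: which side of the line x + y = 0 the point is on.
    x, y = point
    return 1 if x + y > 0 else 0


def sample(n, seed, exclude=frozenset()):
    # Draw n random 2-D points, skipping any that appear in `exclude`.
    rng = random.Random(seed)
    points = []
    while len(points) < n:
        p = (rng.uniform(-1, 1), rng.uniform(-1, 1))
        if p not in exclude:
            points.append(p)
    return points


def digest(point):
    # Exact-match fingerprint: nearby points get unrelated digests.
    return hashlib.sha256(repr(point).encode()).hexdigest()


train = sample(200, seed=0)
test = sample(200, seed=1, exclude=frozenset(train))  # disjoint from train

# "Training" the hash model is just memorizing digest -> label.
table = {digest(p): label(p) for p in train}


def hash_predict(p, default=0):
    # Unseen inputs miss the table and fall back to a constant guess.
    return table.get(digest(p), default)


def nn_predict(p):
    # Stand-in for an embedding: copy the label of the closest training point.
    nearest = min(train, key=lambda q: (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)
    return label(nearest)


def accuracy(predict, points):
    return sum(predict(p) == label(p) for p in points) / len(points)


hash_train_acc = accuracy(hash_predict, train)
hash_test_hits = sum(digest(p) in table for p in test)
hash_test_acc = accuracy(hash_predict, test)
nn_test_acc = accuracy(nn_predict, test)

print("hash  train accuracy:", hash_train_acc)
print("hash  test lookups that hit:", hash_test_hits)
print("hash  test accuracy:", hash_test_acc)
print("1-NN  test accuracy:", nn_test_acc)
```

The hash table is perfect on the training points and never hits on the held-out points, so its test accuracy is whatever the constant fallback guess happens to earn. The nearest-neighbor version generalizes only because distance between inputs carries information about the label, which is the property a hash deliberately throws away.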