Hacker News
by ehsankia 2432 days ago
As long as they're not training on the test data, and they're not making hundreds of submissions while tweaking parameters to improve their score, I don't see what the problem is. If the algorithm does a great job classifying hundreds of new test cases it has never seen, and it isn't over-fitted, then it is good at that specific task. Of course, the task itself may or may not be useful, and you can have a meta discussion about what "understanding language" is, but the computer is definitely doing a superhuman job at that given task.
1 comment

Maybe it's over-fitted on the new data. There has to be a constant infusion of new training data, and a system can only prove itself over time.

These rankings, if real, should be in constant flux.