by YeGoblynQueenne
2432 days ago
I think the point about the majority of tests being multiple-choice is the most important one to underline. Structuring a problem as a multiple-choice task basically turns it into a classification problem, and that doesn't really answer the question everyone wants answered: is it really possible to reduce the problem of language understanding to classification? That is, is it really possible to understand human language with no ability other than the ability to identify the classes of objects? That question has to be answered before any performance on benchmarks that reduce language understanding to classification can be appraised correctly. If accurate classification is not sufficient for language understanding, then beating benchmarks like SuperGLUE tells us nothing new (we already know we have good classifiers).

The problem here is that we have no good measures of language understanding, in humans or machines, because we have a poor, er, understanding of our own language ability. Until we know more about what it means to understand language, it won't be possible to evaluate automated language understanding systems very well. Hopefully, though, the skepticism I've observed around results like the one above will lead to a renewed effort to research our language ability, and perhaps our intelligence in general.