by mannykannot 2428 days ago
From the abstract of the associated paper: "performance on the benchmark has recently surpassed the level of non-expert humans, suggesting limited headroom for further research."

It occurred to me that hn_throwaway_99's question, and the responses to it, are the sort of dialog in which one could find additional headroom for further research into natural language understanding. We can understand, for example, that while the two uses of 'landed' are different, they are not completely unrelated, and we can explain how they are related, for example by introducing a third construct, 'landed a fish', as a couple of replies have done.


Limited headroom? Seems like they're assuming greater-than-human language ability is just impossible, and that human ability will never be surpassed.
I'd argue that greater-than-human language ability is by definition useless.

Language is specifically a human communication tool; there's no value in surpassing the language skill that humans have, if indeed such a thing is even meaningful (what does it mean to be better than the best* French person at French?).

* By whatever language-related metric

I disagree; greater-than-human-average ability is not useless. There's a lot of room for misinterpretation in human language. We compensate for that with non-verbal communication (posture, expression) or by asking for clarification. On top of that, most places have local expressions or idioms that are not necessarily globally recognized.

So there are two ways in which a language automaton must be better than a human: it cannot rely on non-verbal hints, nor can it easily ask for clarification, and it must be able to interpret many different dialects and idioms correctly, many more than an average human would need to.

I do not think this result is that close to a greater-than-human language ability in general, and I do not think they are claiming it. I think the point is that, with scores on this test closely approaching average human scores, there is not much headroom for this particular test to drive, or measure, further progress.
It's a reasonable assumption if only for the simple reason that humans said the sentences being tested, so how would you surpass that?
You create a new test designed by your newly better-than-human language experts.