Hacker News
by LaMarseillaise 2429 days ago
If we take the "captured" interpretation, I think it could reasonably be inferred that they successfully landed the aircraft at an airfield afterwards (same meaning). This was my initial read of it, and it does not seem strange to me on reflection.

I would also like to point out that even if we do interpret the second as meaning "destroyed", the first could then be interpreted as a combat aviator shooting down an opposing aircraft, bringing us back to the same meaning. Or perhaps both of my interpretations are correct and the meanings are different...

What this tells me is that the benchmark is not very useful.

2 comments

Landed in the sense of a fisherman landing a marlin.
So at the end of the process they were in possession of the enemy aircraft. Maybe they jumped across in mid-air and wrestled it off the other pilot.
The benchmark is useful primarily because it puts humans and computers on a level playing field. Human readers will misinterpret written language, and human writers will represent concepts poorly.

The propensity to make mistakes in comprehension is unavoidable: humans only approach 90% accuracy, and computers are getting close to the same level of accuracy on the same base materials.

The other way of testing would be to devise a test where there is only a single interpretation, where the context is clear, and where there is no ambiguity in meaning. In that case, a competent human and a computer algorithm could both be expected to answer all questions perfectly.

The purpose of this benchmark, on the other hand, is to test comprehension when meaning is not explicit and context clues are only implied, an area where humans held the advantage over computers until quite recently. The computer won't be 100% accurate, but that's not the purpose of this test.