
The benchmark is useful primarily because it puts humans and computers on a level playing field. Human readers will misinterpret written language, and human writers will poorly represent concepts.

Mistakes in comprehension are unavoidable: humans only approach 90% accuracy, and computers are getting close to the same level of accuracy on the same base materials.

The other way of testing would be to devise a test where there is only a single interpretation, where the context is clear, and there is no ambiguity in meaning. In that case a competent human and a competent algorithm could both be expected to answer all questions perfectly.

The purpose of this benchmark, on the other hand, is to test comprehension when meaning is not explicit and context has to be inferred, something humans have been better at than computers until quite recently. The computer won't be 100% accurate, but that's not the purpose of this test.