by TheOtherHobbes 2428 days ago
It seems to mean "How well does Mechanical Turk do the task?", which is a separate thing again. And yes, error type is at least as revealing as error frequency.

I have no idea where the real human baseline is, or how to find it.

Also, consider this discussion. GLUE winners may be able to make informed parsing guesses about single blocks of text, but they're years away from being able to make a useful contribution to a discussion like this one.