| HN Mirror



	by 6gvONxR4sf7o 2435 days ago
	One thing to always point out in these cases is that the human baseline isn't "how well people do at this task," like it's often hyped to be. It's "how well does a person do, on average, when doing this quickly and repetitively." The 'quickly and repetitively' part is important because we all make more boneheaded errors in this scenario. The 'on average' part is important because the errors the algo makes aren't just fewer than people's, they're different. The algos often still get certain things wrong that humans almost never would. This is really, really great, let's be clear. It just doesn't live up to the hype that "omg super human" usually gets.

4 comments

TheOtherHobbes 2435 days ago

It seems to mean "How well does Mechanical Turk do the task?" which is a separate thing again. And yes - error type is at least as revealing as error frequency.

I have no idea where the real human baseline is, or how to find it.

Also, consider this discussion. GLUE winners may be able to make informed parsing guesses about single text blocks, but they're years away from being able to make a useful contribution to a discussion like this one.


IshKebab 2435 days ago

Regarding the type of errors, it seems like the benchmark should be able to take that into account. That is, get a load of humans to do the task on the same specific examples, then for each example you know how hard it is, and what acceptable answers are (I bet a lot of the ground truth is wrong or ambiguous).

Then you can benchmark your AI but penalise it more heavily for getting things wrong that are obvious to a human.
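A minimal sketch of what such a score could look like, assuming each example has labels from several human annotators. The names `human_agreement` and `weighted_error` are made up for illustration, and weighting by majority agreement is just one possible choice, not an established benchmark metric:

```python
from collections import Counter


def human_agreement(human_labels):
    """Return (majority_label, fraction of annotators who chose it)."""
    counts = Counter(human_labels)
    label, n = counts.most_common(1)[0]
    return label, n / len(human_labels)


def weighted_error(model_preds, human_labels_per_example):
    """Error rate where each example is weighted by human agreement.

    Assumption: the human majority label is treated as the acceptable
    answer, and the agreement fraction stands in for "how obvious" the
    example is. Missing an obvious example costs more than missing an
    ambiguous one.
    """
    total_weight = 0.0
    penalty = 0.0
    for pred, labels in zip(model_preds, human_labels_per_example):
        majority, agreement = human_agreement(labels)
        total_weight += agreement
        if pred != majority:
            penalty += agreement
    return penalty / total_weight


if __name__ == "__main__":
    preds = ["a", "b", "a"]
    humans = [
        ["a", "a", "a", "a"],  # obvious; model is right
        ["a", "a", "a", "a"],  # obvious; model is wrong, full penalty
        ["b", "b", "b", "a"],  # less clear; model is wrong, smaller penalty
    ]
    print(weighted_error(preds, humans))
```

An example where humans split evenly would contribute little weight either way, which also softens the effect of wrong or ambiguous ground truth.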


6gvONxR4sf7o 2435 days ago

That would be ideal, if money weren't a factor. Since money is a factor, I wonder what the tradeoff is between labelling each instance N more times versus just getting N times more instances labeled.
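One side of that tradeoff can be put in numbers under a toy model. This is a rough sketch that assumes binary labels, annotators who are independently correct with the same probability `p`, and majority voting over an odd number `k` of labels per instance; none of these assumptions come from the benchmark itself:

```python
from math import comb


def majority_vote_accuracy(p, k):
    """Probability that a majority of k independent annotators is correct.

    Assumes binary labels, odd k, and identical per-annotator accuracy p.
    """
    if k % 2 == 0:
        raise ValueError("k must be odd so the majority is well defined")
    need = k // 2 + 1  # smallest number of correct votes that wins
    return sum(comb(k, i) * p**i * (1 - p) ** (k - i) for i in range(need, k + 1))


if __name__ == "__main__":
    # Same labelling budget spent two ways: k labels per instance
    # means budget // k instances in total.
    budget = 9000
    for k in (1, 3, 5):
        print(k, budget // k, majority_vote_accuracy(0.8, k))
```

This only shows how label noise shrinks with repeats. Whether cleaner labels on fewer instances beat noisier labels on more instances depends on the model and task, which the sketch does not address.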


Pahr3yah 2435 days ago

In the context of GPT-2, someone coined the expression "Humans Who Are Not Concentrating Are Not General Intelligences."


The_Amp_Walrus 2435 days ago

I think it was this blogger: https://www.google.com/amp/s/srconstantin.wordpress.com/2019...


jcims 2435 days ago

Great point! It makes sense in the context of what these algorithms would generally be tasked with.
