HN Mirror



	by IshKebab 2428 days ago
	Regarding the type of errors, it seems like the benchmark should be able to take that into account. That is, get a load of humans to do the task on the same specific examples; then for each example you know how hard it is and what the acceptable answers are (I bet a lot of the ground truth is wrong or ambiguous). Then you can benchmark your AI but penalise it more heavily for getting things wrong that are obvious to a human.
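A minimal Python sketch of that scoring idea. Everything specific here is an assumption for illustration: the helper names (`human_stats`, `weighted_error`), the rule that an answer counts as acceptable if at least `min_share` of annotators gave it, and using annotator agreement on the top answer as the measure of how obvious an example is.

```python
from collections import Counter

def human_stats(labels, min_share=0.25):
    """Summarise the human answers for one example (hypothetical helper).

    Returns (acceptable, agreement): the set of answers given by at least
    min_share of annotators, and the fraction of annotators who chose the
    most common answer, used here as a proxy for how obvious the example is.
    """
    counts = Counter(labels)
    n = len(labels)
    acceptable = {ans for ans, c in counts.items() if c / n >= min_share}
    agreement = counts.most_common(1)[0][1] / n
    return acceptable, agreement

def weighted_error(model_answers, human_labels, min_share=0.25):
    """Error rate where each miss is weighted by human agreement, so
    mistakes on examples humans find obvious cost more (hypothetical)."""
    penalty = 0.0
    total = 0.0
    for pred, labels in zip(model_answers, human_labels):
        acceptable, agreement = human_stats(labels, min_share)
        total += agreement
        if pred not in acceptable:
            penalty += agreement
    return penalty / total

# Five annotators per example: one unanimous, one genuinely ambiguous.
human_labels = [
    ["cat", "cat", "cat", "cat", "cat"],
    ["dog", "dog", "wolf", "wolf", "fox"],
]
print(weighted_error(["dog", "wolf"], human_labels))  # misses the obvious one
print(weighted_error(["cat", "fox"], human_labels))   # misses the ambiguous one
```

Both models get one of two examples wrong, but the one that misses the unanimous "cat" example scores about 0.71 weighted error versus about 0.29 for the one that misses the ambiguous example. Weighting linearly by agreement is just one choice; a steeper function of agreement would punish obvious mistakes harder.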

1 comment

6gvONxR4sf7o 2428 days ago

That would be ideal if money weren't a factor. Since money is a factor, I wonder what the tradeoff is between labelling each instance N more times and just getting N times as many instances labelled.
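One way to put rough numbers on the label-quality side of that tradeoff, as a Python sketch. It assumes binary labels, annotators who are each independently right with probability p, and majority voting to combine repeat labels; the budget, p, and the function name `majority_vote_accuracy` are all made up for illustration.

```python
from math import comb

def majority_vote_accuracy(p, k):
    """Chance that a strict majority of k annotators is right on a binary
    label, assuming each is independently right with probability p."""
    if k % 2 == 0:
        raise ValueError("use an odd k so there are no ties")
    # Sum the binomial probabilities of more than k/2 correct votes.
    return sum(comb(k, j) * p**j * (1 - p) ** (k - j)
               for j in range(k // 2 + 1, k + 1))

budget = 3000  # hypothetical number of labels we can pay for
p = 0.8        # hypothetical per-annotator accuracy
for k in (1, 3, 5):
    # Spending the budget on k labels per example leaves budget // k examples.
    print(f"{k} labels/example: {budget // k} examples, "
          f"label accuracy {majority_vote_accuracy(p, k):.3f}")
```

With p = 0.8, three labels per example lifts label accuracy from 0.8 to about 0.90 and five lifts it to about 0.94, while shrinking the dataset to 1000 or 600 examples. This only quantifies label quality; which option wins depends on how the model copes with label noise versus less data, and real annotator errors are rarely independent.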
