
by webmaven 1732 days ago

> You can submit known correct answers for questions, and those questions are then used as ground truth to score worker accuracy. Workers are then scored on their similarity to known correct answers and other workers that have accurately answered questions. It works surprisingly well for how simple it is.
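A minimal sketch of one way the scoring described in the quote could work. It assumes a per-task table of worker answers and scores each worker by blending accuracy on gold-standard tasks with agreement on other tasks, where each peer's vote counts in proportion to that peer's gold accuracy. The data layout, the function names, and the `alpha` blend weight are hypothetical, not the platform's actual implementation.

```python
from collections import defaultdict

# Hypothetical layout (not the platform's real schema):
#   responses[task_id][worker_id] -> answer label
#   gold[task_id]                 -> known correct answer (gold tasks only)


def gold_accuracy(responses, gold):
    """Fraction of gold-standard tasks each worker answered correctly."""
    hits, seen = defaultdict(int), defaultdict(int)
    for task, answers in responses.items():
        if task not in gold:
            continue
        for worker, answer in answers.items():
            seen[worker] += 1
            hits[worker] += answer == gold[task]
    return {w: hits[w] / seen[w] for w in seen}


def combined_scores(responses, gold, alpha=0.5):
    """Blend gold accuracy with agreement on non-gold tasks, weighting each
    peer's vote by that peer's gold accuracy. `alpha` is an assumed weight."""
    acc = gold_accuracy(responses, gold)
    num, den = defaultdict(float), defaultdict(float)
    for task, answers in responses.items():
        if task in gold:
            continue
        for worker, answer in answers.items():
            for peer, peer_answer in answers.items():
                if peer == worker:
                    continue
                weight = acc.get(peer, 0.0)  # peers with no gold record carry no weight
                den[worker] += weight
                num[worker] += weight * (answer == peer_answer)
    scores = {}
    for w in sorted(set(acc) | set(den)):
        a = acc.get(w, 0.0)
        # Fall back to gold accuracy alone when there are no weighted peers.
        agree = num[w] / den[w] if den[w] > 0 else a
        scores[w] = alpha * a + (1 - alpha) * agree
    return scores


responses = {
    "g1": {"alice": "yes", "bob": "yes", "carol": "no"},
    "g2": {"alice": "cat", "bob": "dog", "carol": "dog"},
    "t1": {"alice": "yes", "bob": "yes", "carol": "no"},
}
gold = {"g1": "yes", "g2": "cat"}
print(combined_scores(responses, gold))
# {'alice': 1.0, 'bob': 0.75, 'carol': 0.0}
```

Note that if `g2` were genuinely ambiguous (both "cat" and "dog" defensible), bob and carol would be penalized by the single-answer key, and that penalty would then carry over into how much their votes count elsewhere. That is the bimodal failure mode raised in the question below.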

Have you noticed problems with questions whose answers have a bimodal distribution (i.e., the gold standard question actually has two or more correct answers)?

In one sense, this is just a labeling-quality problem with the 'gold standard' data. But to a lesser extent, the same issues may crop up in the data being labeled when you use similarity or clustering to rate or classify workers and then transitively apply those ratings to the other results they produce.


kuzee 1730 days ago

Anecdotally, yes, it's a problem when two classes (button choices) are similar, resulting in two "top answers" for a given task. This seems most common for "yes/no" task types, where there are only two options and distinguishing between them is the hard part.

I haven't dug into the data on this across the platform, but you've given me the idea to look for evidence of it and see if I can improve things somehow. There are only a few hundred projects, so I might be able to find some that have this problem.
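For the kind of search described above, here is a minimal sketch of one way to flag tasks with two near-tied "top answers". It assumes the per-task votes are available as a dict of worker answers. The layout, the `split_vote_tasks` name, and its thresholds are hypothetical. Running it over gold-standard tasks would also surface gold questions that may have more than one defensible answer.

```python
from collections import Counter

# Hypothetical layout:
#   responses[task_id][worker_id] -> answer label


def split_vote_tasks(responses, margin=0.2, min_votes=3):
    """Flag tasks whose two most common answers are within `margin` of each
    other as vote shares. `margin` and `min_votes` are arbitrary thresholds."""
    flagged = []
    for task, answers in responses.items():
        votes = Counter(answers.values())
        total = sum(votes.values())
        if total < min_votes or len(votes) < 2:
            continue  # too few votes to judge, or unanimous
        (first, n1), (second, n2) = votes.most_common(2)
        if (n1 - n2) / total <= margin:
            flagged.append((task, first, second, n1 / total, n2 / total))
    return flagged


responses = {
    "t1": {"w1": "yes", "w2": "no", "w3": "yes", "w4": "no", "w5": "yes", "w6": "no"},
    "t2": {"w1": "yes", "w2": "yes", "w3": "yes", "w4": "yes", "w5": "no"},
    "t3": {"w1": "cat", "w2": "dog"},
}
for row in split_vote_tasks(responses):
    print(row)
# ('t1', 'yes', 'no', 0.5, 0.5)
```

The `min_votes` floor keeps tasks with only a couple of responses from being flagged as splits. The share-gap test treats a 3-3 split on a yes/no task the same as a near-tie on a task with many classes, which may or may not be the right call for a given project.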