by b409ba0801cd21 1645 days ago
I wonder if there have been any efforts to sabotage crowdsourced AI training and content moderation by signing up on crowdworking platforms and intentionally providing false responses. A large, tech-savvy sabotage ring could use a browser extension or the like to keep its members' responses consistent with each other and increase the odds of the fake answers being accepted.
5 comments

It is pretty common to "verify" workers: a fraction of the questions (often 1% to 10%) are previously asked questions with known correct answers. If a worker does not answer those correctly, the entire dataset from that person is ignored. Depending on the platform, they might get paid less as well.

This is designed to detect workers who either did not understand the instructions or don't care about them and answer randomly. But it works against intentional sabotage as well.
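
The check described above could be sketched roughly like this. It is only an illustration, not any particular platform's implementation: the function name, the (worker, question, answer) tuple layout, and the 0.8 accuracy cutoff are all assumptions made up for the example.

```python
from collections import defaultdict

def filter_workers(responses, gold_answers, min_accuracy=0.8):
    """Drop all responses from workers who fail the known-answer questions.

    responses:    list of (worker_id, question_id, answer) tuples (assumed layout)
    gold_answers: dict mapping question_id -> known correct answer
    min_accuracy: hypothetical cutoff; real platforms pick their own
    """
    correct = defaultdict(int)
    seen = defaultdict(int)

    # Score each worker only on the questions with known answers.
    for worker, question, answer in responses:
        if question in gold_answers:
            seen[worker] += 1
            if answer == gold_answers[question]:
                correct[worker] += 1

    # Workers who never saw a known-answer question are not trusted here.
    trusted = {w for w in seen if correct[w] / seen[w] >= min_accuracy}

    # Keep only the real (non-gold) responses from trusted workers.
    return [r for r in responses
            if r[0] in trusted and r[1] not in gold_answers]
```

For example, with gold_answers = {"g1": "cat"}, a worker who answers "dog" on g1 has every one of their other responses discarded, while a worker who answers "cat" keeps theirs.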

Speaking from my experience working at data labeling companies, sabotage does occur, but it is not intentionally malicious.

What ends up happening is that some labelers learn what the pre-determined questions and answers are and share them with other labelers via Facebook and Discord. That way, the other labelers can stay on the task longer while providing garbage responses to the non-predetermined questions.

It's an arms race with labelers on one end, trying to make a quick buck, and data labeling platforms on the other, trying to get quality labeled data.

It was tried. 4chan attempted a coordinated "penis" prank on reCAPTCHA. Despite the much-vaunted community power of the website, and despite the coordination, nothing happened.

It turns out that they are a drop in the bucket. Not only is the RoI low, but the group is also too weak to make a difference.

Did anything ever come of that? Because nowadays, those captchas are no longer used.

Seems like a lot of tedious work at low pay for little impact, particularly for people who are tech-savvy enough to pull it off.