Hacker News new | ask | show | jobs
by teilo 2688 days ago
Those images are infuriating!

Click all boxes with traffic lights. Ok, well, this one box just barely contains the bottom right corner of the traffic light. Click. Nope, that little corner didn't count. Try again. Ok, well on this one, the right side of the traffic light is only barely over the line, so I won't click it. Nope, that sliver of the light mattered this time. MF!

6 comments

Heh, maybe one day they can show a bunch of pictures of sand, where each subsequent pic has a grain removed, with the instructions "click on all the heaps".

Spambots will solve the Sorites paradox!

"Click all the ships of Theseus"
"Is there no ship? Close the browser window."
click on each star that is currently visable out your window :-/
I actually have a few screenshots where the task was impossible since the data was mislabeled. The latest example was "click all of the buses". It wouldn't let me continue because I wouldn't select the fire truck.

My naive assumption is that you should click the "refresh" button in these cases.

Just click whatever you suspect is needed to pass. Don't go above and beyond trying to give the actual right answer; you're just feeding some proprietary database owned by Google. QA for it is their problem.
There is some alternate (or future) reality where a google self driving car accident is blamed on bad training data from CAPTCHAs.
> " It wouldn't let me continue because I wouldn't select the fire truck."

Another one is "click the mountains". It typically won't let you through unless you click anything with trees on the horizon, even if the terrain is clearly flat. Google's robot thinks mountains are made out of wood, and any human who disagrees is labeled a robot. It's insanity.

I've recently gotten caught in one of these, where it was "click all of the bicycles" and after a few clicks (it was one of those which fade out to present a new picture) the only "bicycle" left was a bicycle-shaped street decoration. It wouldn't let me proceed unless I clicked on something, so I had to refresh to get a new task.
I assumed the infuriating ambiguity is intentional, in order to train some algorithm they need to know what the prevailing human correct judgement is in dicey situations
I don't think it's intentional -- it probably just emerges from the training process.

I'm guessing they do something like load up a batch of images and once N people agree on one, record the answer and remove it from the rotation. You end up left with the ambiguous images where people couldn't agree.

Then why do I keep seeing the same g-damn FIRE HYDRANT! :)
>Those images are infuriating!

And, does the pole count?

The whole thing is way more stressful than it needs to be for what it is.

I'm convinced the ambiguity is intentional. What I don't get is what answer they expect in those scenarios.
I always figure they're looking for a population consensus. They're doing image recognition at scale and these are clearly ambiguous, hard images to classify. They could easily have a few people at Google say, "I determine this is a storefront" and make that the "correct" answer, but I think they're more interested in a consensus of what most "normal" people would classify as a storefront, especially in potentially-volatile classifications where real humans might argue over the answer. They can skip the argument and just know which side will win it.
What they're actually getting though is the population consensus of what normal people believes Google's image classifier believes. The system incentivizes users to reinforce misconceptions their classifier has.

Does this look like a mountain to you? https://0x0.st/zzvr.jpg

Google's image classifier would think that's a mountain. If you disagree, google will classify you as a robot. After failing these sort of challenges a few times the user decides to play along and tell google what they think google wants to hear, rather than the truth.

What makes you think Google's image classifer would think that's a mountain?

Especially if this is all used for learning, enough people saying "that is clearly not a mountain" would reinforce that it's, in fact, probably not a mountain. Even if I got classified as a robot, I'm not sure I would think "oh, a system designed to classify images would think this not-a-mountain is a mountain", so I definitely wouldn't double down and keep marking it as a mountain. I'd, well, not. And assume the system is at least as good as classifying the images it chooses to use as I am.

> "What makes you think Google's image classifer would think that's a mountain?"

Because every single time it asks me to classify mountains it rejects my answers if I don't click on trees on the horizon (and often trees on the horizon are the only "mountains" presented) and every single time it accepts the answer that such trees are mountains. I've gotten the mountains challenge dozens of times, the results are very consistent. If there is a group of trees on the horizon, that is asserted to be a mountain.

> "enough people saying "that is clearly not a mountain" would reinforce that it's, in fact, probably not a mountain."

Totally irrelevant because if I am trying to get through a google captcha, it's because that captcha is standing in the way of me doing something. My interest is in passing the captcha, not correcting Google's shitty image classifier. So I have absolutely no incentive to make my life harder by insisting on correct answers, and every incentive to tell Google what they want to hear.

>So I have absolutely no incentive to make my life harder by insisting on correct answers, and every incentive to tell Google what they want to hear.

I guess this is where the misunderstanding is. You don't think Google wants to hear the correct answer?

Trying to guess at what the daily/monthly flavor of "correct" is seems like it'd do more harm than good, resulting in some kind of nondeterministic guessing game of "well, trees on the horizon are probably assumed to be a mountain" that never settles on actually-correct answers (and, I'd wager, is often more inconvenient to the user than just answering correctly would be, because now there's a layer of indirection on what they think a system thinks of an image, rather than just what they think of that image).

If everyone just answered "no, that's trees" instead of a hand-wavy "I think you think it's a mountain", I feel like this captcha would be significantly easier for us humans (because we could actually give real answers), as well as less inconvenient for people who just want to pass on through and get on with whatever they were doing before a site wanted to verify they weren't a bot (because they can just, well, identify images instead of playing a game of "what does the machine think?").

It is just a consequence of other humans also having problems with these cases. They do not mind that you have to make multiple attempts, it is just more yummy data for their bots (their machine learning algorithms are trained on this stuff).
I'm pretty convinced they're not really using these for ML, but that their ML algorithms have already run on these and they already know these difficult (read: ambiguous) enough to make you give up. These cases specifically only come up when they seem to think you're probably a bot (based on cookies or IP or whatever). They seem to deliberately put the photo boundaries such that they slice through whatever object they want you to look for. And they intentionally make the delays extremely long. These don't happen when they think you're probably a human and just want to throw an extra hurdle (like if you're Googling a little too frequently from your usual browser/location).
this on so many levels.

Thankfully they'll eventually fall back to the "click the images of _object_ until there are no pictures left with a(n) _object_" in it, but those clicking block ones of a specific picture are super frustrating.