I assumed the infuriating ambiguity is intentional, in order to train some algorithm they need to know what the prevailing human correct judgement is in dicey situations
I don't think it's intentional -- it probably just emerges from the training process.
I'm guessing they do something like load up a batch of images and once N people agree on one, record the answer and remove it from the rotation. You end up left with the ambiguous images where people couldn't agree.
I'm guessing they do something like load up a batch of images and once N people agree on one, record the answer and remove it from the rotation. You end up left with the ambiguous images where people couldn't agree.