|
There has been no peer-reviewed paper calling the gaydar paper into question. There was a master's student who tried to replicate the study with his own crawled dataset and got better than human guessing, but slightly below the paper's accuracy. News outlets ran with that to say that the study was flawed. Another critique came from a Googler who claimed that the neural net looked solely at eye shadow or glasses, but he also got better than random and human guessing on his own sanitized dataset. And one could argue that eye shadow and glasses are fair game when classifying from a face picture: they are part of the picture, and the same pictures were shown to the human evaluators, so the comparison was on even ground.

The Next Web article is by a journalist with a history degree, not an ML scientist. But judging solely on the merit of his arguments, he also agrees with the results of the paper:

> there’s nothing wrong with the paper and all the science (that can actually be reviewed) obviously checks out.

He seems to take more issue with the ethical considerations and with treating sexuality as binary, and he builds his point around the claim that humans have no functioning gaydar at all, so it is insignificant that a neural net could beat a coin flip. His point is weak: he gives no evidence that humans lack a gaydar, and the paper (which, contrary to the claims, was not wrong) includes human assessments that are higher than random guessing.

I think my contrarian view is true from mere pragmatism: Israel has the best airport security in the world and uses these Suspect Detection Systems extensively, and they seem to be constantly improving and making enough profit for new players to enter the market. In other words, the people who actually do this for a living keep innovating on it, and I find that rather unlikely if all of this is tea-leaf reading. I think that, in general, the HN crowd overreacts when it comes to controversial tech, and that a simplistic "this does not work, it is a sham and a fraud to get research money" is an uninformed, weak claim.
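To make "better than random guessing" concrete: with balanced binary judgments, where a coin flip scores 50%, a one-sided exact binomial test gives the probability that a pure guesser would do at least as well as an observed score. Below is a minimal sketch using only the Python standard library; `binomial_p_value` is a hypothetical helper name, and the 600 and 520 out of 1,000 figures are made-up illustrations, not numbers from the paper, the master's thesis, or the Googler's replication.

```python
import math


def binomial_p_value(correct: int, trials: int, chance: float = 0.5) -> float:
    """One-sided exact binomial test.

    Returns P(X >= correct) for X ~ Binomial(trials, chance): the probability
    that a judge guessing at `chance` gets at least `correct` answers right.
    """
    if not 0 <= correct <= trials:
        raise ValueError("need 0 <= correct <= trials")
    if not 0 < chance < 1:
        raise ValueError("need 0 < chance < 1")
    log_p, log_q = math.log(chance), math.log1p(-chance)
    total = 0.0
    for k in range(correct, trials + 1):
        # Work in log space so large `trials` does not overflow a float.
        log_pmf = (math.lgamma(trials + 1) - math.lgamma(k + 1)
                   - math.lgamma(trials - k + 1)
                   + k * log_p + (trials - k) * log_q)
        total += math.exp(log_pmf)
    return min(total, 1.0)  # guard against rounding slightly above 1


if __name__ == "__main__":
    # Hypothetical scores on 1,000 balanced judgments (coin flip = 50%).
    print(binomial_p_value(600, 1000))  # very small: 60% is hard to reach by coin-flipping
    print(binomial_p_value(520, 1000))  # roughly 0.1: 52% is consistent with guessing at this size
```

The point of the test is that "better than a coin flip" is a checkable, quantitative claim that depends on both the score and the number of trials, not a matter of opinion.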
It takes a lot of chutzpah to denounce many months of work by legitimate scientists as obviously flawed from behind your keyboard, when one probably has not even read the full paper. The authors, by picking such a controversial topic, are partly to blame for this pushback and the popular media reporting, but that does not make it right.

I will not defend the use of plethysmograph and eye-tracking studies to measure sexual response. I will only claim that they are better than random guessing, that they allow for better treatment when measurements are out of line with self-reports, and that they are still in use and very similar to the Fruit Machine. The Fruit Machine is already back.

> My dowsing rod is better than my crystal ball at finding water,

I do not get what you are referring to here. (I know you as an ML-knowledgeable person from your other comments, so I am afraid to assume things, but if your crystal ball is random and your dowsing rod is better than random, you are successfully doing predictive modeling, no? Not a sham? [1]) These systems do not need extremely high accuracy if they do not auto-deny a person, and it is moving the goalposts a bit to demand high accuracy once better-than-random performance has been demonstrated (which is exactly what the majority of the commenters here dispute).

> or they are irrelevant like the sources about the training of border agents

A user kindly requested sources for all of my claims. I claimed this and sourced it. My point was that we already have human Suspect Detection Systems in place, so either those must go (you have a fundamental problem with SDSs) or they can't be automated (because you don't trust AI research, or believe these systems need common sense solved first). I could then offer counter-arguments to both.

For the question about eye direction, look at the sources on telltale signs of lying I posted in reply to another commenter. It depends on whether you are left- or right-handed.
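The footnoted result [1] is the theoretical basis of boosting: roughly, a learner that is reliably even slightly better than random can be combined into an arbitrarily accurate one on the training data. As an illustration, here is a minimal sketch of AdaBoost, the later and better-known boosting algorithm (not the construction in the cited paper itself), using one-threshold "decision stumps" as the weak learner. The toy 1-D dataset and all function names are hypothetical.

```python
import math
from typing import List, Tuple

Stump = Tuple[float, int]  # (threshold, polarity)


def stump_predict(stump: Stump, x: float) -> int:
    """Weak learner: predict `polarity` at or above the threshold, the opposite below."""
    threshold, polarity = stump
    return polarity if x >= threshold else -polarity


def best_stump(xs: List[float], ys: List[int], weights: List[float]) -> Tuple[Stump, float]:
    """Exhaustively pick the stump with the lowest weighted training error."""
    best: Stump = (xs[0], 1)
    best_err = float("inf")
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if stump_predict((threshold, polarity), x) != y)
            if err < best_err:
                best, best_err = (threshold, polarity), err
    return best, best_err


def adaboost(xs: List[float], ys: List[int], rounds: int) -> List[Tuple[float, Stump]]:
    """Combine weak stumps into a weighted vote (labels must be +1 / -1)."""
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble: List[Tuple[float, Stump]] = []
    for _ in range(rounds):
        stump, err = best_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)  # lower error, bigger say in the vote
        ensemble.append((alpha, stump))
        # Up-weight the points this stump got wrong, down-weight the rest, renormalise.
        weights = [w * math.exp(-alpha * y * stump_predict(stump, x))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def ensemble_predict(ensemble: List[Tuple[float, Stump]], x: float) -> int:
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


def accuracy(predict, xs: List[float], ys: List[int]) -> float:
    return sum(predict(x) == y for x, y in zip(xs, ys)) / len(xs)


if __name__ == "__main__":
    # Hypothetical toy data: positive only inside an interval, which no single stump can capture.
    xs = [i / 100 for i in range(100)]
    ys = [1 if 30 <= i < 70 else -1 for i in range(100)]
    stump, _ = best_stump(xs, ys, [1.0 / len(xs)] * len(xs))
    print("best single stump:", accuracy(lambda x: stump_predict(stump, x), xs, ys))
    ensemble = adaboost(xs, ys, rounds=3)
    print("3 boosted stumps:", accuracy(lambda x: ensemble_predict(ensemble, x), xs, ys))
```

On this toy data the best single stump gets 70% of the points right, while three boosted stumps together classify every training point correctly. This only shows the weak-to-strong mechanism on training data; it says nothing about how well any particular model generalizes.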
[1] > A concept class is learnable (or strongly learnable) if, given access to a source of examples of the unknown concept, the learner with high probability is able to output an hypothesis that is correct on all but an arbitrarily small fraction of the instances. The concept class is weakly learnable if the learner can produce an hypothesis that performs only slightly better than random guessing. In this paper, it is shown that these two notions of learnability are equivalent. - The Strength of Weak Learnability |
My objection to the methodology in the paper was that the authors had assembled a dataset in which gay men and women made up 50% of the sample, i.e. there were as many gay women as straight women and as many gay men as straight men. This was for one of their datasets, the one where everyone had a picture. There were two more where the distribution was less even, but still nothing like what it is usually estimated to be. This was despite the fact that the paper itself cited a result that gay men and women are around 7% of the population.
The reason for this discrepancy was clearly to improve the results by reducing the number of false negatives, which are expected when there are many more negative than positive examples in binary classification.
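The effect of the class balance can be made concrete with a little arithmetic. The sketch below assumes, hypothetically, a classifier with 80% sensitivity and 80% specificity (not figures from the paper) whose per-class error rates stay the same when the prevalence changes, and compares the 50% balance of the dataset with the roughly 7% figure the paper cites.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Expected confusion-matrix counts for a population of size n."""
    tp: float
    fp: float
    tn: float
    fn: float


def expected_counts(sensitivity: float, specificity: float,
                    prevalence: float, n: int = 10_000) -> Counts:
    # Assumption: the per-class error rates do not depend on the class balance.
    positives = prevalence * n
    negatives = n - positives
    return Counts(tp=sensitivity * positives, fn=(1 - sensitivity) * positives,
                  tn=specificity * negatives, fp=(1 - specificity) * negatives)


def precision(c: Counts) -> float:
    """Share of positive predictions that are actually positive."""
    return c.tp / (c.tp + c.fp)


def accuracy(c: Counts) -> float:
    return (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)


# Hypothetical classifier: 80% sensitivity and 80% specificity.
for prevalence in (0.5, 0.07):
    c = expected_counts(0.8, 0.8, prevalence)
    print(f"prevalence={prevalence:.2f} precision={precision(c):.2f} "
          f"accuracy={accuracy(c):.2f} majority-class baseline={max(prevalence, 1 - prevalence):.2f}")
# prevalence=0.50 precision=0.80 accuracy=0.80 majority-class baseline=0.50
# prevalence=0.07 precision=0.23 accuracy=0.80 majority-class baseline=0.93
```

Under these assumptions, at 7% prevalence roughly three out of four positive predictions are false positives, and the 80% accuracy falls below the 93% a classifier would get by always predicting the majority class. This is why the class balance of the evaluation set matters when reading headline numbers.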
That is from the machine-learning point of view. Others pointed out other flaws, e.g. the choice of metric (I don't remember what it was now; I can look it up if you like) and the premising of the paper on prenatal hormone theory, which is another piece of bunkum without any evidence to back it, etc.
And of course there were the ethical considerations.
Sorry but I don't have the courage to reply to the rest of your comment. You write way too much.