by bojangleslover
1729 days ago
How do you explain the accuracy going up with more samples, then? Also, it wasn't all 90/10. It was all pairwise:

Among men, the classification accuracy equaled AUC = .81 when provided with one
image per person. This means that in 81% of randomly selected pairs—composed of one gay and
one heterosexual man—gay men were correctly ranked as more likely to be gay. The accuracy
grew significantly with the number of images available per person, reaching 91% for five
images. The accuracy was somewhat lower for women, ranging from 71% (one image) to 83%
(five images per person).

Explaining the model becoming more "accurate" by this measure is pretty easy. The model is working with an extremely small and skewed dataset for this sort of thing, and has overfit to tendencies in the dataset. Given the kinds of numbers we're working with and that measure, a jump from 81% to 91% "accuracy" does not seem particularly significant. Again, the classifier fails to meet even the baseline of accuracy it would need, under a more realistic accuracy measurement, to beat a null hypothesis, and that baseline would probably need to be even higher to reflect the lower statistical power of this standard of accuracy.
In any real-world application, this classifier would need to make a judgement in situ based on some threshold of confidence. From that perspective, this metric is worse than useless: it doesn't demonstrate that the result is even as significant as the thresholds described in its summary (which, again, don't meet the base rate), yet this methodological smoke and mirrors has seemingly convinced you after you read it more thoroughly. I imagine this is similar to the process by which these systems are sold to investors.
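
To make the gap between the two measures concrete, here is a minimal sketch in Python. The scores, the 10% base rate, and the threshold of 50 are all made-up toy values (assumptions for illustration, not numbers from the paper), and the function names are hypothetical. It computes the pairwise ranking measure described in the quote, then compares it with accuracy at a fixed threshold and with the always-guess-the-majority baseline.

```python
from itertools import product


def pairwise_auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs in which the positive
    example gets the higher score; ties count as half a win."""
    wins = 0.0
    for p, n in product(pos_scores, neg_scores):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def threshold_accuracy(pos_scores, neg_scores, threshold):
    """Accuracy when everything scoring >= threshold is called positive."""
    correct = sum(s >= threshold for s in pos_scores)
    correct += sum(s < threshold for s in neg_scores)
    return correct / (len(pos_scores) + len(neg_scores))


def majority_baseline(pos_scores, neg_scores):
    """Accuracy of always guessing the larger class."""
    n_pos, n_neg = len(pos_scores), len(neg_scores)
    return max(n_pos, n_neg) / (n_pos + n_neg)


# Toy classifier scores on a 0-100 scale (hypothetical, hand-picked).
# 2 positives and 18 negatives, i.e. an assumed 10% base rate.
pos = [90, 40]
neg = [10, 10, 20, 20, 20, 30, 30, 30, 30, 35, 35,
       45, 50, 55, 60, 60, 70, 80]

auc = pairwise_auc(pos, neg)
acc = threshold_accuracy(pos, neg, threshold=50)
base = majority_baseline(pos, neg)

print(f"pairwise AUC:           {auc:.2f}")
print(f"accuracy at threshold:  {acc:.2f}")
print(f"majority-class baseline: {base:.2f}")
```

With these toy numbers the pairwise measure comes out at about 0.81, while accuracy at that threshold is 0.65, below the 0.90 you get by always guessing the majority class. A different threshold would give a different accuracy, which is the point: the pairwise figure alone doesn't tell you how the classifier behaves once it has to commit to a decision.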