Hacker News
by sillysaurusx 1932 days ago
Geography and income are two powerful conditioners. These can leak in so many ways: uncropped background (geography), image color and quality (income), eyeglass shape (geography and income). This study really needs more controls. Geography and income would be a nice start.

But then the data wouldn't represent the natural world: nature as it is.

Raw data is the correct thing to use, because it's what anyone else would use if they ran the same experiment themselves.


Uh, the headline claim is about faces. How does it make sense to then insist that you must leave the background in?
This reminds me of an early ML study about detecting skin cancer from pictures with a high accuracy rate.

The problem was that they had ended up building a ruler classifier: most of the pictures with skin cancer also happened to have a ruler in them to measure the size.
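A toy sketch of the ruler problem, assuming each photo is reduced to just two hypothetical booleans (`has_ruler`, `is_malignant`) and that the counts below are made up rather than taken from the study. A lookup-table classifier fit on a training set where rulers co-occur with malignancy looks great there, then drops to a coin flip once rulers stop tracking the diagnosis.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    has_ruler: bool      # hypothetical: is a ruler visible in the photo?
    is_malignant: bool   # hypothetical: ground-truth diagnosis


def make_training_set() -> list[Sample]:
    # Made-up counts: rulers appear in 90/100 malignant and 10/100 benign photos.
    return ([Sample(True, True)] * 90 + [Sample(False, True)] * 10
            + [Sample(True, False)] * 10 + [Sample(False, False)] * 90)


def make_deployment_set() -> list[Sample]:
    # Made-up counts: at deployment, rulers are independent of the diagnosis.
    return ([Sample(True, True)] * 50 + [Sample(False, True)] * 50
            + [Sample(True, False)] * 50 + [Sample(False, False)] * 50)


def fit_lookup(train: list[Sample]) -> dict[bool, bool]:
    # For each value of has_ruler, predict the majority label seen in training.
    votes = {True: Counter(), False: Counter()}
    for s in train:
        votes[s.has_ruler][s.is_malignant] += 1
    return {flag: counts.most_common(1)[0][0] for flag, counts in votes.items()}


def accuracy(model: dict[bool, bool], data: list[Sample]) -> float:
    # Fraction of samples where the ruler-based guess matches the diagnosis.
    return sum(model[s.has_ruler] == s.is_malignant for s in data) / len(data)


if __name__ == "__main__":
    model = fit_lookup(make_training_set())
    print("learned rule (has_ruler -> malignant):", model)
    print(f"training accuracy:   {accuracy(model, make_training_set()):.2f}")
    print(f"deployment accuracy: {accuracy(model, make_deployment_set()):.2f}")
```

With these invented counts the learned rule is "ruler means malignant", scoring 0.90 on the training set and 0.50 at deployment.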

Or the commercial model that identifies criminals from their photographs. Turns out people who frown are criminals and people who smile aren't. Or so you'd believe if you anchored your expectations on a comparison of mug shots with social media profile pictures.
That wasn't the claim. The claim here is that we should scrub certain faces from the dataset in order to change the dataset in a certain favorable way.
No, that's not the claim. A control is there to help you understand how your model works; it's not what you release as the final product.
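One concrete version of such a control, as a sketch: break the held-out results down by stratum (here a hypothetical `(region, income_bracket)` pair) and compare the model's accuracy inside each stratum with that stratum's majority-class rate. The records below are invented to show the failure mode. If the model beats the base rate overall but not within strata, it is mostly reading geography and income.

```python
from collections import defaultdict

# Invented held-out records: (region, income_bracket, true_label, model_prediction).
RECORDS = [
    ("north", "high", 1, 1), ("north", "high", 1, 1),
    ("north", "high", 1, 1), ("north", "high", 0, 1),
    ("south", "low", 0, 0), ("south", "low", 0, 0),
    ("south", "low", 0, 0), ("south", "low", 1, 0),
]


def acc_and_base_rate(pairs):
    """Return (model accuracy, majority-class accuracy) for (label, prediction) pairs."""
    n = len(pairs)
    correct = sum(y == p for y, p in pairs)
    positives = sum(y for y, _ in pairs)
    return correct / n, max(positives, n - positives) / n


def stratified_report(records):
    # Hold geography and income fixed by grouping on them, then score each group.
    groups = defaultdict(list)
    for region, bracket, y, pred in records:
        groups[(region, bracket)].append((y, pred))
    return {key: acc_and_base_rate(pairs) for key, pairs in sorted(groups.items())}


if __name__ == "__main__":
    overall = acc_and_base_rate([(y, p) for _, _, y, p in RECORDS])
    print("overall (accuracy, base rate):", overall)
    for stratum, (acc, base) in stratified_report(RECORDS).items():
        print(f"{stratum}: accuracy={acc:.2f} base_rate={base:.2f} lift={acc - base:+.2f}")
```

With these made-up records the model shows 0.75 accuracy against a 0.50 base rate overall, yet zero lift inside either stratum.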
It would be nice to see a logistic regression using at least some of the features known to be useful (including geography and income).

That way we can see how much of the performance comes from magic AI pixie dust and how much from basic 19th-century statistics.
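A minimal sketch of the suggested baseline, under loud assumptions: the data is synthetic, `region` is a hypothetical 0/1 geography indicator, `income` is standardized, and the "true" relationship is an assumed one. The model is plain logistic regression fit by batch gradient descent in pure Python; its held-out accuracy is the bar a face model would need to clear before crediting the pixels.

```python
import math
import random


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_synthetic_data(n: int, seed: int):
    """Hypothetical data where the label depends only on region and income."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        region = rng.randint(0, 1)        # hypothetical 0/1 geography indicator
        income = rng.gauss(0.0, 1.0)      # standardized income
        logit = -0.5 + 1.5 * region + 2.0 * income  # assumed "true" model
        labels.append(1 if rng.random() < sigmoid(logit) else 0)
        rows.append([1.0, float(region), income])  # leading 1.0 = intercept
    return rows, labels


def predict_proba(w, x) -> float:
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))


def fit_logistic(rows, labels, lr=0.5, epochs=300):
    """Batch gradient descent on the mean log-loss."""
    w = [0.0] * len(rows[0])
    n = len(rows)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(rows, labels):
            err = predict_proba(w, x) - y   # derivative of log-loss w.r.t. the logit
            for j, xj in enumerate(x):
                grad[j] += err * xj
        w = [wi - lr * g / n for wi, g in zip(w, grad)]
    return w


def accuracy(w, rows, labels) -> float:
    hits = sum((predict_proba(w, x) >= 0.5) == (y == 1) for x, y in zip(rows, labels))
    return hits / len(rows)


if __name__ == "__main__":
    train_x, train_y = make_synthetic_data(1000, seed=0)
    test_x, test_y = make_synthetic_data(1000, seed=1)
    w = fit_logistic(train_x, train_y)
    positives = sum(test_y)
    majority = max(positives, len(test_y) - positives) / len(test_y)
    print("weights (intercept, region, income):", [round(v, 2) for v in w])
    print(f"majority-class accuracy:    {majority:.2f}")
    print(f"logistic baseline accuracy: {accuracy(w, test_x, test_y):.2f}")
```

In a real study you'd swap in the actual per-image metadata and compare this baseline's held-out accuracy with the face model's; only the gap between the two is plausibly attributable to the faces.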

Every time I read a paper like this, I have this Margaret Mitchell talk [1] in the back of my mind.

[1] https://youtu.be/XR8YSRcuVLE

Yep, these papers don't usually pass the sniff test. My bet is that you can predict the phone brand from the camera grain, and that correlates with geography and income.