Hacker News
by quasirandom 1927 days ago
When I read a paper like this I'm looking for four things: (1) the data, (2) the benchmarks, (3) the architecture, (4) the controls/ablation.

1. The data:

"We used a sample of 1,085,795 participants from three countries (the U.S., the UK, and Canada; see Table 1) and their self-reported political orientation, age, and gender. Their facial images (one per person) were obtained from their profiles on Facebook or a popular dating website... Facial images were processed using Face++37 to detect faces. Images were cropped around the face-box provided by Face++ (red frame on Fig. 1) and resized to 224 × 224 pixels."

2. The benchmarks:

"For example, when asked to distinguish between two faces—one conservative and one liberal—people are correct about 55% of the time."

3. The controls:

"What would an algorithm’s accuracy be when distinguishing between faces of people of the same age, gender, and ethnicity? To answer this question, classification accuracies were recomputed using only face pairs of the same age, gender, and ethnicity."

A. A complaint:

Geography and income are two powerful conditioners. These can leak in so many ways: uncropped background (geography), image color and quality (income), eyeglass shape (geography and income). This study really needs more controls. Geography and income would be a nice start.
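
To make the control in (3) concrete, here is a minimal sketch of matched-pair accuracy, extended to the extra keys I'm asking for. It is not the paper's code. The record fields (`score`, `label`, `region`) and the toy numbers are made up for illustration, and I'm assuming accuracy means "fraction of conservative/liberal pairs ranked correctly, ties counted as half".

```python
from collections import defaultdict


def matched_pair_accuracy(records, keys):
    """Pairwise accuracy using only pairs that agree on every field in `keys`.

    records: list of dicts with 'score', 'label' (1 or 0), plus the stratum fields.
    keys:    fields to match on; an empty list means no matching at all.
    """
    strata = defaultdict(lambda: ([], []))
    for r in records:
        pos, neg = strata[tuple(r[k] for k in keys)]
        (pos if r["label"] == 1 else neg).append(r["score"])

    wins, pairs = 0.0, 0
    for pos, neg in strata.values():
        for p in pos:
            for q in neg:
                # Correct ranking earns 1, a tie earns 0.5, a wrong ranking 0.
                wins += 1.0 if p > q else 0.5 if p == q else 0.0
                pairs += 1
    return wins / pairs if pairs else float("nan")


# Toy data (invented): the score depends only on region, never on the face.
toy = [
    {"region": "north", "label": 1, "score": 0.8},
    {"region": "north", "label": 1, "score": 0.8},
    {"region": "north", "label": 0, "score": 0.8},
    {"region": "south", "label": 1, "score": 0.2},
    {"region": "south", "label": 0, "score": 0.2},
    {"region": "south", "label": 0, "score": 0.2},
]

unmatched = matched_pair_accuracy(toy, [])
matched = matched_pair_accuracy(toy, ["region"])
```

In this toy case the unmatched number looks better than chance, and it falls back to 0.5 once pairs are matched on region. That is the pattern you'd expect if a confounder is doing the work.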


What stood out to me was

> Their facial images (one per person) were obtained from their profiles on Facebook or a popular dating website

so of course the first thing that comes to mind is "how good of a predictor is just knowing which of those two sites the image came from?"
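
A rough way to put a number on that question, as a sketch only: the paper doesn't give a per-site breakdown that I'm quoting here, so the counts below are invented, and `source_only_pairwise_accuracy` is a hypothetical helper. It scores each image by nothing but its source site and computes the same kind of pairwise accuracy (ties count as half).

```python
def source_only_pairwise_accuracy(counts):
    """counts maps source name -> (n_conservative, n_liberal).

    Each image is scored by the conservative share of its source site,
    so two images from the same site always tie.
    """
    total_c = sum(c for c, _ in counts.values())
    total_l = sum(l for _, l in counts.values())
    rate = {s: c / (c + l) for s, (c, l) in counts.items()}

    wins = 0.0
    for src_c, (n_c, _) in counts.items():      # site of the conservative image
        for src_l, (_, n_l) in counts.items():  # site of the liberal image
            if rate[src_c] > rate[src_l]:
                credit = 1.0
            elif rate[src_c] == rate[src_l]:
                credit = 0.5
            else:
                credit = 0.0
            wins += credit * n_c * n_l
    return wins / (total_c * total_l)


# Invented counts, purely for illustration.
skewed = source_only_pairwise_accuracy({"facebook": (60, 40), "dating": (40, 60)})
balanced = source_only_pairwise_accuracy({"facebook": (50, 50), "dating": (50, 50)})
```

If the real per-site split were anywhere near as lopsided as the invented one, the site alone would already beat the 55% human benchmark quoted above.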

> Geography and income are two powerful conditioners. These can leak in so many ways: uncropped background (geography), image color and quality (income), eyeglass shape (geography and income). This study really needs more controls. Geography and income would be a nice start.

But then the data wouldn't represent the natural world: nature as it is.

Raw data is the correct thing to use, because it's what a hypothetical other person would also use if they ran the same experiment themselves.

Uh, the headline claim is about faces. How does it make sense to then insist that you must leave the background in?

This reminds me of an early ML study about detecting skin cancer from pictures with a high accuracy rate.

The problem was that they ended up building a ruler classifier, because most of the pictures with skin cancer also happened to have a ruler in them to measure the size.

Or the commercial model that identifies criminals from their photograph. Turns out people who frown are criminals. People who smile aren't. Or so you'd believe if you anchored your expectations by comparing mug shots to social media profile pictures.

That wasn't the claim. The claim here is that we should scrub certain faces from the dataset in order to change the dataset in a certain favorable way.

No, that's not the claim. A control is there to help you understand how your model works; it's not what you release as the final product.

It would be nice to see a logistic regression using at least some of the features known to be useful (including geography and income).

That way we can see how much of the performance is from magic AI pixie dust, and how much is from basic 19th century statistics.
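
Something like the following is what I have in mind, as a sketch under loud assumptions: the data is synthetic (a made-up region flag and a standardized income, with coefficients I picked arbitrarily), the fit is plain gradient descent rather than any particular library, and the metric is pairwise accuracy so it's comparable to the "pick the conservative out of two faces" framing.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_synthetic(n, seed=0):
    # Synthetic stand-in data: the label depends only on region and income.
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        rural = 1.0 if rng.random() < 0.5 else 0.0
        income = rng.gauss(0.0, 1.0)  # standardized income (assumption)
        p = sigmoid(1.2 * rural - 0.6 * income - 0.6)  # arbitrary coefficients
        X.append([rural, income])
        y.append(1 if rng.random() < p else 0)
    return X, y


def fit_logistic(X, y, lr=0.5, epochs=200):
    # Batch gradient descent on the mean log-loss; returns (weights, bias).
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def pairwise_accuracy(scores, labels):
    # Fraction of (positive, negative) pairs ranked correctly; ties count half.
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def run_baseline(n=1000, seed=0):
    # Train on the first half, report pairwise accuracy on the second half.
    X, y = make_synthetic(n, seed)
    half = n // 2
    w, b = fit_logistic(X[:half], y[:half])
    scores = [sum(wj * xj for wj, xj in zip(w, xi)) + b for xi in X[half:]]
    return pairwise_accuracy(scores, y[half:])


if __name__ == "__main__":
    print(f"confounder-only pairwise accuracy: {run_baseline():.3f}")
```

The number this prints says nothing about the real dataset. The point is the comparison: run the same two-feature baseline on the actual metadata, and whatever the face model scores above that is the part you can credit to the pixels.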

Every time I read a paper like this, I have this Margaret Mitchell talk [1] in the back of my mind.

[1] https://youtu.be/XR8YSRcuVLE

Yep, these papers don't usually pass the sniff test. My bet is you can predict the phone brand from the camera grain and that correlates with geography & income.