HN Mirror

by quasirandom 1973 days ago

When I read a paper like this I'm looking for four things: (1) the data, (2) the benchmarks, (3) the architecture, (4) the controls/ablation.

1. The data:

"We used a sample of 1,085,795 participants from three countries (the U.S., the UK, and Canada; see Table 1) and their self-reported political orientation, age, and gender. Their facial images (one per person) were obtained from their profiles on Facebook or a popular dating website... Facial images were processed using Face++ [37] to detect faces. Images were cropped around the face-box provided by Face++ (red frame on Fig. 1) and resized to 224 × 224 pixels."

2. The benchmarks:

"For example, when asked to distinguish between two faces—one conservative and one liberal—people are correct about 55% of the time."

4. The controls:

"What would an algorithm’s accuracy be when distinguishing between faces of people of the same age, gender, and ethnicity? To answer this question, classification accuracies were recomputed using only face pairs of the same age, gender, and ethnicity."
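A minimal sketch of what that control amounts to, assuming the reported accuracy is the share of (conservative, liberal) pairs in which the conservative face gets the higher score. The field names (`score`, `label`, `group`) and the toy numbers are made up for illustration; none of them come from the paper.

```python
from itertools import product


def pairwise_accuracy(people, matched=False):
    """Share of (conservative, liberal) pairs ranked correctly by score.

    people: list of dicts with hypothetical keys
      score - model output, higher means "more likely conservative"
      label - 1 for conservative, 0 for liberal
      group - tuple of control attributes, e.g. (age band, gender, ethnicity)
    matched: if True, only compare pairs that share the same group.
    """
    conservatives = [p for p in people if p["label"] == 1]
    liberals = [p for p in people if p["label"] == 0]
    correct = 0.0
    total = 0
    for con, lib in product(conservatives, liberals):
        if matched and con["group"] != lib["group"]:
            continue  # skip pairs that differ on a control attribute
        total += 1
        if con["score"] > lib["score"]:
            correct += 1.0
        elif con["score"] == lib["score"]:
            correct += 0.5  # a tie counts as a coin flip
    return correct / total if total else float("nan")


# Toy data in which the score mostly tracks age, not politics.
people = [
    {"score": 0.90, "label": 1, "group": ("old",)},
    {"score": 0.80, "label": 1, "group": ("old",)},
    {"score": 0.85, "label": 0, "group": ("old",)},
    {"score": 0.20, "label": 0, "group": ("young",)},
    {"score": 0.10, "label": 0, "group": ("young",)},
    {"score": 0.15, "label": 1, "group": ("young",)},
]

unmatched = pairwise_accuracy(people)
matched = pairwise_accuracy(people, matched=True)
print(f"all pairs: {unmatched:.3f}, matched pairs only: {matched:.3f}")
```

In this toy set the all-pairs number looks better than chance, while the matched-pairs number falls to 0.5. That gap is the kind of thing the control is meant to expose.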

A. A complaint:

Geography and income are two powerful confounders. They can leak in through many channels: uncropped background (geography), image color and quality (income), eyeglass shape (geography and income). This study really needs more controls. Geography and income would be a nice start.

JoshuaDavid 1973 days ago

What stood out to me was

> Their facial images (one per person) were obtained from their profiles on Facebook or a popular dating website

so of course the first thing that comes to mind is "how good of a predictor is just knowing which of those two sites the image came from?"
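One way to put a number on that question, as a rough sketch: guess the majority label for each source site and see how often that alone is right. The rows below are invented for illustration, the `source` and `label` keys are hypothetical, and the figure is plain in-sample classification accuracy, not the pairwise accuracy the paper reports.

```python
from collections import Counter, defaultdict


def majority_baseline_accuracy(rows, feature):
    """Accuracy of always guessing the most common label for each feature value."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[feature]][row["label"]] += 1
    # For each feature value, the majority guess is right max(count) times.
    correct = sum(max(label_counts.values()) for label_counts in counts.values())
    return correct / len(rows)


# Invented toy rows: the site an image came from, and the self-reported label.
rows = [
    {"source": "facebook", "label": "liberal"},
    {"source": "facebook", "label": "liberal"},
    {"source": "facebook", "label": "liberal"},
    {"source": "facebook", "label": "conservative"},
    {"source": "dating", "label": "conservative"},
    {"source": "dating", "label": "conservative"},
    {"source": "dating", "label": "conservative"},
    {"source": "dating", "label": "liberal"},
]

site_only = majority_baseline_accuracy(rows, "source")
print(f"accuracy from source site alone: {site_only:.2f}")
```

If a number like this came out anywhere near the model's accuracy on the real data, the face itself would not be doing much of the work.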

sillysaurusx 1973 days ago

But then the data wouldn't represent the natural world: nature as it is.

Raw data is the correct thing to use, because it's what a hypothetical other person would also use if they ran the same experiment themselves.

SiempreViernes 1973 days ago

Uh, the headline claim is about faces. How does it make sense to then insist that you must leave the background in?

marklubi 1973 days ago

This reminds me of an early ML study about detecting skin cancer from pictures with a high accuracy rate.

The problem was that they ended up building a ruler classifier, because most of the pictures with skin cancer also happened to have a ruler in them to measure the size.

quasirandom 1973 days ago

Or the commercial model that identifies criminals from their photograph. Turns out people who frown are criminals. People who smile aren't. Or so you'd believe if you anchored your expectations by comparing mug shots to social media profile pictures.

sillysaurusx 1973 days ago

That wasn't the claim. The claim here is that we should scrub certain faces from the dataset in order to change the dataset in a certain favorable way.

ad404b8a372f2b9 1973 days ago

No, that's not the claim. A control is there to help you understand how your model works; it's not what you release as the final product.

quasirandom 1973 days ago

It would be nice to see a logistic regression using at least some of the features known to be useful (including geography and income).

That way we can see how much of the performance is from magic AI pixie dust, and how much is from basic 19th-century statistics.
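Something like the following is what I have in mind, as a sketch only. It fits a plain logistic regression by gradient descent on two stand-in features. The feature names (`rural`, standardized `income`), the toy rows, and the learning-rate and epoch settings are all assumptions for illustration; the paper provides neither feature.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(x, w, b):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Batch gradient descent on the mean log loss; returns (weights, bias)."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(xi, w, b) - yi  # d(loss)/d(logit)
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def accuracy(X, y, w, b):
    preds = [1 if predict_proba(xi, w, b) >= 0.5 else 0 for xi in X]
    return sum(p == t for p, t in zip(preds, y)) / len(y)


# Hypothetical toy rows: [rural (0/1), income in standardized units].
X = [[1, 0.5], [1, 1.0], [1, -0.2], [1, 0.8],
     [0, -0.5], [0, -1.0], [0, 0.3], [0, -0.8]]
y = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = conservative, 0 = liberal

w, b = fit_logistic(X, y)
baseline_acc = accuracy(X, y, w, b)
print(f"weights={w}, bias={b:.3f}, training accuracy={baseline_acc:.3f}")
```

Whatever this kind of baseline scores on the real data is the number the face model has to beat before the pixels get any credit.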

Every time I read a paper like this, I have this Margaret Mitchell talk [1] in the back of my mind.

[1] https://youtu.be/XR8YSRcuVLE

ad404b8a372f2b9 1973 days ago

Yep, these papers don't usually pass the sniff test. My bet is you can predict the phone brand from the camera grain and that correlates with geography & income.
