by mturmon 4107 days ago
Responding to your highly inflammatory hypothetical:

You have gone from a nostrum about an entire population ("...a Black person or woman was less likely to have lead the project a priori..."), which could have included thousands-to-millions of people, to a statement about one particular individual.

The grossest error of this way of thinking is that it is mixing a vague, dubious, and unquantified signal (your a priori "knowledge") with a very high-quality signal (a specific and verifiable statement made by a single person about a single project).

If you're really proposing to do some kind of "Bayesian" weighting of these two pieces of knowledge, you're trusting your machinery for assessment of probabilities way too much. That a priori knowledge is junk compared to the statement on the resume.
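To put rough numbers on the weighting point above, here is a minimal sketch of precision-weighted (normal-normal) combination. Everything in it is an assumption made for illustration: the function name `combine`, the normal model, and the specific means and standard deviations are invented, and the comment's real complaint is that the prior is not even quantified this well.

```python
def combine(prior_mean, prior_sd, obs_mean, obs_sd):
    """Combine a prior and one observation, both assumed normal.

    Returns (posterior_mean, posterior_sd, weight_on_obs).
    """
    prior_precision = 1.0 / prior_sd ** 2
    obs_precision = 1.0 / obs_sd ** 2
    # Each source is weighted by its precision (inverse variance).
    weight_on_obs = obs_precision / (prior_precision + obs_precision)
    posterior_mean = (1.0 - weight_on_obs) * prior_mean + weight_on_obs * obs_mean
    posterior_sd = (prior_precision + obs_precision) ** -0.5
    return posterior_mean, posterior_sd, weight_on_obs


# Hypothetical numbers: a vague population-level prior (sd 3.0) versus
# a specific, checkable statement about one person (sd 0.3).
mean, sd, weight = combine(prior_mean=0.0, prior_sd=3.0, obs_mean=2.0, obs_sd=0.3)
print(f"weight on the specific statement: {weight:.3f}")
print(f"posterior mean: {mean:.3f}")
```

With these made-up inputs the specific statement carries about 99% of the weight, so the prior barely moves the estimate. If the prior's spread is itself a guess, the machinery adds an air of rigor rather than information.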

Or, to look at it the other way round: If you're so well-calibrated that you're taking population-wide information into account, I shudder to think what you must be doing with other side information like the font, page layout, semicolon count, or paper composition. Lump it into the prior! What could possibly go wrong?!

I must add that you're deploying a hyper-logical argument in a real-world situation in what is honestly a stupid fashion. Nobody who does real-world inference should operate this way.


> I shudder to think what you must be doing with other side information like the font, page layout, semicolon count, or paper composition.

You joke, but one of the best predictors of being accepted to (a particular) graduate business school (while I was still working in admissions) was simply the style, formatting, grammar, etc. of the applicant's resume.

With a large enough corpus, you could probably extract some meaningful signal from that information alone.

We are talking about academic studies, not how I would personally act.

You are asserting that the signal from race/gender is very noisy and the signal from the resume is very precise.

We can debate the precision of the signal from the resume, but race at least is highly predictive of many objective qualities, e.g. it is highly correlated with IQ. So what you call a vague, dubious, and unquantified signal is actually a highly informative signal.

> You are asserting that the signal from race/gender is very noisy and the signal from the resume is very precise.

> We can debate the precision of the signal from the resume, but race at least is highly predictive of many objective qualities, e.g. it is highly correlated with IQ. So what you call a vague, dubious, and unquantified signal is actually a highly informative signal.

You’re only making a very short statement, so I don’t know what you personally think. However, the statement is imprecise enough that others may mistake what you mean for the following fallacy.

Let’s say we have two kinds of Sneetches: those with stars and those without stars. A star is highly correlated with success at a certain type of test that we’ll say measures “scintillence.” I am interviewing Sneetches for a job where scintillence is also highly correlated with competence. I ask for Sneetches with five years of experience doing this job.

Now: Should I refuse to interview Sneetches without stars, because not having a star correlates with less success in the scintillence test, which then correlates with less competence in the job?

The trap that many fall into is saying that since there is a correlation in the general population of Sneetches, we can draw inferences about the Sneetches applying for this particular job. However, we are dealing with the subset of Sneetches who have already demonstrated their aptitude for the job by having five years of actual experience competently performing a job that correlates with scintillence. We are not selecting Sneetches at random from the general population; we are using a combination of self-selection (“apply for this job if you have a desire to do this job”) and external filtering (“apply for this job if you have five years of experience doing this job”).

The presence or lack of a star on a Sneetch may be highly informative about their ability to do this job if we pick Sneetches at random, but that’s not what we’re doing here, so no, it isn’t highly informative for the purpose of choosing whom to interview.

Summary:

The presence or lack of a star may be highly informative if we have no selection pressure on the sample, but when we apply other filters that are themselves correlated with the attribute that interests us, it loses its ability to inform us.
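The selection effect in the summary above can be sketched with a small simulation. This is only an illustration under invented assumptions: the function names, the 0.5 shift between groups, the threshold, and the noise level are all hypothetical, and "track record" stands in for the five-years-of-experience filter.

```python
import random
import statistics


def mean_gap(groups):
    """Mean competence of starred minus unstarred Sneetches."""
    return statistics.fmean(groups[True]) - statistics.fmean(groups[False])


def simulate(n=200_000, star_shift=0.5, threshold=1.5, record_noise=0.3, seed=0):
    """Return (gap in the general population, gap among filtered applicants)."""
    rng = random.Random(seed)
    population = {True: [], False: []}
    applicants = {True: [], False: []}
    for _ in range(n):
        has_star = rng.random() < 0.5
        # Assumption: starred Sneetches are shifted upward on average.
        competence = rng.gauss(star_shift if has_star else 0.0, 1.0)
        population[has_star].append(competence)
        # The filter: a track record that is a noisy reading of competence.
        track_record = competence + rng.gauss(0.0, record_noise)
        if track_record > threshold:
            applicants[has_star].append(competence)
    return mean_gap(population), mean_gap(applicants)


population_gap, applicant_gap = simulate()
print(f"gap in general population: {population_gap:.2f}")
print(f"gap among filtered applicants: {applicant_gap:.2f}")
```

Under these made-up parameters the gap between the two groups among filtered applicants is a fraction of the gap in the general population. It shrinks rather than vanishing outright, and how much it shrinks depends on how tightly the filter tracks competence.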

I notice you went right from race being correlated with IQ to race being predictive of IQ. Unsurprising.

This is a discussion on probability and statistics, so I was using the technical terms. In statistics, if A and B are correlated, then A predicts B. But you're just a liberal who assumes everyone who isn't is dumb. Fuck you.

Hey, there's another thing that's very, very predictable: that a person whose argument consists of making bigoted remarks and claiming they're neutral statistical results will very quickly devolve into saying 'liberals are stupid!'