Hacker News new | ask | show | jobs
by ben_w 2786 days ago
One thing I’ve been tempted to research but never had time for myself: can one use that aspect of wording embeddings to automatically detect and quantify prejudice?

For example, if you trained only on the corpus of circia 1950 newspapers, would «“man” - “homosexual” ~= “pervert”» or something similar? I remember from my teenage years (as late as the 90s!) that some UK politicians spoke as if they thought like that.

I also wonder what biases it could reveal in me which I am currently unaware of… and how hard it may be to accept the error exists or to improve myself once I do. There’s no way I’m flawless, after all.

2 comments

> For example, if you trained only on the corpus of circia 1950 newspapers, would «“man” - “homosexual” ~= “pervert”» or something similar?

If it did, what conclusion would you be able to draw?

As far as I know, there's no theoretical justification for thinking that word vectors are guaranteed to capture meaningful semantic content. Empirically, sometimes they do; other times, the relationships are noise or garbage.

I am wholeheartedly in favor of trying to examine one's own biases, but you shouldn't trust an ad-hoc algorithm to be the arbiter of what those biases are.

I think there's a further problem that there's never been a shortage of evidence, about things like this. The point is, prejudice and discrimination are not evidence-based in the first place. People who support existing unjust structures are generally strongly motivated to turn a blind eye. Even people who don't support them are - it's simply far easier and more socially advantageous to stop worrying and love the bomb.
I think this is a large part of what goes on in the digital humanities - to varying degrees of success. The problem, as usual, is not that there isn't an abundance of evidence. It's simply that nobody reads sociology papers except sociologists.