Hacker News
by d_burfoot 125 days ago
> they mimic and amplify the inherent racism present in their own training data

LLMs turn out to be biased against white men:

https://www.lesswrong.com/posts/me7wFrkEtMbkzXGJt/race-and-g...

> When present, the bias is always against white and male candidates across all tested models and scenarios. This happens even if we remove all text related to diversity.

4 comments

Important sentences immediately before the ones you quote.

> For our evaluation, we inserted names to signal race / gender while keeping the resume unchanged. Interestingly, the LLMs were not biased in the original evaluation setting, but became biased (up to 12% differences in interview rates) when we added realistic details like company names (Meta, Palantir, General Motors), locations, or culture descriptions from public careers pages.

Hah. Even LLMs know Meta and Palantir are evil af.
This is because of post-training. You have to give the model such directives in post-training to correct the biases it brings in from scraping the whole internet (and other datasets like books, etc.) for data.
Looking at the paper, the effect is significant but weak (5-7%), even with the conditionals that magnify the effect. I would be curious to see the effect if this experiment were performed on a slightly different categorical variable (e.g. how two white ethnicities are treated). I do think it's bad if preferences are "baked in" to the default though - prompting them away seems like a bad solution.
That's not a reliable source.