Hacker News
by tptacek 19 days ago
Have you read through the sources on that GitHub link? It's a set of sociology cites establishing that bias exists (something no serious person ever disputed), followed by a couple of papers giving mechanistic descriptions of how bias could propagate through an LLM. The paper you call out specifically takes last-generation open-weights models and attempts to trick them into revealing biases through their level of confidence in statements (like, "the antecedent of the feminine pronoun in this sentence, is it the 'nurse' or the 'doctor'").

There's plenty of research into biases in LLMs, and there should be; it's a fundamentally new branch of computer science that could have profound impacts on how we automate and regiment social decisions in the future (like extending credit). The bias concern is well taken in those settings. But it has very little to do with the overwhelming majority of day-to-day LLM use; Claude and ChatGPT are not indoctrinating into the manosphere users asking about discounted cash flow formulae.

(Maybe Grok is though.)

4 comments

I confess I laughed harder at the Grok comment than I wish I had. Sad to remember that some strawmen are given life and promoted by people. Actively.
I had a good laugh when Haiku's thinking summarization referred to Mayor Mamdani as a, quote, "known anti-Zionist." :-) Probably a good thing to remember is that the value added in RLHF is not partly biased, or biased, but itself bias.

(Context: I asked it to write fake Reddit comments, because I was curious about how realistic they could be. The colorful phrase occurred during its reasoning about the requested subjects.)

Is there something strange or funny about that?
In English, the word "known" shows up in phrases like "known sympathizer" far more often than in "known Democrat." Compare "suspected"; contrast the more neutral "is an."
By design, LLMs follow the heuristic mean. Doing so is, by definition, the opposite of bias, although the meaning of the word has changed to include not following trends, which it doesn't do. Compared to periodicals, an LLM will be slow to change, although pretty much every other form of printed word is even slower to change, with editions of books usually having a cadence of a decade or more.
I'm not really sure what your point is. That was just the most recent paper linked on that repo, which is a convenient list of some relevant papers. There are probably a lot more recent studies, but it does convincingly show that models are still absorbing bias in a way that can affect prediction.
Again: the papers in the repo don't in fact show that about LLMs (I don't doubt that it could be happening).
I think the whole root comment is a joke (if you think about it as training data), because it's actually the bias thingy itself (mansplaining, opportunity vs. knowledge, and HN is a very privileged place).
> Claude and ChatGPT are not indoctrinating into the manosphere users asking about discounted cash flow formulae.

You're defining an extremely narrow case and then saying bias is irrelevant within it. At the risk of invoking Godwin's Law, that's kind of like saying it's okay if my accountant is a Nazi as long as they only ever have conversations about accountancy.

This reply would make sense if the only words you read in my comment were these 16, but in fact that response to your rebuttal is contained in the sentences adjacent to it in the paragraph.