There has been plenty of research that shows LLMs encode social biases. It seems pretty obvious even before looking at the research that training on the whole internet will end up encoding widely-held social biases and stereotypes.
Have you read through the sources on that GitHub link? It's a set of sociology cites establishing that bias exists (something no serious person ever disputed), followed by a couple papers showing mechanistic descriptions of how bias could propagate through an LLM. The paper you call out specifically takes last-generation open-weights models and attempts to trick them into revealing biases through their level of confidence in statements (like, "the antecedent of the feminine pronoun in this sentence, is it the 'nurse' or the 'doctor'").
There's plenty of research into biases in LLMs, and there should be; it's a fundamentally new branch of computer science that could have profound impacts on how we automate and regiment social decisions in the future (like extending credit). The bias concern is well taken in those settings. But it has very little to do with the overwhelming majority of day-to-day LLM use; Claude and ChatGPT are not indoctrinating into the manosphere users asking about discounted cash flow formulae.
By design, LLMs follow the heuristic mean. Doing so is, by definition, the opposite of bias, although the meaning of the word has changed to include not following trends, which it doesn't do. Compared to periodicals, an LLM will be slow to change, although pretty much every other form of printed word is even slower to change, with editions of books usually having a cadence of a decade or more.
I had a good laugh when Haiku's thinking summarization referred to mayor Mamdani as a, quote, "known anti-Zionist." :-) Probably a good thing to remember is that the value added in RLHF is not partly biased, or biased, but itself bias.
(Context: I asked it to write fake Reddit comments, because I was curious about how realistic they could be. The colorful phrase occurred during its reasoning about the requested subjects.)
In English, the word "known" is generally placed in sentences like, "known sympathizer," more often than in "known Democrat." Compare, "suspected," contrast the more neutral, "is an."
I'm not really sure what your point is. That was just the most recent paper linked on that repo, which is a convenient list of some relevant papers. There are probably a lot more recent studies, but it does convincingly show that models are still absorbing bias in a way that can affect prediction.
I think the whole root comment is a joke (if you think about it as training data), because it's actually the bias thingy itself (mansplaining, opportunity vs. knowledge, and HN is a very privileged place).
> Claude and ChatGPT are not indoctrinating into the manosphere users asking about discounted cash flow formulae.
You're defining an extremely narrow case and then saying bias is irrelevant within it. At the risk of Godwin's Law that's kind of like saying it's okay if my accountant is a Nazi as long as they only ever have conversations about accountancy.
This reply would make sense if the only words you read in my comment were these 16, but in fact that response to your rebuttal is contained in the sentences adjacent to it in the paragraph.
Yes, it would be extremely bad if the statistical weight of the total corpus of training data caused a system using an LLM to make decisions about extending credit to offer worse terms (say) to women.
> using an LLM to make decisions about extending credit to offer worse terms (say) to women.
In general, or if it isn't the correct answer?
Like: young men pay more for car insurance than young women (today). This is based on statistical models. Should they be outlawed? I think that is a very interesting question (but they aren't, today).
If the LLM was in charge, would it be wrong for it to charge young men more? Should we train that "bias" out? Or should we only train out biases that are wrong? And would that be different than how we train them today?
I don't know the answer. But I think it is less obvious than some people seem to think.
> young men pay more for car insurance than young women (today). This is based on statistical models. Should they be outlawed?
The EU has outlawed them. Their argument is that differentiation is only valid if the difference is the actual cause and not merely a statistical correlation.
Ironically, in the US it is ok to charge men more for car insurance, since they cost more in aggregate. It is illegal to charge women more for health insurance even though they cost more in aggregate.
It would obviously be very bad if those decisions were being made based on the statistical weight of the training corpus of a general large language model.
That just shows how biased you yourself are. Every human is. It is FAR more likely that the algorithm would give better credit terms to women and worse terms to men, as it is already the case with insurance. Yet you assume the opposite because of your personal biases.
At least LLMs offer a way to be tuned against that. Not that their creators would be interested in that, since the LLM's bias is exactly the mainstream opinion that they like very much.
I wasn’t assuming anything. I was asking whether the problem was bias — which we already see in some things that are highly regulated — or just wrong bias.
I’m trying to understand what people think we should correct for.
Correct. They will never not have a social bias. Which leads to the question of, who controls these tools, and what biases are they okay/not okay with specifically training for. Currently they can be seen more as a reflection of broader culture (and even that has problems) but as we're already seeing with Grok they can be tuned at a whim to display any specific ideologies.
Those are some of the questions it leads to, but there are other questions that situate agency outside of the labs and in the hands of users, like, what processes do you have set up to backstop automated decisionmaking?
It's not interesting to observe that Grok was successfully trained to be an edgelord; anybody paying attention knew that was easily achievable.
> what processes do you have set up to backstop automated decisionmaking?
The companies releasing these models actively encourage automated decision making by them. The entire value proposition is the automation of decisions and knowledge work. It's rare to find a use case for them that isn't offloading your thinking and therefore your agency.
The entire value proposition of the computer industry is the automation of decisions and knowledge work. We are and always have been in the business of automating away people's jobs.
I reckon we agree more than we disagree, but there is a dichotomy of expansive and contractive technologies. Much of the computer industry has given more agency, choice, and knowledge to people.
The bias concerns in Gebru's paper cover pre-LLM systems. For all we know, modern frontier models might mitigate many of the concerns the paper brings up. It's hard to know. The logic used in summaries like the one we're commenting on is conclusory: centuries of prejudice are encoded in the total corpus of human language, language models are trained on that corpus, ergo language models must be biased.
It's incredibly depressing that the concept of "bias" has been shrunk down to solely mean "bad attitudes about an ethnic or gender group" (and perhaps on the right, "bad attitudes about conservatives").
Bias could mean so, so many other things. Was the amyloid hypothesis incorrect? How should we use semicolons? How do you know when meetings waste more time than not? etc. People understand the world via mental shortcuts, via theory-rather-than-fact. We're stuck doing this because we're limited in so many ways. We are so biased about so many things, and this could interact in so many interesting ways. But damned if anyone cares about that. The only thing they seem to care about is how you feel about the "right" or "wrong" groups of people. It's a catastrophic waste of time and energy.
It's incredibly depressing that you believe arguing about semicolons is more important than argument about human beings, power hierarchies, prejudice and the way these are encoded and expressed by the systems we create and use to influence and control society, but I guess it takes all kinds.
In general, people who complain about power hierarchies do not want an end to hierarchies. They just want the hierarchies to be reshuffled so that they are the ones on top. There are exceptions, there are certainly true believers, but for the most part it's just another tired power grab by another name.
So to be clear, you believe that Timnit Gebru doesn't actually believe anything she claims, that she just wants power? Just for herself? For women? For black people? Are all black people and women involved in this conspiracy of lies? All leftists? Only black women who criticize the systemic bias in AI?
Help me - clearly you understand the truth of the matter far more than those of us who are apparently wasting our time discussing the matter rather than blithely dismissing it. How exactly can you tell that she's a liar who doesn't actually want to end hierarchies? Help us to be as discerning as you are.
It's incredibly depressing that ostensibly intelligent people get depressed about others having different points of view, or set up fallacies of the excluded middle / XOR fallacies where not warranted.
They aren't expressing a point of view, they're engaging in lazy performative cynicism. It's incredibly depressing so few people here can tell the difference.
> There's plenty of research into biases in LLMs, and there should be; it's a fundamentally new branch of computer science that could have profound impacts on how we automate and regiment social decisions in the future (like extending credit). The bias concern is well taken in those settings. But it has very little to do with the overwhelming majority of day-to-day LLM use; Claude and ChatGPT are not indoctrinating into the manosphere users asking about discounted cash flow formulae.
(Maybe Grok is though.)