|
|
|
|
|
by thatnerd
389 days ago
|
|
I think that's an invalid hypothesis here, not just an unlikely one, because that's not my understanding of how LLMs work. I believe you're suggesting (correctly) that a prediction algorithm trained on a data set where women outperform men with equal resumes would have a bias that would at least be valid when applied to its training data, and possibly (if it's representative data) for other data sets. That's correct for inference models, but not LLMs. An LLM is a "choose the next word" algorithm trained on (basically) the sum of everything humans have written (including Q&A text), with weights chosen to make it sound credible and personable to some group of decision makers. It's not trained to predict anything except the next word. Here's (I think) a more reasonable version of your hypothesis for how this bias could have come to be: If the weight-adjusted training data tended to mention male-coded names fewer times than female-coded names, that could cause the model to bring up the female-coded names in its responses more often. |
|
Imagine that you were given a very large corpus of reddit posts about some ridiculously complicated fantasy world, filled with very large numbers of proper names and complex magic systems and species and so forth. Your job is, given the first half of a reddit post, predict the second half. You are incentivized in such a way as to take this seriously, and you work on it eight hours a day for months or years.
You will eventually learn about this fantasy world and graduate from just sort of making blind guesses based on grammar and words you've seen before to saying, "Okay, I've seen enough to know that such-and-such proper name is a country, such-and-such is a person, that this person is not just 'mentioned alongside this country,' but that this person is an official of the country." Your knowledge may still be incomplete or have embarrassing wrong facts, but because your underlying brain architecture is capable of learning a world model, you will learn that world model, even if somewhat inefficiently.