by hedgehog
13 days ago
The scale of the data and the size of the models don't change the underlying issue. The whole construction of these models is to start with a maximum likelihood language sampler (pre-training) and then massage it into a maximum utility language sampler (post-training), with some eye towards risk management and policy compliance ("safety"). It takes work to make model output fit any particular idea of "correct", whether that's Elon's particular ideology, the US Civil Rights Act, Xi Jinping Thought, or writing clean C++. More data and weights increase the complexity of the tasks we're able to model, but they don't automatically make the output "better" on any given axis.
Previously, it was pretty straightforward to hypothesize and show that "historically minorities were discriminated against in hiring, so models trained on that recruiting data will exhibit the same biases." But now those biases are intermingled with a whole lot of other biases (e.g. from data / RLHF about the ill effects of discrimination), so it gets harder to reason about the models' behavior.
As an example, I don't think anyone quite predicted that these models could become suicide ideation machines.