Hacker News
by hedgehog 13 days ago
The scale of the data and the size of the models don't change the underlying issue: the whole construction of these models is to start with a maximum likelihood language sampler (pre-training) and then massage it into a maximum utility language sampler (post-training) with some eye towards risk management and policy compliance ("safety"). It takes work to make model output fit any particular idea of "correct", whether it's Elon's particular ideology, the US Civil Rights Act, Xi Jinping Thought, or writing clean C++. More data and weights increase the complexity of tasks that we're able to model, but they don't automatically make the output "better" on any given axis.
1 comment

Right, what I meant is that the underlying issue is the same, but the large amount of data, along with the number of potentially conflicting and reinforcing biases going into LLMs, makes it hard to categorize or quantify risks.

Like, previously it was pretty straightforward to hypothesize and show that "historically minorities were discriminated against in hiring, so models trained on that recruiting data will exhibit the same biases." But now those biases are intermingled with a whole lot of other biases (e.g. data / RLHF about the ill effects of discrimination), so it gets harder to reason about the models' behavior.

As an example, I don't think anyone quite predicted that these could become suicide ideation machines.