Right, what I meant is that the underlying issue is the same, but the sheer volume of data, along with the number of potentially conflicting and reinforcing biases going into LLMs, makes it hard to categorize or quantify risks.

Previously it was pretty straightforward to hypothesize and show that "historically, minorities were discriminated against in hiring, so models trained on that recruiting data will exhibit the same biases." But now those biases are intermingled with a whole lot of others (e.g. data and RLHF about the ill effects of discrimination), so it gets harder to reason about the models' behavior.

As an example, I don't think anyone quite predicted that these models could become suicidal ideation machines.