by musicale 38 days ago
Many of the problems with LLMs may be structural and intrinsic, due to the way they work (probabilistic text generation) and their training data (often human-generated text that incorporates many features of human discourse that are undesirable in machine-generated output).

The continual failures of "guardrails" show that it's incredibly difficult to get these systems to behave in reliable and predictable ways; unsupervised interactions with them are intrinsically unsafe, and should be treated as such.

Presumably Meta and others are trying to detect and prevent bad output and pathological interactions, but that detection is unlikely to be 100% accurate, and we've seen what the failure modes can look like.

1 comments

I'm not talking about the edge cases where it goes off the rails; I'm talking about the way it normally converses, the way it was trained to converse through RL.
Oh, are you saying that incentivizing engagement (usually to increase ad views and revenue) also implicitly increases bad behaviors and outcomes, and that is why it won't be fixed? That sounds plausible. Even before LLMs, the engagement/attention economy had strong negative effects that companies didn't want to address.