| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by musicale 99 days ago

> undesirable traits that derive from the training data

The research areas of model alignment and safety are attempting to address this fundamental problem - and have yet to solve it convincingly.

Problems like emergent misalignment can make things even worse.