Hacker News
by musicale 99 days ago
> undesirable traits that derive from the training data

The research areas of model alignment and safety are attempting to address this fundamental problem, but have yet to solve it convincingly.

Problems like emergent misalignment can make things even worse.

https://www.nature.com/articles/s41586-025-09937-5