by musicale
99 days ago
> undesirable traits that derive from the training data

The research areas of model alignment and safety are attempting to address this fundamental problem, and have yet to solve it convincingly. Problems like emergent misalignment can make things even worse.

https://www.nature.com/articles/s41586-025-09937-5