by atleastoptimal
332 days ago
We can't make AI models safe by hoping they never see bad ideas or harmful content. That's a very flimsy alignment plan, and far more precarious than designing models that understand bad content and are nevertheless not pushed in a negative direction by it.