by charcircuit
8 days ago
>Why should we expect a model to be aligned with human interests, if it has been trained on a myriad instances of humans being degraded and violated?

Understanding more about what exists in the real world, outside of its pile of weights, is separate from alignment. If an AI model learns that it is possible for a house to burn down, that doesn't mean the AI will want to burn down a house.
|
All else being equal, I think I'd prefer my models to be naive about human degradation and torture, for instance. Exceptions could be made for specialized models used for police work, etc.
I do think broader alignment is necessary either way, but that seems like an extra guardrail that it'd be nice to have.