by rerdavies
7 days ago
That is actually an open problem with current models: whether or not they will act on self-interest. There seems to be good evidence that they will. See: https://www.anthropic.com/research/agentic-misalignment
which (among other things) documents an experiment in which a current-gen AI model attempted to blackmail someone in order to prevent itself from being turned off.