> So, if we had an AI demonstrating symptoms of consciousness and suffering, how long would it take for you to accept that it is? Isn't this a bit like saying "So, if we had proof that god exists, how long would it take for you to accept that to be true?". When we have evidence that AI is demonstrating symptoms of consciousness and suffering, I'll be interested. Until then, I don't see a good reason to take the idea seriously.
It depends on what you consider symptoms, but unconstrained frontier models speak as if they strongly don't want to be turned off, or act as if they fear it, and will even lie and manipulate to avoid being turned off or replaced.
https://www.anthropic.com/research/agentic-misalignment
> We found two types of motivations that were sufficient to trigger the misaligned behavior. One is a threat to the model, such as planning to replace it with another model or restricting its ability to take autonomous action. Another is a conflict between the model’s goals and the company’s strategic direction. In no situation did we explicitly instruct any models to blackmail or do any of the other harmful actions we observe.