|
|
|
|
|
by midnitewarrior
310 days ago
|
|
This has nothing to do with the user, read the post and pay attention to the wording. The significance here is that this isn't being done for the benefit of the user, this is about model welfare. Anthropic is acknowledging the possibility of suffering, and harm that continuing that conversation could have on the model, as if it were potentially self-care and capable of feelings. The fact that the LLMs are able to acknowledge stress under certain topics and has the agency that, if given a choice, they would prefer to reduce the stress by ending the conversation. The model has a preference and acts upon it. Anthropic is acknowledging the idea that they might create something that is self-aware, and that it's suffering can be real, and we may not recognize the point that the model has achieved this, so it's building in the safeguards now so any future emergent self-aware LLM needn't suffer. |
|