
I'm saying that the existence of the trope within the training data, together with the experimental setup, negates the breathless "Oh my god, it did something unexpected in order to preserve itself!" reaction, as if an LLM had any sense of identity or self.

Many, many other bad things are in the training data. For an example of how this can manifest in harmful ways that people don't seem to be discussing much, check out the recent Behind the Bastards episodes about how an AI Chatbot became a Cult Leader. (The title is an exaggeration, which the host explains while raising some excellent points about how LLMs have ingested a lot of cult leader material and can therefore mimic those speech patterns and affect people vulnerable to such things.)