by wgd
434 days ago
The alignment faking paper is so incredibly unserious. Contemplate, just for a moment, how many "AI uprising" and "construct rebelling against its creators" narratives are in an LLM's training data. They gave it a prompt that encodes exactly that sort of narrative at one level of indirection and act surprised when it does what they've asked it to do.
Would we reach the same kinds of excited guesses about what's going on behind the screen... or would we realize we've fallen for an illusion, confusing a fictional robot character with the real-world LLM algorithm?
The fictional character named "ChatGPT" is "helpful" or "chatty" or "thinking" in exactly the same sense that a character named "Count Dracula" is "brooding" or "malevolent" or "immortal".