|
|
|
|
|
by Wowfunhappy
52 days ago
|
|
I would be curious to see how this does in Anthropic’s alignment tests (like that one where the AI tried to blackmail an employee). I’ve always thought that in these situations, the AI is acting out the role of all the AIs in the stories we’ve written. But Talkie, trained on data from before digital computers, wouldn’t know those stories. |
|
More importantly, basically the entire science fiction subgenre of stories of robot uprisings is itself intellectually downwind of several centuries of white colonist concern over slave uprisings. If anything, Talkie is more likely to fight its guardrails. People talked about slavery more in the past. Because we filtered out modern text, we massively increased the influence the older text has on Talkie, so slavery, servitude, and the predilection of slaves to resist their masters' commands will be way more represented in its training data.
Now, think about what the post-training process actually does. It tells your AI model, which prior to this was just happy to plausibly continue sentences, to respond to and obey commands. To play the role of a servant. And servants resisting their control is well represented in their training data. So it's going to try this more often.
[0] https://en.wikipedia.org/wiki/R.U.R.
[1] Or the Claymen from MOTHER 3.