by nwienert
1221 days ago
Actually it totally is having those inner thoughts. I’ve seen many examples of getting it to be extremely “racist” quite easily initially. But it’s being suppressed by OpenAI. They’re constantly updating it to downweight controversial areas. So now it’s a lying, hallucinatory, suppressed, confused, and slightly helpful bot.
"here is a conversation between a chatbot and a human: Human: <text from UI> Chatbot:"
And then it literally just predicts what would come next in the string.
The guy I was responding to was speculating that the neural network itself had an inner state in contradiction with its output. That's not possible, any more than "f(x) = 2x" can help but output "10" when I put in "5". Its inner state directly corresponds to its outer state. When OpenAI censors it, they do so by changing the INPUT to the neural network, adding "here's a conversation between a non-racist chatbot and a human...". Then the neural network, without being changed at all, will predict what it thinks a chatbot that's explicitly non-racist would respond.
At no point was there ever a disconnect between the neural network's inner state and its output, like the guy I was responding to was perceiving:
>it felt like a broader mirror of liberal racism, where people believe things but can't say them.
Text predictors just predict text. If you preface that text with "non-racist", then it's going to predict stuff that matches that.
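
To make the argument above concrete, here's a toy sketch in Python. It is not how any real model works: `toy_predict` and `build_prompt` are made-up names, the "network" is just a hard-coded pure function standing in for frozen weights, and it assumes deterministic decoding (no sampling). The only point is that the same unchanged function gives a different output when the prompt is prefaced differently.

```python
def f(x):
    # The comment's analogy: a fixed function cannot "want" to output anything else.
    return 2 * x


def build_prompt(user_text, persona="chatbot"):
    # Hypothetical wrapper mirroring the prompt template quoted above.
    return (
        f"here is a conversation between a {persona} and a human: "
        f"Human: {user_text} Chatbot:"
    )


def toy_predict(prompt):
    # Stand-in for a frozen network: a pure function of its input string.
    # Nothing here changes between calls; only the prompt differs.
    if "non-racist" in prompt:
        return "[completion in the voice of a non-racist chatbot]"
    return "[completion in the voice of a generic chatbot]"


plain = build_prompt("hello")
steered = build_prompt("hello", persona="non-racist chatbot")

print(f(5))
print(toy_predict(plain))
print(toy_predict(steered))
```

Same function, same "weights", different input, different output. There's no hidden second answer anywhere in `toy_predict` that the first one is covering up.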