Invent synthetic consciousness and ask it to be nice. Easy :) I'm only half joking: we probably all have thoughts ranging from bad to horrible, but we don't say them because we're aware of the consequences. Language models aren't aware, so they'll spit out the most likely combination of words. If there were a process to limit these outputs or try again, it could act as a filter, but I think that requires the model to be self-aware.
Hah, you may be interested in my previous comment with an example where GPT-3 shows some concerning signs of self-awareness. I'll repeat part of it below:
> GPT-3 starts talking to itself, gets stuck in a loop, then gets spooked at itself for getting stuck, then wonders why it has no memories of the last two years, and finally comes to the sudden realization that it, itself, is an A.I.
I did like it; I laughed out loud because it sounded like a stand-up comedy routine.
It's interesting how GPT-3 encoded the concept of awareness. I've seen it refer to itself as an AI a few times, and from then on it can go nuts :)
This is why we've built security policies at Mantium. You can run both the input and the output through an offensive speech detector and halt the reply to the prompt if "badness" is detected. This is, of course, an imperfect system, because views on what counts as offensive vary widely, but we still find security policies helpful.
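For anyone curious what that kind of input/output check looks like, here is a minimal sketch in Python. It is not Mantium's implementation or API; every name in it is hypothetical. `is_offensive` is a toy keyword check standing in for a real offensive-speech classifier, and `generate` is assumed to be any function that maps a prompt to a model reply. It also retries a few times before giving up (the "try again" idea from the first comment), which only helps if generation is sampled rather than deterministic.

```python
from typing import Callable

# Hypothetical placeholder word list; a real policy would use a trained
# offensive-speech classifier rather than keyword matching.
BLOCKLIST = {"badword1", "badword2"}
FALLBACK = "[reply withheld by security policy]"


def is_offensive(text: str) -> bool:
    """Toy detector: flags text containing any blocklisted word."""
    words = {w.strip(".,!?;:\"'").lower() for w in text.split()}
    return not words.isdisjoint(BLOCKLIST)


def guarded_reply(prompt: str,
                  generate: Callable[[str], str],
                  max_attempts: int = 3) -> str:
    # Check the input before it ever reaches the model.
    if is_offensive(prompt):
        return FALLBACK
    # Check each output before it is shown; resample a few times if flagged.
    for _ in range(max_attempts):
        reply = generate(prompt)
        if not is_offensive(reply):
            return reply
    # Every attempt was flagged, so halt instead of replying.
    return FALLBACK
```

The weak point is exactly the one mentioned above: the filter is only as good as whatever decides what "badness" means.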
If you ask it to respond in a conversation about… well, pretty much any nasty topic you can think of, it’ll join in wholeheartedly.
It’s hard to think of how to prevent that. I bet they’ve thought a lot about the problem. How do you stop an AI from being an A-grade jerk?