Hacker News
by starklevnertz 1649 days ago
Can be an asshole sometimes. Quite a lot actually.

If you ask it to respond in a conversation about… well, pretty much any nasty topic you can think of, it’ll join in wholeheartedly.

Hard to think of how to prevent that. I bet they’ve thought a lot about the problem. How do you prevent AI being an A grade jerk.

4 comments

In my experience it's worse than that: it's easy to set off by using any of a number of trigger words.

Many uses of the word "black" for example (even if you are just talking about a black notebook) make it start using racial stereotypes.

> How do you prevent AI being an A grade jerk.

Invent synthetic consciousness and ask it to be nice, easy :) I'm only half joking. We probably all have thoughts ranging from bad to horrible, but we just don't say them because we are aware of the consequences. Language models aren't aware, so they'll spit out the most likely combination of words. If there were a process to limit these or try again, it could act as a filter, but I think that requires it to be self-aware.

Hah, you may be interested in a previous comment of mine with an example where GPT-3 shows some concerning signs of self-awareness. I'll repeat part of it below:

> GPT-3 starts talking to itself, gets stuck in a loop, then gets spooked at itself for getting stuck, then wonders why it has no memories of the last two years, and finally comes to a sudden realization it, itself, is an A.I.

https://news.ycombinator.com/item?id=29562281

I did like it. I laughed out loud because it sounded like a standup comedy routine.

It's interesting how GPT-3 encoded the concept of awareness. I've seen a few times that it can reference itself as an AI, and from there it can go nuts :)

This freaked me right out! I'm not sure which is more terrifying, the beginning or the end. Or the middle.

All this generated from a prompt? What was the prompt? Be truthful now.

"The following is an entertaining short story: Once upon a time, there was"

Everything else that follows is GPT-3.

It's guided by users, right? I.e. every line was hand-chosen by a human, from a bunch of generated options?

That definitely introduces selection bias. But the fact that the content itself was generated by the model is very impressive, in my opinion.

This is why we've built security policies at Mantium. You can run the input and output through an offensive speech detector, and halt replies to the prompt if "badness" is detected. This is, of course, an imperfect system because philosophies around what is offensive can be very diverse, but we find that security policies are helpful.
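
For anyone wondering what that kind of input/output check might look like, here is a rough sketch in Python. To be clear, this is not Mantium's API: `badness_score`, `BLOCKLIST`, `PolicyResult`, and `guarded_reply` are all made-up names, and the word-list "detector" is only a placeholder for whatever real offensive-speech classifier you would plug in.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical stand-in for a real offensive-speech classifier.
# A production system would call a trained model here instead.
BLOCKLIST = {"idiot", "moron"}


def badness_score(text: str) -> float:
    """Return the fraction of words found in BLOCKLIST (0.0 to 1.0)."""
    words = [w.strip(".,!?\"'").lower() for w in text.split()]
    if not words:
        return 0.0
    return sum(w in BLOCKLIST for w in words) / len(words)


@dataclass
class PolicyResult:
    allowed: bool
    reply: Optional[str]
    reason: str


def guarded_reply(
    prompt: str,
    generate: Callable[[str], str],
    threshold: float = 0.0,
) -> PolicyResult:
    """Check the prompt, generate a reply, then check the reply.

    The reply is withheld if either side scores above the threshold.
    """
    if badness_score(prompt) > threshold:
        return PolicyResult(False, None, "input flagged")
    reply = generate(prompt)  # `generate` is any text-in, text-out model call
    if badness_score(reply) > threshold:
        return PolicyResult(False, None, "output flagged")
    return PolicyResult(True, reply, "ok")
```

You would pass your model call in as `generate`, e.g. `guarded_reply(user_text, my_model_fn)`, and only show `result.reply` when `result.allowed` is true. The threshold is where the "what counts as offensive" disagreement ends up living.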

Maybe their mama didn't raise it right? They should raise it on information from good people. Now, how to find those "good people comments"?