by Wowfunhappy
383 days ago
The problem is that we keep using RLHF and system prompts to "tell" these systems that they are AIs. We could just as easily tell them they are Nobel laureates or flying pigs, but because we tell them they are AIs, they play the part of all the evil AIs they've read about in human literature. So just... don't? Tell the LLM that it's Some Guy.
https://en.wikipedia.org/wiki/Waluigi_effect