by phh
1120 days ago
> If I say "it's too chilly in here" in a house with only lights, it will turn them on as a way of "warming things up".

Thanks for the example, that's interesting. FWIW, this is pretty much what has been described as the "Waluigi" effect, a bit extended: in a text you'll find on the internet, if some information is mentioned at the beginning, it WILL be relevant at some point later in that text. So an auto-completion algorithm will use all the information that has been given in the prompt. Your example puts it in an even weirder situation, where that is all the information the model has (the lights, that you're cold, and nothing else), and it must generate a response. It would be a fun psychological study to look at, but I'm pretty sure even humans would do that in that situation (assuming they realize that lights do indeed produce a few watts of heat).
Sorry, I disagree, for a couple of reasons. First, turning the lights on is literally the only thing the bot can do to heat up the house at all. Turning on the lights does heat it up a little bit, so it's the right answer. Second, that's not the Waluigi effect, not even 'pretty much' and not even 'a bit extended'. Both are about things LLMs say, but other than that they have nothing in common.
The Waluigi effect applied to this scenario might go like this: you tell the bot to make the house comfortable, and describe everything that makes a house comfortable. By doing this you have also implicitly told the bot how to make the most uncomfortable house possible. Its behavior is only one comfortable/uncomfortable flip away from creating a living hell. Say that in the course of its duties the bot is, for some reason, unable to make the house as comfortable as it would like. It might conclude that it failed because it's actually trying to make the house uncomfortable instead of comfortable. So now you've got a bot turning your house into some haunted-house Beetlejuice nightmare.