by phh 1120 days ago
> If I say "it's too chilly in here" in a house with only lights, it will turn them on as a way of "warming things up".

Thanks for the example, that's interesting.

FWIW, this is pretty much what has been described as "waluigi" effect a bit extended: in a text you'll find on the internet, if some information is mentioned at the beginning, it WILL be relevant at some point later in that text. So an auto-completion algorithm will use all the information that has been given in the prompt. Your example puts it in an even weirder situation, where that is all the information the model has (the lights, and that you're cold, and nothing else), and it must generate a response. It would be a fun psychological study to look at, but I'm pretty sure even humans would do that in that situation (assuming they realize that lights do indeed produce a little bit of heat).

2 comments

> FWIW, this is pretty much what has been described as "waluigi" effect a bit extended

Sorry, I disagree, for a couple of reasons. First, turning the lights on is literally the only thing the bot can do to heat up the house at all. Turning on the lights does heat it up a little bit. So it's the right answer. Second, that's not the Waluigi effect, not even 'pretty much' and not even 'a bit extended'. Both are about things LLMs say, but other than that, no.

The Waluigi effect applied to this scenario might be like this: you tell the bot to make the house comfortable, and describe everything that makes a house comfortable. By doing this you have also implicitly told the bot how to make the most uncomfortable house possible. Its behavior is only one comfortable/uncomfortable flip away from creating a living hell. Say that in the course of its duties the bot is, for some reason, unable to make the house as comfortable as it would like. It might decide that it failed because it's actually trying to make the house uncomfortable instead of comfortable. So now you've got a bot turning your house into some haunted-house Beetlejuice nightmare.

For performant enough models, you can just instruct them not to necessarily use that information in immediate completions.

Adding something like

"Write the first page of the first chapter of this novel. Do not introduce the elements of the synopsis too quickly. Weave in the world, characters, and plot naturally. Pace it out properly. That means that several elements of the story may not come into light for several chapters."

after you've written up key elements you want in the story actually makes the models write something that paces ok/normally.
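
In case it helps, here is a rough sketch of that ordering in Python: key elements first, pacing instruction last. The `build_prompt` helper and the "Key elements of the story:" header are my own made-up names for illustration, not part of any library, and the example elements are placeholders. It only assembles the prompt string; sending it to a model is left out on purpose.

```python
# The pacing instruction quoted above, kept verbatim.
PACING_INSTRUCTION = (
    "Write the first page of the first chapter of this novel. "
    "Do not introduce the elements of the synopsis too quickly. "
    "Weave in the world, characters, and plot naturally. "
    "Pace it out properly. That means that several elements of the story "
    "may not come into light for several chapters."
)


def build_prompt(key_elements: list[str]) -> str:
    """Hypothetical helper: list the story elements, then append the pacing instruction."""
    synopsis = "\n".join(f"- {element}" for element in key_elements)
    # The instruction goes AFTER the synopsis, as described in the comment above.
    return f"Key elements of the story:\n{synopsis}\n\n{PACING_INSTRUCTION}"


if __name__ == "__main__":
    # Placeholder elements, purely for illustration.
    print(build_prompt(["a lighthouse keeper", "a missing ship", "a storm"]))
```

Whether this actually fixes pacing will depend on the model, so treat it as a starting point to experiment with.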