HN Mirror

Hacker News

by murderberry 1112 days ago

"Persuaded" is a loaded word here, and I think you're anthropomorphizing it a bit too much.

Early LLMs were very malleable, so to speak: they would go with the flow of whatever you were saying. But this also meant you could get them to deny climate change or advocate for genocide by subtly nudging them with prompts. A lot of RLHF work focused on getting them to give brand-safe, socially acceptable answers, and this is ultimately achieved by not giving credence to what the user is saying. In effect, the models pontificate instead of conversing, and will "stand their ground" on most of the claims they make, whether right or wrong.

You can still get them to do 180-degree turns or say outrageous things using indirect techniques, such as presenting external evidence. That evidence can be wrong or bogus; it just shouldn't be phrased as your opinion. You can cite made-up papers by noted experts in the field, reference invalid mathematical proofs, etc.

It's quite likely that you replicated this, and that it worked randomly in one case but not the other. I'd urge you to experiment by giving it patently incorrect but plausible-sounding proofs, scientific references, etc. It will "change its mind" and say what you want it to say more often than not.


Buttons840 1112 days ago

I responded to this in another thread: https://news.ycombinator.com/item?id=36245815

There needs to be a balance between standing your ground and being malleable. This is true in life for people, and it's true for a good LLM. I think GPT-4 (the only LLM I've used much) strikes a good balance here.

As mentioned in my other comment, it wouldn't be useful to me if it didn't push back. It pushes back a lot, and I'm always looking for subtle tricks I can throw at it to test its abilities. It does well, I think.
