Hacker News new | ask | show | jobs
by simonw 975 days ago
In my experience, you can always beat that through some variant on "no wait, I have genuinely changed my mind, do this instead"

Or you can use a trick where you convince the model that it has achieved the original goal that it was set, then feed it new instructions. I have an example of that here: https://simonwillison.net/2023/May/11/delimiters-wont-save-y...

1 comments

Interesting. I like your idea in one of your posts of separating out system prompts and user inputs. Seems promising.
Thus separating the model’s logic from the model’s data.

All that was old is new again :) [0]

0: s/model/program/

It's interesting how this is not presumably the case within the weights of the LLM itself. Those probably encode data as well as logic!