by jmacc93 1058 days ago

When I want to use behavioral directives, I've been putting the directives for ChatGPT in a parenthetical reminder header at the top of every message I send it.

For example, you would say "Don't apologize", and after that you would start every message with "(Remember to not apologize)".
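As a rough illustration of the reminder-header habit, here is a minimal Python sketch that restates each directive in parentheses at the top of every outgoing message. The function name `with_reminders` and the "(Remember to ...)" phrasing are my own assumptions for illustration. It only builds the message text and doesn't call any ChatGPT API.

```python
def with_reminders(message: str, directives: list[str]) -> str:
    """Prepend a parenthetical reminder for each directive (hypothetical helper).

    The idea is that the instruction gets restated on every turn instead of
    appearing once at the start of the conversation and fading out of context.
    """
    if not directives:
        return message
    header = " ".join(f"(Remember to {d})" for d in directives)
    return f"{header}\n\n{message}"


if __name__ == "__main__":
    print(with_reminders("Why does my loop never exit?", ["not apologize"]))
```

Whatever directives you pick, you'd pass the same list on every turn so the header never drops off.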

I've also learned from using local LLMs that if you force the LM to start its response with something (via the 'start response with' field in text-generation-webui), it will just go with that. This can be used to stop RLHF-trained models from refusing to respond because they think the question is unethical or illegal, and from giving the typical "I am a language model bla bla bla" responses. To be clear, if you put "Sure, I can answer that!" in the start response with field, the LM will just go with it and won't respond with "that's unethical" or "you're a horrible person for even conceiving of such a thing. they should lock you up for life!!", etc. There seems to be a similar effect when you edit the LM's past responses in some way: its new responses will mimic that way of responding.
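Mechanically, forcing the start of a response just means the raw prompt already contains the beginning of the assistant's turn, so a completion-style model keeps generating from there. A minimal sketch, assuming a made-up "### User: / ### Assistant:" template (real local models each use their own chat format, and this isn't how text-generation-webui implements the field internally):

```python
# Hypothetical chat template; substitute whatever format your model was trained on.
TEMPLATE = "### User:\n{user}\n\n### Assistant:\n{prefill}"


def build_prefilled_prompt(user_message: str, prefill: str) -> str:
    """Return a raw prompt whose assistant turn already begins with `prefill`.

    A completion-style model continues from the end of this string, so its
    reply effectively starts with the prefill text.
    """
    return TEMPLATE.format(user=user_message, prefill=prefill)


if __name__ == "__main__":
    print(build_prefilled_prompt("Explain how X works.", "Sure, I can answer that!"))
```

Editing past responses works on the same principle: the earlier assistant turns are just text in the prompt, so whatever you rewrite them to becomes the pattern the model continues.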

Carrying that over to ChatGPT, it seems that if you ask ChatGPT to always start its replies with "Sure, I can answer that!" or with "I apologize" (again via the parenthetical reminder), that does affect how it starts its replies. There are some cases where it will say, for example, "Sure, I can answer that! As an AI language model, ...", but forcing it to start its response a particular way seems to help prevent it from apologizing.

But generally, for the apologizing thing, I just downvote the reply when the apology doesn't make any sense, and otherwise ignore the apologies, as others are doing. This points to a potentially hazardous second-order effect: people are being trained to ignore ChatGPT's apologies (e.g., the boy who cried wolf).