| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by courgette 1193 days ago
	I try really hard to prompt it to replace it with something else. It acknowledges and agree to it. Did it maybe once or twice, then reverted back to the old “As a AI model” IIRC, I was trying to see if it could replace by “as a LLM”.

3 comments

IIAOPSW 1193 days ago

My attempt was to substitute it with "speaking as a mother".

link

corobo 1193 days ago

I realise you want to do it using the prompt but wouldn't it be easier to `output.replace("As an AI language model, ", "As a totes sentient robot, ")`?

link

courgette 1191 days ago

I was just playing with it, checking if I could give him instructions that would span several prompts and what not.

I try to make it play a games with me and start the prompt differently until a specific keyword was entered… it kinda worked. Kinda being key.

link

titaniczero 1193 days ago

It’s probably provided as system instructions for rejecting things. You can use the API and feed it with different instructions with the system role

link

Uehreka 1192 days ago

I kind of wonder if maybe they look for certain words in the output (or run it through some sort of sentiment analysis) and if it fails they submit the prompt again with a very strongly worded system prompt (after your prompt) instructing it to reject the command and begin with the phrase “As an AI language model”.

Like, I haven’t heard about a way they could actually implement filters this powerful “inside” the model, it feels like it’s probably a less elegant system than we’d imagine.

link

circuit10 1192 days ago

They use RLHF (reinforcement learning through human feedback) which means they can reward it when it does it and punish it when it doesn’t

They’ve probably done it strongly enough that it can’t really not do it, maybe on purpose to prevent misuse

link