Hacker News new | ask | show | jobs
by dinfinity 295 days ago
I concur. Something to keep in mind is that it is often more robust to pull an LLM towards the right place than to push it away from the wrong place (or more specifically, the active parts of its latent space). Sidenote: also kind of true for humans.

That means that positively worded instructions ("do x") work better than negative ones ("don't do y"). The more concepts that you don't want it to use / consider show up in the context, the more they do still tend to pull the response towards them even with explicit negation/'avoid' instructions.

I think this is why clearing all the crap from the context save for perhaps a summarizing negative instruction does help a lot.

2 comments

  >  positively worded instructions ("do x") work better than negative ones ("don't do y")
I've noticed this.

I saw someone on Twitter put it eloquently: something about how, just like little kids, the moment you say "DON'T DO XYZ" all they can think about is "XYZ..."

> That means that positively worded instructions ("do x") work better than negative ones ("don't do y").

In teacher school, we're told to always give kids affirmative instructions, ie "walk" instead of "don't run". The idea is that it takes more energy for a child to figure out what to do.