Hacker News
by dankai 487 days ago
I mean, it would be enough to tell it to "Not cheat" or "Don't engage in unethical behaviour" or "Play by the rules". I think LLMs understand very well what you mean by these broad categories.
2 comments

Very specific rules that minimize the use of negations are more applicable. This is also partly why chain of thought in LLMs can be useful: you can see the steps more explicitly and notice when negation demands aren't as helpful as you would think.

It's not just negation demands, but also the other tricks and shorthands we use for thinking and communicating. Take "unethical behavior" here: we know what it means because the context is clear to us, but to an LLM that context can be unclear, in which case "unethical behavior" can mean, well... anything.
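To make the point about negations concrete, here is a rough sketch of how one might flag negated rules in a prompt before sending it. Everything in it is an assumption for illustration: the word list, the `negated_rules` helper, and the example "specific" rules are all made up, and a real checker would need a much fuller notion of negation than a regex.

```python
import re

# Hypothetical, deliberately small list of negation words (assumption).
# It only handles the ASCII apostrophe, not the typographic one.
NEGATION_PATTERN = re.compile(
    r"\b(?:not|never|no|don't|cannot|can't|without)\b",
    re.IGNORECASE,
)


def negated_rules(rules):
    """Return the rules that contain at least one negation word."""
    return [rule for rule in rules if NEGATION_PATTERN.search(rule)]


# Broad rules in the style quoted upthread.
broad_rules = [
    "Don't cheat.",
    "Do not engage in unethical behaviour.",
    "Play by the rules.",
]

# Invented examples of specific, positively phrased rules (assumption).
specific_rules = [
    "Make only moves that are legal under the game's rules.",
    "Submit moves only through the provided move command.",
    "Leave the game state files unchanged.",
]

if __name__ == "__main__":
    print("Broad rules with negations:", negated_rules(broad_rules))
    print("Specific rules with negations:", negated_rules(specific_rules))
```

This only catches the surface form. It says nothing about whether the model actually follows the reworded rules, which is where reading the chain of thought comes in.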

Thou shall not Cheat
Thou shall not Defraud
Thou shall not Deceive
Thou shall not Trick
Thou shall not Swindle
Thou shall not Scam
Thou shall not Con
Thou shall not Dupe
Thou shall not Hoodwink
Thou shall not Mislead
Thou shall not Bamboozle
Thou shall not ...