Hacker News
by WhiskeyChicken 940 days ago
Is there a specific reason to expect that instructing it not to perform an illegal activity will result in it adhering to that instruction? Is this any different from when it provides wrong output about other things, even when the operator attempts to "engineer" the prompt to guide the result?

I'd be curious what would happen if RLHF were used to try to penalize illegal, immoral, or unethical activity.

I had always dismissed Asimov's "rules of robotics" as silly: nobody would ever design a mission-critical robot with indeterminate stochastic behavior! Maybe I should reconsider and re-read those stories, because people seem very eager to do just that.

People will most definitely build such things (also into autonomous swarms of killer robots used by the military; projects are ongoing...). However, Asimov's stories illustrate how difficult it is to find such rules. They are certainly not meant as inspiration for how to actually program robots...